The 11th IASTED International Conference on
Parallel and Distributed Computing and Networks
PDCN 2013

February 11 – 13, 2013
Innsbruck, Austria

TUTORIAL SESSION

Heterogeneous Computing with OpenCL

Dr. Dongping Zhang
Advanced Micro Devices, USA
dongping.zhang@amd.com

Duration

3 hours

Abstract

OpenCL is a standard for programming heterogeneous computers built from CPUs, GPUs, and other processors. It includes a framework to define the platform in terms of a host (e.g. a CPU) and one or more compute devices (e.g. a GPU) plus a C-based programming language for writing programs for the compute devices. Using OpenCL, a programmer can write task-based and data-parallel programs that use all the resources of the heterogeneous computer.
In this tutorial, we will explore OpenCL in depth. The first half of the tutorial introduces the OpenCL memory, execution and platform models, and the AMD discrete GPU and APU architecture features that enable developers to write efficient OpenCL applications. The second half focuses on a series of case studies, including cluster-based and GPGPU applications in the image analysis, computer vision and machine learning domains. One detailed example shows how a multicore application is ported to a heterogeneous platform, with implementation details and information on how to use static code analysis and run-time profiling techniques to optimize the application.
Originally proposed by AMD, the Heterogeneous System Architecture (HSA) is now backed by an industry-wide foundation that in 2012 drafted an abstract machine specification, including an execution model and a memory model capable of supporting a wide range of programming models and languages, from C++11 to Java and beyond. Also delivered was the HSA system architecture description, which describes how a heterogeneous platform based on HSA is constructed and configured. More will be published later. The specifications are carefully designed to allow many different implementations. In the last session of this tutorial, I will therefore extend the scope from heterogeneous computing with OpenCL to HSA, its abstract machine and its system architecture framework.

Timeline

• Introduction to heterogeneous computing and overview of OpenCL (20 min)
    • What is the latest status of heterogeneous computing? What are the challenges?
    • High-level overview of OpenCL from its origin to OpenCL next
    • Comparison of OpenCL, CUDA and other programming models
• OpenCL architecture (30 min)
    • OpenCL memory, execution and platform models
• Introduction to AMD discrete GPU and Accelerated Processing Unit (APU) architecture (30 min)
• Break (20 min)
• Case studies using OpenCL, including image search on APU clusters, OpenCL for the OpenCV library, and classification (50 min)
• Programmability issues of other real-world applications on heterogeneous platforms, including single-node workstations and APU clusters (10 min)
• A brief introduction to the Heterogeneous System Architecture and the HSA Foundation, with a few minutes on the HSA abstract machine, input language and memory model (20 min)
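As a taste of the execution model covered in the OpenCL architecture session, the following is a minimal sketch in plain C, not OpenCL API code. It emulates a one-dimensional index space on the host: each work-item derives its global index from its work-group index and local index, which mirrors the relationship between the OpenCL C built-ins get_group_id, get_local_size, get_local_id and get_global_id when the global offset is zero. The type work_item and the functions global_id, vec_add_kernel and run_vec_add are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a work-item's identity in a 1-D index space. */
typedef struct {
    size_t group_id;   /* index of the work-group */
    size_t local_id;   /* index of the work-item inside its work-group */
    size_t local_size; /* number of work-items per work-group */
} work_item;

/* Global index of a work-item, assuming a zero global offset. */
size_t global_id(work_item wi) {
    return wi.group_id * wi.local_size + wi.local_id;
}

/* "Kernel" body: each work-item adds exactly one pair of elements. */
void vec_add_kernel(work_item wi, const float *a, const float *b,
                    float *c, size_t n) {
    size_t i = global_id(wi);
    if (i < n) {            /* guard: the last work-group may be partly idle */
        c[i] = a[i] + b[i];
    }
}

/* Host-side emulation: run every work-item of every work-group in turn.
 * A real device would execute these work-items in parallel. */
void run_vec_add(const float *a, const float *b, float *c,
                 size_t n, size_t local_size) {
    assert(local_size > 0);
    size_t num_groups = (n + local_size - 1) / local_size; /* round up */
    for (size_t g = 0; g < num_groups; ++g) {
        for (size_t l = 0; l < local_size; ++l) {
            work_item wi = { g, l, local_size };
            vec_add_kernel(wi, a, b, c, n);
        }
    }
}
```

Calling run_vec_add with five elements and a work-group size of two launches three work-groups; the bounds check in the kernel keeps the sixth work-item from writing out of range.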

Tutorial Materials

• Introduction to heterogeneous computing and an overview of OpenCL
• OpenCL architecture (memory, execution and platform models)
• Exploration of the OpenCL specification through real-world applications in the image processing, computer vision and machine learning domains, for example large-scale image search, OpenCL for OpenCV and classification algorithms
• Introduction to the Heterogeneous System Architecture and to the HSA Foundation
• Overview of the HSA abstract machine, input language and memory model
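To illustrate the memory model named in the materials above, here is a hedged sketch in plain C of a per-work-group sum reduction. It is an emulation under stated assumptions, not OpenCL code: the array local_mem stands in for work-group local memory, the input array stands in for global memory, and each pass of the stride loop corresponds to one phase that would be separated by a barrier on a real device. LOCAL_SIZE, group_sum and reduce_sum are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 4 /* hypothetical work-group size; must be a power of two */

/* Sum the elements that belong to one work-group. */
float group_sum(const float *global_in, size_t n, size_t group_id) {
    float local_mem[LOCAL_SIZE]; /* stands in for work-group local memory */

    /* Phase 1: every work-item copies one element from global to local
     * memory, padding with zero past the end of the input. */
    for (size_t l = 0; l < LOCAL_SIZE; ++l) {
        size_t i = group_id * LOCAL_SIZE + l;
        local_mem[l] = (i < n) ? global_in[i] : 0.0f;
    }

    /* Phase 2: tree reduction. Each stride halves the number of active
     * work-items; on a device a barrier would separate the strides. */
    for (size_t stride = LOCAL_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t l = 0; l < stride; ++l) {
            local_mem[l] += local_mem[l + stride];
        }
    }
    return local_mem[0];
}

/* Host side: combine the partial sums produced by all work-groups. */
float reduce_sum(const float *global_in, size_t n) {
    assert(global_in != NULL || n == 0);
    size_t num_groups = (n + LOCAL_SIZE - 1) / LOCAL_SIZE; /* round up */
    float total = 0.0f;
    for (size_t g = 0; g < num_groups; ++g) {
        total += group_sum(global_in, n, g);
    }
    return total;
}
```

Staging data in the small per-group buffer before reducing is the design choice the sketch is meant to show: on a GPU, repeated accesses then hit fast local memory instead of global memory.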

Background Knowledge Expected of the Participants

C/C++, undergraduate-level computer architecture

Qualifications of the Instructor(s)


AMD, June 2011 – present: Working on the Heterogeneous System Architecture and its successors, specializing in computer vision and image analysis workloads, programming models and performance optimization. Most recently, she joined AMD’s Exascale project, where she focuses on processing-in-memory research. She serves on various program committees and reviews for a number of conferences. Prior to this, she was with the Biomedical Image Analysis group at Imperial College London, where she also received her PhD in Computer Science.

References

[1] D. P. Zhang. Book Chapter 13: OpenCL profiling and debugging (revision); Book Chapter 14: Performance optimization of an image analysis application on dGPU and APUs. Heterogeneous Computing with OpenCL. 2nd Edition, Morgan Kaufmann, 2012.
[2] D. P. Zhang. Biomedical data analysis on heterogeneous platform. AMD Fusion Developer Summit, 2012.
[3] D. P. Zhang et al. Vasculature segmentation using parallel multi-hypothesis template tracking on heterogeneous platform. SPIE on Parallel Processing for Imaging Applications, 2012.
[4] D. P. Zhang et al. Motion tracking of left ventricle and coronaries in 4D CTA. SPIE, 2011.
[5] R. Wolz, D. P. Zhang et al. Multi-method analysis of MRI images in early diagnostics of Alzheimer’s disease. PLoS One, 6(10), 2011.
[6] D. P. Zhang et al. Nonrigid registration and template matching for coronary motion modeling from 4D CTA. WBIR, 2010.