The 24th IASTED International Conference on
Parallel and Distributed Computing and Systems
PDCS 2012

November 12 – 14, 2012
Las Vegas, USA

TUTORIAL SESSION

Vector Acceleration of High-Performance Embedded Applications on Multicore Processors

Prof. Sotirios G. Ziavras
New Jersey Institute of Technology, USA
ziavras@njit.edu

Duration

3 hours

Abstract

This tutorial will be of interest to computer engineers and users of high-performance embedded computing systems. It will focus on high-performance systems that can operate efficiently on large arrays or streams of incoming data. The emphasis will be on SoC (System-on-a-Chip) multicore processors that contain hardware accelerators. Such accelerators commonly center on Vector Processor (VP) or SIMD (Single-Instruction stream, Multiple-Data stream) architectures that take advantage of the DLP (Data-Level Parallelism) present in high-performance applications. Vector-processing accelerators were first introduced for supercomputers several decades ago and are now becoming ubiquitous in SoC solutions for embedded applications. The tutorial will first discuss the need for these accelerators in high-performance embedded multicore designs and will then present relevant systems. This effort is justified since accelerators are characterized not only by very high performance but also by low energy consumption. The tutorial will then present an in-depth case study of an innovative shared vector coprocessor for multicore environments. Its FPGA (Field-Programmable Gate Array) and ASIC (Application-Specific Integrated Circuit) based realizations will be analyzed and compared in terms of performance and energy consumption. This detailed comparison will have great educational value for graduate students as well as many practitioners. Finally, future directions in the design and deployment of such accelerators will be discussed to motivate researchers to pursue relevant avenues of research and development.

Objectives

Heterogeneous multicores comprising hardware accelerators are the most appropriate platforms for speeding up high-performance embedded applications while also maintaining acceptable energy consumption. The tutorial will address frameworks required for the acceleration of these applications on multicore processors comprising hardware accelerators for vector processing. Techniques that can yield high performance while operating at acceptable power consumption levels will be the core of the tutorial. Comparisons with GPUs (Graphics Processing Units) in terms of performance and energy consumption will also be made. FPGA (Field-Programmable Gate Array) and ASIC (Application-Specific Integrated Circuit) realizations of vector acceleration platforms will be presented.
Participants will learn about recent advances in high-performance embedded computing that employ heterogeneous multicore processors comprising hardware accelerators of the vector processing type. In the mobile computing market, advanced vector processing is required primarily for data streaming applications. Relevant accelerators have been common in the high-performance computing field for many decades. However, only recently has the need to minimize their power consumption become a high priority because of emerging embedded applications. Therefore, designers of such systems have to become proficient in relevant performance-energy tradeoff issues in order to apply them to real-world problems.

Timeline

1. INTRODUCTION (30 minutes):
* Computer Architecture Fundamentals
* Recent Advances in Multicore Processors
* Heterogeneous Multicores and Coprocessors as Accelerators
* High-Performance Embedded Computing with Multicores
2. MODELING PERFORMANCE AND ENERGY CONSUMPTION (30 minutes)
3. CONTROLLING ENERGY CONSUMPTION (15 minutes):
* Static Power
* Dynamic Power
* Power Gating
4. VECTOR COPROCESSORS, GPUS AND SIMD EXTENSIONS FOR GENERAL-PURPOSE PROCESSORS (30 minutes):
* SIMD (Single-Instruction Multiple-Data) Extensions for General-Purpose Processors
* Lane-Based Design of Vector Coprocessors
* GPUs
* Comparisons
5. APPLICATION BENCHMARKING (10 minutes)
6. PERFORMANCE-ENERGY TRADEOFFS WITH VECTOR COPROCESSORS (20 minutes):
* Managing Static Power Consumption
* Managing Dynamic Power Consumption with Power Gating
7. FPGA IMPLEMENTATION OF A SHARED VECTOR COPROCESSOR FOR MULTICORES (15 minutes):
* Objectives
* Problems to Solve
* Results
8. ASIC IMPLEMENTATION OF THE SHARED VECTOR COPROCESSOR (20 minutes):
* Objectives
* Problems to Solve
* Results
* Comparison with the FPGA Implementation
9. FUTURE OBJECTIVES (10 minutes)

Background Knowledge Expected of the Participants

Computer architecture and embedded applications.

Qualifications of the Instructor(s)

Dr. Sotirios G. Ziavras is a Full Professor in the Department of Electrical and Computer Engineering (ECE) at the New Jersey Institute of Technology (NJIT), and also serves as the Director of its Computer Architecture and Parallel Processing Laboratory (CAPPL). He received the Diploma in Electrical Engineering from the National Technical University of Athens (1984) and the D.Sc. in Computer Science from George Washington University (1990). He was with the Center for Automation Research at the University of Maryland, College Park from 1988 to 1989, focusing on supercomputing techniques for computer vision and numerical analysis. He was a visiting Assistant Professor in the ECE Department at George Mason University in Spring 1990. He joined NJIT in Fall 1990 as an Assistant Professor. He has served as the Associate Chair for Graduate Studies in ECE for four years. In 2011, he received NJIT’s Excellence in Teaching Award for Graduate Education.
He received a National Science Foundation (NSF) Research Initiation Award in 1991, as well as many other grants from the NSF, Department of Energy, AT&T, etc. He has published about 170 research papers in journals and conferences. His main research interests are advanced computer architecture, high-performance computing (architectures and algorithms), chip multiprocessors and embedded computing systems.