reading note

Scope of Discussion

  1. workload partition
  2. energy efficiency
  3. computing approaches
    1. at runtime
    2. algorithm
    3. programming
    4. compiler
    5. application level
  4. discrete and fused CPU-GPU systems
  5. benchmark

Motivation

  • CPU: out-of-order, multi-issue cores that run at high frequency and use large caches to minimize the latency of a single thread; suited for latency-critical applications

  • GPU: in-order cores that share a control unit, run at lower frequency, and use smaller caches; suited for throughput-critical applications

  • a heterogeneous system: can provide high performance for a much wider variety of applications
    and usage scenarios than using either a CPU or a GPU alone

In systems with GPUs, CPUs have conventionally been used as hosts that manage I/O and scheduling for the GPU; however, as continuing innovations improve CPU performance even further, using their computation capabilities has also become more attractive.

Challenges

The vastly different architecture, programming model, and performance (for a given
program) of CPUs and GPUs present unique challenges in heterogeneous computing.

  • PU-specific
  • Application/problem-specific
  • Objective-specific

Workload Partition

dynamic or static scheduling

whether the mapping of subtasks to PUs is fixed in advance or decided at runtime

basis of workload partition

why a particular assignment of tasks to PUs is made, e.g. based on the characteristics/capabilities of the PUs themselves and/or of the subtasks
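
A minimal sketch of the static/dynamic distinction, assuming a pool of independent subtasks and two placeholder worker functions standing in for CPU and GPU implementations (all names are illustrative, not from the survey):

    import queue
    import threading

    subtasks = list(range(100))        # hypothetical independent work items

    def run_on_cpu(task):              # stand-in for a CPU implementation
        pass

    def run_on_gpu(task):              # stand-in for a GPU implementation
        pass

    # Static scheduling: the subtask-to-PU mapping is fixed before execution,
    # e.g. a 30/70 CPU/GPU split chosen up front.
    split = int(0.3 * len(subtasks))
    cpu_share, gpu_share = subtasks[:split], subtasks[split:]

    # Dynamic scheduling: both PUs pull from a shared queue, so the mapping
    # emerges at runtime and adapts to whichever PU finishes work faster.
    work = queue.Queue()
    for t in subtasks:
        work.put(t)

    def worker(run):
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return
            run(task)

    threads = [threading.Thread(target=worker, args=(run,))
               for run in (run_on_cpu, run_on_gpu)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()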

Scheduling Based on Relative Performance of PUs

  • using a performance model: it evaluates the respective contribution of each PU and estimates the total execution time of the FFT problem for arbitrary work distributions and problem sizes; it decomposes the computation and uses profiling to estimate the optimal workload division between PUs. In short: profiling and estimating (see the first sketch after this list)

  • accelerating query processing by scheduling queries according to their length; a process-specific technique

  • dividing the workload based on several factors, such as device contention, historical performance data, number of cores, processor speed, problem size, and device status

  • intercepting function calls to kernels and scheduling them on a PU based on argument size, historical profile, and the location of the data; this accounts for both computation time and data transfer time (as in the cost model sketched after this list)

  • accelerating QR factorization: a sequence of subtasks -> CPU or GPU functions -> static or dynamic schedule

  • dividing based on relative performance, where the estimate of each PU's performance is updated during each iteration of the algorithm's execution (see the adaptive sketch after this list)

  • hard real-time stream scheduling in heterogeneous systems: partitions the incoming streams
    into two subsets, one for the CPU and one for the GPU. The algorithm seeks an assignment that satisfies both the deadline constraint of each individual stream and the aggregate throughput requirement of all the streams

  • static partitioning technique for OpenCL programs on HCSs: static analysis of the OpenCL program extracts code features -> a machine-learning model determines the best work-division ratio -> the workload is divided into suitably sized chunks for each PU
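
Several of the profiling-based techniques above reduce to the same cost model: estimate each side's compute time, charge the GPU side for data transfer, and pick the division that minimizes the slower side. A minimal sketch, where all rates and sizes are assumed illustrative numbers rather than values from the survey:

    # Illustrative profiled parameters; a real system would measure these.
    CPU_RATE = 2.0e6       # work items per second on the CPU (assumed)
    GPU_RATE = 1.5e7       # work items per second on the GPU (assumed)
    PCIE_BW = 8.0e9        # host<->device bandwidth in bytes/s (assumed)
    ITEM_SIZE = 64         # bytes transferred per work item (assumed)

    def total_time(n_items, gpu_fraction):
        """Estimated makespan when gpu_fraction of the items go to the GPU.

        Both PUs run concurrently, so the makespan is the max of the two
        sides; only the GPU side pays the data-transfer cost.
        """
        n_gpu = gpu_fraction * n_items
        n_cpu = n_items - n_gpu
        t_cpu = n_cpu / CPU_RATE
        t_gpu = n_gpu / GPU_RATE + (n_gpu * ITEM_SIZE) / PCIE_BW
        return max(t_cpu, t_gpu)

    def best_split(n_items, steps=1000):
        """Sweep candidate splits and keep the lowest estimated makespan."""
        fractions = (i / steps for i in range(steps + 1))
        return min(fractions, key=lambda f: total_time(n_items, f))

    print(f"GPU share: {best_split(10_000_000):.1%}")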
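
For the iterative variant, a minimal sketch of refining the division from per-iteration measurements; the simulated timings and the damping factor are assumptions for illustration:

    def simulated_times(gpu_fraction):
        """Stand-in for real per-iteration measurements (assumed rates)."""
        cpu_rate, gpu_rate = 1.0, 6.0   # relative throughputs (assumed)
        return (1 - gpu_fraction) / cpu_rate, gpu_fraction / gpu_rate

    gpu_fraction = 0.5                  # start from an even split
    for it in range(10):
        t_cpu, t_gpu = simulated_times(gpu_fraction)
        # Re-estimate relative speeds from this iteration's measurements
        # and move the split toward the point where both PUs finish together.
        cpu_speed = (1 - gpu_fraction) / t_cpu
        gpu_speed = gpu_fraction / t_gpu
        target = gpu_speed / (cpu_speed + gpu_speed)
        gpu_fraction += 0.5 * (target - gpu_fraction)   # damped update
        print(f"iter {it}: gpu share -> {gpu_fraction:.3f}")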

These techniques are usually optimized for particular applications or processing patterns.

Scheduling Based on Nature of Subtasks

pipelining
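
A minimal sketch of a pipelined CPU-GPU arrangement: while one stage (standing in for the GPU) processes chunk i, the CPU stage already prepares chunk i+1. Both stage functions are placeholders, not code from the survey:

    import queue
    import threading

    def cpu_stage(chunk):          # e.g. decode/preprocess on the CPU
        return chunk * 2

    def gpu_stage(chunk):          # e.g. the heavy kernel, here a stand-in
        return chunk + 1

    handoff = queue.Queue(maxsize=2)   # small buffer between the stages
    SENTINEL = object()

    def producer(chunks):
        for c in chunks:
            handoff.put(cpu_stage(c))  # CPU prepares chunk i+1 while ...
        handoff.put(SENTINEL)

    results = []
    t = threading.Thread(target=producer, args=(range(8),))
    t.start()
    while (c := handoff.get()) is not SENTINEL:
        results.append(gpu_stage(c))   # ... this stage consumes chunk i
    t.join()
    print(results)                     # [1, 3, 5, 7, 9, 11, 13, 15]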

MapReduce
