reading note

Scope of Discussion

  1. workload partition
  2. energy efficiency
  3. computing approaches
    1. at runtime
    2. algorithm
    3. programming
    4. compiler
    5. application level
  4. discrete and fused CPU-GPU systems
  5. benchmark

Motivation

  • CPU: out-of-order, multi-issue cores that run at high frequency and use large caches to minimize the latency of a single thread; suited for latency-critical applications

  • GPU: in-order cores that share a control unit, run at lower frequency, and use smaller caches; suited for throughput-critical applications

  • a heterogeneous system: can provide high performance for a much wider variety of applications
    and usage scenarios than using either a CPU or a GPU alone

In systems with GPUs, CPUs have conventionally been used as hosts that manage I/O and scheduling for the GPU; however, as continuing innovations improve CPU performance even further, using their computation capabilities has also become more attractive.

Challenges

The vastly different architecture, programming model, and performance (for a given
program) of CPUs and GPUs present unique challenges in heterogeneous computing.

  • PU-specific
  • Application/problem-specific
  • Objective-specific

Workload Partition

dynamic or static scheduling

whether the mapping of subtasks to PUs is fixed in advance or decided at runtime

basis of workload partition

why a particular assignment of tasks to PUs is made, e.g. based on the characteristics/capabilities of the PUs themselves and/or of the subtasks
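
A minimal sketch of the static/dynamic distinction, assuming a pool of independent subtasks and two placeholder worker functions standing in for CPU and GPU implementations (all names are illustrative, not from the survey):

    import queue
    import threading

    subtasks = list(range(100))        # hypothetical independent work items

    def run_on_cpu(task):              # stand-in for a CPU implementation
        pass

    def run_on_gpu(task):              # stand-in for a GPU implementation
        pass

    # Static scheduling: the subtask-to-PU mapping is fixed before execution,
    # e.g. a 30/70 CPU/GPU split chosen up front.
    split = int(0.3 * len(subtasks))
    cpu_share, gpu_share = subtasks[:split], subtasks[split:]

    # Dynamic scheduling: both PUs pull from a shared queue, so the mapping
    # emerges at runtime and adapts to whichever PU finishes work faster.
    work = queue.Queue()
    for t in subtasks:
        work.put(t)

    def worker(run):
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return
            run(task)

    threads = [threading.Thread(target=worker, args=(run,))
               for run in (run_on_cpu, run_on_gpu)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()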

Scheduling Based on Relative Performance of PUs

  • using a performance model: it evaluates the respective contribution of each PU and estimates the total execution time of the FFT problem for arbitrary work distributions and problem sizes; it decomposes the computation and uses profiling to estimate the optimal workload division between PUs. In short: profiling and estimating (see the first sketch after this list)

  • accelerating query processing by scheduling queries according to their length; a process-specific technique

  • dividing the workload based on several factors, such as device contention, historical performance data, number of cores, processor speed, problem size, and device status

  • intercepting function calls to kernels and scheduling them on a PU based on argument size, historical profile, and the location of the data; this accounts for both computation time and data transfer time (as in the cost model sketched after this list)

  • accelerating QR factorization: a sequence of subtasks -> CPU or GPU functions -> static or dynamic schedule

  • dividing based on relative performance, where the estimate of each PU's performance is updated during each iteration of the algorithm's execution (see the adaptive sketch after this list)

  • hard real-time stream scheduling in heterogeneous systems: partitions the incoming streams
    into two subsets, one for the CPU and one for the GPU. The algorithm seeks an assignment that satisfies both the deadline constraint of each individual stream and the aggregate throughput requirement of all the streams

  • static partitioning technique for OpenCL programs on HCSs: static analysis of the OpenCL program extracts code features -> a machine-learning model determines the best work-division ratio -> the workload is divided into suitably sized chunks for each PU
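
Several of the profiling-based techniques above reduce to the same cost model: estimate each side's compute time, charge the GPU side for data transfer, and pick the division that minimizes the slower side. A minimal sketch, where all rates and sizes are assumed illustrative numbers rather than values from the survey:

    # Illustrative profiled parameters; a real system would measure these.
    CPU_RATE = 2.0e6       # work items per second on the CPU (assumed)
    GPU_RATE = 1.5e7       # work items per second on the GPU (assumed)
    PCIE_BW = 8.0e9        # host<->device bandwidth in bytes/s (assumed)
    ITEM_SIZE = 64         # bytes transferred per work item (assumed)

    def total_time(n_items, gpu_fraction):
        """Estimated makespan when gpu_fraction of the items go to the GPU.

        Both PUs run concurrently, so the makespan is the max of the two
        sides; only the GPU side pays the data-transfer cost.
        """
        n_gpu = gpu_fraction * n_items
        n_cpu = n_items - n_gpu
        t_cpu = n_cpu / CPU_RATE
        t_gpu = n_gpu / GPU_RATE + (n_gpu * ITEM_SIZE) / PCIE_BW
        return max(t_cpu, t_gpu)

    def best_split(n_items, steps=1000):
        """Sweep candidate splits and keep the lowest estimated makespan."""
        fractions = (i / steps for i in range(steps + 1))
        return min(fractions, key=lambda f: total_time(n_items, f))

    print(f"GPU share: {best_split(10_000_000):.1%}")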
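
For the iterative variant, a minimal sketch of refining the division from per-iteration measurements; the simulated timings and the damping factor are assumptions for illustration:

    def simulated_times(gpu_fraction):
        """Stand-in for real per-iteration measurements (assumed rates)."""
        cpu_rate, gpu_rate = 1.0, 6.0   # relative throughputs (assumed)
        return (1 - gpu_fraction) / cpu_rate, gpu_fraction / gpu_rate

    gpu_fraction = 0.5                  # start from an even split
    for it in range(10):
        t_cpu, t_gpu = simulated_times(gpu_fraction)
        # Re-estimate relative speeds from this iteration's measurements
        # and move the split toward the point where both PUs finish together.
        cpu_speed = (1 - gpu_fraction) / t_cpu
        gpu_speed = gpu_fraction / t_gpu
        target = gpu_speed / (cpu_speed + gpu_speed)
        gpu_fraction += 0.5 * (target - gpu_fraction)   # damped update
        print(f"iter {it}: gpu share -> {gpu_fraction:.3f}")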

These techniques are usually optimized for particular applications or processing patterns.

Scheduling Based on Nature of Subtasks

pipelining
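
A minimal sketch of a pipelined CPU-GPU arrangement: while one stage (standing in for the GPU) processes chunk i, the CPU stage already prepares chunk i+1. Both stage functions are placeholders, not code from the survey:

    import queue
    import threading

    def cpu_stage(chunk):          # e.g. decode/preprocess on the CPU
        return chunk * 2

    def gpu_stage(chunk):          # e.g. the heavy kernel, here a stand-in
        return chunk + 1

    handoff = queue.Queue(maxsize=2)   # small buffer between the stages
    SENTINEL = object()

    def producer(chunks):
        for c in chunks:
            handoff.put(cpu_stage(c))  # CPU prepares chunk i+1 while ...
        handoff.put(SENTINEL)

    results = []
    t = threading.Thread(target=producer, args=(range(8),))
    t.start()
    while (c := handoff.get()) is not SENTINEL:
        results.append(gpu_stage(c))   # ... this stage consumes chunk i
    t.join()
    print(results)                     # [1, 3, 5, 7, 9, 11, 13, 15]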

MapReduce
