Altera Notes: Introduction to Parallel Computing with OpenCL

Original post

2020-02-25 10:58

Today I signed up for Altera's training courses to see how FPGAs are programmed with OpenCL (until now I only knew how to program FPGAs in an HDL).

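Heterogeneous computing traditionally means writing sequential code for the CPU/DSP while implementing fine-grained parallelism and vectorization on the FPGA in VHDL/Verilog. Working this way involves a lot of debugging, both while writing the code and at run time, so development is slow.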

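In the past, exploiting parallelism mostly meant relying on the instruction-level parallelism (ILP) that out-of-order (OoO) processors extract from the code. As cores have become more complex, that approach no longer scales, so the focus has shifted to thread-level parallelism (TLP): parallelism is expressed explicitly as threads to take advantage of multicore, heterogeneous environments. A general-purpose parallel-programming middle layer can therefore target many different hardware architectures, instead of splitting development between C++/Java and an HDL.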

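Writing parallel programs has two major difficulties:

1. Redesigning a sequential algorithm as a parallel one so it can exploit multicore, heterogeneous hardware;

2. Handling data sharing and synchronization issues.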
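To make the second difficulty concrete, here is a minimal sketch in C with POSIX threads (not OpenCL): several threads increment one shared counter. The names `run_counter_demo`, `NUM_THREADS` and `INCREMENTS_PER_THREAD` are hypothetical. Without the mutex, the read-modify-write in `shared_counter++` from different threads can interleave and lose updates; `pthread_join` is the point where the main thread synchronizes with the workers.

```c
#include <pthread.h>

/* Hypothetical demo parameters */
#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

static long shared_counter = 0;                       /* data shared by all threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);   /* enter critical section */
        shared_counter++;                    /* read-modify-write, now atomic w.r.t. other workers */
        pthread_mutex_unlock(&counter_lock); /* leave critical section */
    }
    return NULL;
}

/* Starts NUM_THREADS workers, waits for all of them, and returns the final count. */
long run_counter_demo(void) {
    pthread_t threads[NUM_THREADS];
    shared_counter = 0;
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, worker, NULL);
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);      /* synchronization point */
    return shared_counter;
}
```

With the lock in place the result is always `NUM_THREADS * INCREMENTS_PER_THREAD`; the price is that the increments are serialized, which is why real parallel code tries to keep shared state small.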

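When code runs in parallel, data dependencies become a major problem. For example, a 5-stage MIPS pipeline (a form of task parallelism in which the pipeline stages have a producer-consumer relationship) suffers RAW (read-after-write) hazards, and superscalar designs additionally face WAW (write-after-write) and WAR (write-after-read) hazards. This also brings in hardware-design issues such as uniform address spaces and cache coherency (a frequent interview topic; make sure you understand the two protocols).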
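The three hazard types can be stated precisely by comparing the registers two instructions read and write. A minimal sketch, assuming a simplified instruction with one destination and two source registers (`-1` meaning "no register"); `Instr` and `classify_hazards` are hypothetical names, not part of any MIPS toolchain.

```c
#include <stdbool.h>

/* Simplified register-level instruction, e.g. add r1, r2, r3 -> {1, 2, 3} */
typedef struct {
    int dst;   /* register written, or -1 */
    int src1;  /* first register read, or -1 */
    int src2;  /* second register read, or -1 */
} Instr;

/* Hazard flags, combined as a bitmask */
enum { HAZARD_NONE = 0, HAZARD_RAW = 1, HAZARD_WAR = 2, HAZARD_WAW = 4 };

static bool reads(const Instr *in, int reg) {
    return reg >= 0 && (in->src1 == reg || in->src2 == reg);
}

/* Hazards between `earlier` and `later`, where `earlier` comes first in program order. */
int classify_hazards(const Instr *earlier, const Instr *later) {
    int h = HAZARD_NONE;
    if (reads(later, earlier->dst))
        h |= HAZARD_RAW;   /* later reads a value earlier writes (true dependency) */
    if (reads(earlier, later->dst))
        h |= HAZARD_WAR;   /* later overwrites a register earlier still has to read */
    if (earlier->dst >= 0 && earlier->dst == later->dst)
        h |= HAZARD_WAW;   /* both write the same register; final value depends on order */
    return h;
}
```

For example, `add r1, r2, r3` followed by `sub r4, r1, r5` is a RAW hazard, because `sub` needs the `r1` that `add` produces. WAR and WAW are name dependencies that only matter when instructions can complete out of program order.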

OpenCL, however, provides an abstract model for parallelism, along with mechanisms for data sharing and synchronization.
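The core of that abstract model is that a kernel describes the work of a single work-item, and the runtime launches many work-items over an index space (NDRange). The sketch below imitates this in plain C so it runs without an OpenCL device: `vec_add_kernel` and `run_ndrange_vec_add` are hypothetical names, and the host loop stands in for the runtime. In real OpenCL C the index would come from the built-in `get_global_id(0)` and the work-items could execute in parallel and in any order.

```c
#include <stddef.h>

/* Work for ONE work-item: it only touches element gid, so work-items are independent. */
static void vec_add_kernel(size_t gid, const float *a, const float *b, float *c) {
    c[gid] = a[gid] + b[gid];
}

/* Host-side stand-in for enqueueing a 1-D NDRange of global_size work-items.
   Runs them sequentially; a device could run them concurrently. */
void run_ndrange_vec_add(size_t global_size, const float *a, const float *b, float *c) {
    for (size_t gid = 0; gid < global_size; gid++)
        vec_add_kernel(gid, a, b, c);
}
```

Because no work-item depends on another, the sequential loop and a fully parallel launch give the same result, which is exactly what lets a compiler map such a kernel onto a GPU, a multicore CPU, or an FPGA pipeline.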

There are two approaches to parallel programming: scatter and gather (data parallelism) and divide and conquer (task parallelism). In practice the two are usually combined.

Scatter and gather: can be implemented with SIMD.
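A minimal sketch of the scatter-and-gather pattern for summing an array, assuming a hypothetical fixed number of workers `NUM_CHUNKS`: the input is split into independent chunks (scatter), each chunk produces a partial result, and the partial results are combined (gather). The chunks run one after another here for portability; since they share no state, they could be mapped onto separate cores or SIMD lanes.

```c
#include <stddef.h>

#define NUM_CHUNKS 4  /* hypothetical number of workers / lanes */

/* Work done on one chunk: independent of every other chunk. */
static long partial_sum(const int *data, size_t begin, size_t end) {
    long s = 0;
    for (size_t i = begin; i < end; i++)
        s += data[i];
    return s;
}

long scatter_gather_sum(const int *data, size_t n) {
    long partial[NUM_CHUNKS];
    size_t chunk = (n + NUM_CHUNKS - 1) / NUM_CHUNKS;  /* ceiling division */

    /* Scatter: give each worker a contiguous slice of the input. */
    for (size_t k = 0; k < NUM_CHUNKS; k++) {
        size_t begin = k * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n;   /* clamp the last slices to the array bounds */
        if (end > n) end = n;
        partial[k] = partial_sum(data, begin, end);
    }

    /* Gather: combine the per-chunk results. */
    long total = 0;
    for (size_t k = 0; k < NUM_CHUNKS; k++)
        total += partial[k];
    return total;
}
```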

Divide and conquer: can be implemented with simultaneous multithreading (SMT). "A modern GPU contains a set of multi-threaded streaming multiprocessors (SM), which are discrete independent execution units." A detailed analysis can be found in "SIMD < SIMT < SMT: parallelism in NVIDIA GPUs".
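A minimal sketch of divide and conquer as task parallelism, again summing an array but this time recursively with POSIX threads: each task splits its range in half, hands one half to a new thread, handles the other half itself, then joins and combines the results. The hardware is free to run those threads on separate cores or as SMT hardware threads. `SumTask`, `sum_task` and `divide_and_conquer_sum` are hypothetical names; the `depth` parameter is an assumption that caps how many levels spawn threads, so small sub-problems are solved sequentially instead of paying thread-creation overhead.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    const int *data;
    size_t begin, end;  /* this task sums data[begin, end) */
    int depth;          /* remaining levels that may spawn a thread */
    long result;
} SumTask;

static void *sum_task(void *arg) {
    SumTask *t = (SumTask *)arg;
    size_t n = t->end - t->begin;

    if (t->depth == 0 || n < 2) {            /* base case: solve sequentially */
        long s = 0;
        for (size_t i = t->begin; i < t->end; i++)
            s += t->data[i];
        t->result = s;
        return NULL;
    }

    size_t mid = t->begin + n / 2;           /* divide */
    SumTask left  = { t->data, t->begin, mid, t->depth - 1, 0 };
    SumTask right = { t->data, mid, t->end, t->depth - 1, 0 };

    pthread_t th;
    int spawned = pthread_create(&th, NULL, sum_task, &left) == 0;
    if (!spawned)
        sum_task(&left);                     /* fall back to running it inline */
    sum_task(&right);                        /* this thread handles the other half */
    if (spawned)
        pthread_join(th, NULL);              /* wait for the sub-task to finish */

    t->result = left.result + right.result;  /* conquer: combine */
    return NULL;
}

long divide_and_conquer_sum(const int *data, size_t n, int depth) {
    SumTask root = { data, 0, n, depth, 0 };
    sum_task(&root);
    return root.result;
}
```

Usage: `divide_and_conquer_sum(data, n, 3)` creates at most 7 extra threads (a binary tree three levels deep) and gives the same answer as `depth = 0`, which runs entirely on the calling thread.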

......................................

What to take next:

We recommend completing the following courses:

Still, you need an actual board to do any of this.

So with C code you can port a design to an FPGA without having to write it in Verilog? Impressive.
