Cambricon Paper Comparison Notes

Key papers:

The Cambricon series, starting from 2014:
(1) DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
(2) DaDianNao: A Machine-Learning Supercomputer
(3) PuDianNao: A Polyvalent Machine Learning Accelerator
(4) ShiDianNao: Shifting Vision Processing Closer to the Sensor
(5) Cambricon-X: An Accelerator for Sparse Neural Networks

Focus of each paper:

(1) DianNao: can be seen as the foundation of the series' hardware design.

(2) DaDianNao: a high-performance computing architecture for the server side.

(3) ShiDianNao: aimed at edge-device application scenarios.

(4) PuDianNao: aimed at a broader range of machine learning algorithms.

(5) Cambricon: an instruction set architecture for a wider range of machine-learning accelerators.

The DianNao series of chips likewise adopts streaming multiply-add trees (DianNao [2], DaDianNao [3], PuDianNao [4]) and a systolic-array-like structure (ShiDianNao [5]). To stay compatible with small matrix operations while keeping utilization high, and to better support concurrent multitasking, DaDianNao and PuDianNao lower the computation granularity by using a two-level subdivided compute architecture: each PE in the top-level PE array is itself built from several smaller compute units. The finer-grained task allocation and scheduling costs extra logic, but it helps keep every compute unit efficient and keeps power consumption under control.
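
To make the two-level idea concrete, here is a minimal sketch (hypothetical Python, written for this note rather than taken from any of the papers): a top-level PE array in which every PE itself contains several smaller MAC units, so even a matrix smaller than the array can keep the inner units of each assigned PE busy.

```python
import numpy as np

class PE:
    """One top-level processing element built from several smaller MAC units.

    This models the two-level subdivision described above: a dot product
    assigned to this PE is split across its inner units, which helps keep
    utilization high even for small workloads.
    """
    def __init__(self, n_units=4):
        self.n_units = n_units

    def dot(self, a, b):
        # Distribute the dot product across the inner MAC units, then
        # reduce their partial sums (the multiply-add-tree step).
        chunks = np.array_split(np.arange(len(a)), self.n_units)
        return sum(np.dot(a[idx], b[idx]) for idx in chunks if len(idx))

class PEArray:
    """Top-level array of PEs; each output element is mapped onto one PE."""
    def __init__(self, rows=4, cols=4, units_per_pe=4):
        self.rows, self.cols = rows, cols
        self.pes = [[PE(units_per_pe) for _ in range(cols)] for _ in range(rows)]

    def matmul(self, A, B):
        m, n = A.shape[0], B.shape[1]
        C = np.zeros((m, n))
        for i in range(m):
            for j in range(n):
                C[i, j] = self.pes[i % self.rows][j % self.cols].dot(A[i, :], B[:, j])
        return C

A = np.random.rand(3, 8)   # deliberately smaller than the 4x4 PE array
B = np.random.rand(8, 5)
assert np.allclose(PEArray().matmul(A, B), A @ B)
```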

1. DianNao

Reference: https://blog.csdn.net/evolone/article/details/80765094
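
As a rough illustration of the streaming multiply-add tree used by DianNao's functional unit (the paper's NFU pipeline: parallel multipliers, an adder tree, then an activation stage), here is a small sketch; the code itself is hypothetical, written for this note:

```python
import numpy as np

def nfu_cycle(inputs, weights, activation=np.tanh):
    """One NFU-style pass: multiply in parallel, reduce through an
    adder tree, then apply the activation (NFU-1 -> NFU-2 -> NFU-3)."""
    level = inputs * weights            # NFU-1: parallel multipliers
    # NFU-2: pairwise adder tree, log2(n) levels instead of a serial sum
    while len(level) > 1:
        if len(level) % 2:              # pad odd-sized levels with a zero
            level = np.append(level, 0.0)
        level = level[0::2] + level[1::2]
    return activation(level[0])         # NFU-3: activation function

x, w = np.random.rand(16), np.random.rand(16)
assert np.isclose(nfu_cycle(x, w), np.tanh(np.dot(x, w)))
```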

2. DaDianNao

Reference: https://blog.csdn.net/u013108511/article/details/88831132

3. ShiDianNao

References: https://www.dazhuanlan.com/2019/12/18/5df9db0b0812e/

https://blog.csdn.net/weixin_33810006/article/details/87977439
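
The "systolic-array-like" structure attributed to ShiDianNao above comes down to a 2D PE grid that shifts input pixels between neighbouring PEs, so one pixel read from memory is reused by many convolution windows. A toy sketch of that reuse pattern (hypothetical code; np.roll stands in for the inter-PE shift registers):

```python
import numpy as np

def systolic_conv(image, kernel):
    """Valid 2D convolution on a PE grid with one PE per output pixel.

    Rather than each PE fetching its own k*k window from memory, the
    input map is shifted across the grid step by step, and every PE
    accumulates weight * the pixel currently passing through it.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    acc = np.zeros((oh, ow))                 # one accumulator per PE
    for ki in range(kh):
        for kj in range(kw):
            # shift so that pixel (i+ki, j+kj) arrives at PE (i, j)
            shifted = np.roll(np.roll(image, -ki, axis=0), -kj, axis=1)
            acc += kernel[ki, kj] * shifted[:oh, :ow]
    return acc

img, k = np.random.rand(6, 6), np.random.rand(3, 3)
ref = np.array([[np.sum(img[i:i+3, j:j+3] * k) for j in range(4)]
                for i in range(4)])          # direct sliding-window check
assert np.allclose(systolic_conv(img, k), ref)
```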


[2] Chen T, Du Z, Sun N, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[C]// International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 2014: 269-284.
[3] Chen Y, Luo T, Liu S, et al. DaDianNao: a machine-learning supercomputer[C]// IEEE/ACM International Symposium on Microarchitecture. IEEE, 2014: 609-622.
[4] Liu D, Chen T, Liu S, et al. PuDianNao: a polyvalent machine learning accelerator[C]// Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 2015: 369-381.
[5] Du Z, Fasthuber R, Chen T, et al. ShiDianNao: shifting vision processing closer to the sensor[C]// ACM/IEEE International Symposium on Computer Architecture. IEEE, 2015: 92-104.
[6] Chung E, Fowers J, Ovtcharov K, et al. Accelerating persistent neural networks at datacenter scale[C]// Hot Chips, 2017.
[7] Meng W, Gu Z, Zhang M, et al. Two-bit networks for deep learning on resource-constrained embedded devices[J]. arXiv preprint arXiv:1701.00485, 2017.
[8] Hubara I, Courbariaux M, Soudry D, et al. Binarized neural networks[C]// Advances in Neural Information Processing Systems. 2016: 4107-4115.
[9] Qiu J, Wang J, Yao S, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]// Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2016: 26-35.
[10] Xilinx. Deep Learning with INT8 Optimization on Xilinx Devices. www.xilinx.com/support/doc…
[11] Han S, Kang J, Mao H, et al. ESE: efficient speech recognition engine with compressed LSTM on FPGA[J]. arXiv preprint arXiv:1612.00694, 2016.
[12] Zhang S, Du Z, Zhang L, et al. Cambricon-X: an accelerator for sparse neural networks[C]// IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2016: 1-12.
[13] Shafiee A, Nag A, Muralimanohar N, et al. ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars[C]// Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 2016: 14-26.
