Cambricon Paper Comparison Notes

Key papers:

The Cambricon series, starting from 2014:
(1) DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
(2) DaDianNao: A Machine-Learning Supercomputer
(3) PuDianNao: A Polyvalent Machine Learning Accelerator
(4) ShiDianNao: Shifting Vision Processing Closer to the Sensor
(5) Cambricon-X: An Accelerator for Sparse Neural Networks

Focus of each paper:

(1) DianNao: can be seen as the foundation of the series' hardware design.

(2) DaDianNao: a high-performance computing architecture for the server side.

(3) ShiDianNao: aimed at edge-device application scenarios.

(4) PuDianNao: aimed at a broader range of machine learning algorithms.

(5) Cambricon: an instruction set architecture for a wider range of machine-learning accelerators.

The DianNao series of chips likewise adopts streaming multiply-add trees (DianNao [2], DaDianNao [3], PuDianNao [4]) and a systolic-array-like structure (ShiDianNao [5]). To stay compatible with small matrix operations while keeping utilization high, and to better support concurrent multitasking, DaDianNao and PuDianNao lower the computation granularity by using a two-level subdivided compute architecture: each PE in the top-level PE array is itself built from several smaller compute units. The finer-grained task allocation and scheduling costs extra logic, but it helps keep every compute unit efficient and keeps power consumption under control.
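
To make the two-level idea concrete, here is a minimal sketch (hypothetical Python, written for this note rather than taken from any of the papers): a top-level PE array in which every PE itself contains several smaller MAC units, so even a matrix smaller than the array can keep the inner units of each assigned PE busy.

```python
import numpy as np

class PE:
    """One top-level processing element built from several smaller MAC units.

    This models the two-level subdivision described above: a dot product
    assigned to this PE is split across its inner units, which helps keep
    utilization high even for small workloads.
    """
    def __init__(self, n_units=4):
        self.n_units = n_units

    def dot(self, a, b):
        # Distribute the dot product across the inner MAC units, then
        # reduce their partial sums (the multiply-add-tree step).
        chunks = np.array_split(np.arange(len(a)), self.n_units)
        return sum(np.dot(a[idx], b[idx]) for idx in chunks if len(idx))

class PEArray:
    """Top-level array of PEs; each output element is mapped onto one PE."""
    def __init__(self, rows=4, cols=4, units_per_pe=4):
        self.rows, self.cols = rows, cols
        self.pes = [[PE(units_per_pe) for _ in range(cols)] for _ in range(rows)]

    def matmul(self, A, B):
        m, n = A.shape[0], B.shape[1]
        C = np.zeros((m, n))
        for i in range(m):
            for j in range(n):
                C[i, j] = self.pes[i % self.rows][j % self.cols].dot(A[i, :], B[:, j])
        return C

A = np.random.rand(3, 8)   # deliberately smaller than the 4x4 PE array
B = np.random.rand(8, 5)
assert np.allclose(PEArray().matmul(A, B), A @ B)
```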

1. DianNao

Reference: https://blog.csdn.net/evolone/article/details/80765094
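
As a rough illustration of the streaming multiply-add tree used by DianNao's functional unit (the paper's NFU pipeline: parallel multipliers, an adder tree, then an activation stage), here is a small sketch; the code itself is hypothetical, written for this note:

```python
import numpy as np

def nfu_cycle(inputs, weights, activation=np.tanh):
    """One NFU-style pass: multiply in parallel, reduce through an
    adder tree, then apply the activation (NFU-1 -> NFU-2 -> NFU-3)."""
    level = inputs * weights            # NFU-1: parallel multipliers
    # NFU-2: pairwise adder tree, log2(n) levels instead of a serial sum
    while len(level) > 1:
        if len(level) % 2:              # pad odd-sized levels with a zero
            level = np.append(level, 0.0)
        level = level[0::2] + level[1::2]
    return activation(level[0])         # NFU-3: activation function

x, w = np.random.rand(16), np.random.rand(16)
assert np.isclose(nfu_cycle(x, w), np.tanh(np.dot(x, w)))
```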

2. DaDianNao

Reference: https://blog.csdn.net/u013108511/article/details/88831132

3. ShiDianNao

References: https://www.dazhuanlan.com/2019/12/18/5df9db0b0812e/

https://blog.csdn.net/weixin_33810006/article/details/87977439
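
The "systolic-array-like" structure attributed to ShiDianNao above comes down to a 2D PE grid that shifts input pixels between neighbouring PEs, so one pixel read from memory is reused by many convolution windows. A toy sketch of that reuse pattern (hypothetical code; np.roll stands in for the inter-PE shift registers):

```python
import numpy as np

def systolic_conv(image, kernel):
    """Valid 2D convolution on a PE grid with one PE per output pixel.

    Rather than each PE fetching its own k*k window from memory, the
    input map is shifted across the grid step by step, and every PE
    accumulates weight * the pixel currently passing through it.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    acc = np.zeros((oh, ow))                 # one accumulator per PE
    for ki in range(kh):
        for kj in range(kw):
            # shift so that pixel (i+ki, j+kj) arrives at PE (i, j)
            shifted = np.roll(np.roll(image, -ki, axis=0), -kj, axis=1)
            acc += kernel[ki, kj] * shifted[:oh, :ow]
    return acc

img, k = np.random.rand(6, 6), np.random.rand(3, 3)
ref = np.array([[np.sum(img[i:i+3, j:j+3] * k) for j in range(4)]
                for i in range(4)])          # direct sliding-window check
assert np.allclose(systolic_conv(img, k), ref)
```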


[2] Chen T, Du Z, Sun N, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[C]// International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 2014: 269-284.
[3] Chen Y, Luo T, Liu S, et al. DaDianNao: a machine-learning supercomputer[C]// IEEE/ACM International Symposium on Microarchitecture. IEEE, 2014: 609-622.
[4] Liu D, Chen T, Liu S, et al. PuDianNao: a polyvalent machine learning accelerator[C]// Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 2015: 369-381.
[5] Du Z, Fasthuber R, Chen T, et al. ShiDianNao: shifting vision processing closer to the sensor[C]// ACM/IEEE International Symposium on Computer Architecture. IEEE, 2015: 92-104.
[6] Chung E, Fowers J, Ovtcharov K, et al. Accelerating persistent neural networks at datacenter scale[C]// Hot Chips, 2017.
[7] Meng W, Gu Z, Zhang M, et al. Two-bit networks for deep learning on resource-constrained embedded devices[J]. arXiv preprint arXiv:1701.00485, 2017.
[8] Hubara I, Courbariaux M, Soudry D, et al. Binarized neural networks[C]// Advances in Neural Information Processing Systems. 2016: 4107-4115.
[9] Qiu J, Wang J, Yao S, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]// Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2016: 26-35.
[10] Xilinx. Deep Learning with INT8 Optimization on Xilinx Devices. www.xilinx.com/support/doc…
[11] Han S, Kang J, Mao H, et al. ESE: efficient speech recognition engine with compressed LSTM on FPGA[J]. arXiv preprint arXiv:1612.00694, 2016.
[12] Zhang S, Du Z, Zhang L, et al. Cambricon-X: an accelerator for sparse neural networks[C]// IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2016: 1-12.
[13] Shafiee A, Nag A, Muralimanohar N, et al. ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars[C]// Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 2016: 14-26.
