This document compiles recent papers in the field of deep-learning processors.
2014
ASPLOS
Chen, Tianshi, et al. “DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.” architectural support for programming languages and operating systems (2014): 269-284.
MICRO
Chen, Yunji, et al. “DaDianNao: A Machine-Learning Supercomputer.” international symposium on microarchitecture (2014): 609-622.
2015
ISCA
Du, Zidong, et al. “ShiDianNao: shifting vision processing closer to the sensor.” international symposium on computer architecture (2015): 92-104.
ASPLOS
Liu, Daofu, et al. “PuDianNao: A Polyvalent Machine Learning Accelerator.” architectural support for programming languages and operating systems (2015): 369-381.
2016
MICRO
Zhang, Shijin, et al. “Cambricon-x: an accelerator for sparse neural networks.” international symposium on microarchitecture (2016): 1-12.
ISCA
Han, Song, et al. “EIE: efficient inference engine on compressed deep neural network.” international symposium on computer architecture (2016): 243-254.
Liu, Shaoli, et al. “Cambricon: an instruction set architecture for neural networks.” international symposium on computer architecture (2016): 393-405.
Chen, Yu-Hsin, Joel Emer, and Vivienne Sze. “Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks.” international symposium on computer architecture (2016): 367-379.
ISSCC
Chen, Yu-Hsin, et al. “14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks.” international solid-state circuits conference (2016): 262-263.
Sim, Jaehyeong, et al. “14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems.” international solid-state circuits conference (2016): 264-265.
VLSI
Moons, Bert, and Marian Verhelst. “A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets.” symposium on vlsi circuits (2016).
2017
ISSCC
Desoli, Giuseppe, et al. “14.1 A 2.9TOPS/W deep convolutional neural network SoC in FD-SOI 28nm for intelligent embedded systems.” international solid-state circuits conference (2017): 238-239.
Shin, Dongjoo, et al. “14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks.” international solid-state circuits conference (2017): 240-241.
Whatmough, Paul N., et al. “14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications.” international solid-state circuits conference (2017): 242-243.
Price, Michael, James Glass, and Anantha P. Chandrakasan. “14.4 A scalable speech recognizer with deep-neural-network acoustic models and voice-activated power gating.” international solid-state circuits conference (2017): 244-245.
Moons, Bert, et al. “14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI.” international solid-state circuits conference (2017): 246-247.
Bang, Suyoung, et al. “14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence.” international solid-state circuits conference (2017): 250-251.
ISCA
Jouppi, Norman P., et al. “In-Datacenter Performance Analysis of a Tensor Processing Unit.” international symposium on computer architecture (2017): 1-12.
VLSI
Yin, Shouyi, et al. “A 1.06-to-5.09 TOPS/W reconfigurable hybrid-neural-network processor for deep learning applications.” symposium on vlsi circuits (2017).
2018
ISSCC
Lee, Jinmook, et al. “UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision.” international solid-state circuits conference (2018).
Hot Chips
ARM’s First Generation ML Processor. (ARM)
The NVIDIA Deep Learning Accelerator. (NVIDIA)