Back to Simplicity: How to Train Accurate BNNs from Scratch?

Introduction

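The main contributions are:

1: A concrete recipe for training highly accurate binary networks, together with evidence that several previously used techniques are less effective than assumed.

2: General design principles for BNNs and, based on them, a new architecture, BinaryDenseNet.

3: Open-source code.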

Related Work

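Recent work falls into three main categories: compact network design, networks with quantized weights, and networks with both quantized weights and activations (fully binary networks are the extreme case of the last category).

Compact Network Design: replacing $3\times 3$ filters with $1\times 1$ filters, depth-wise separable convolutions, and channel shuffling. However, these approaches still rely on GPUs and cannot be accelerated on CPUs.

Quantized Weights and Real-valued Activations: BinaryConnect (BC), Binary Weight Network (BWN), and Trained Ternary Quantization (TTQ). These reduce memory with little accuracy loss, but bring only a modest speed-up.

Quantized Weights and Activations: DoReFa-Net, High-Order Residual Quantization (HORQ), and SYQ achieve good results with 1-bit weights and multi-bit activations.

Binary Weights and Activations: Binarized Neural Network (BNN), XNOR-Net, ABC-Nets.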

Study on Common Techniques

Implementation of Binary Layers

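Values are binarized with the sign function, and gradients are propagated through it with the straight-through estimator (STE). In Eq. (2), $r_i$ is the real-valued input, $r_o$ the binarized output, $c$ the cost, and $t_{\text{clip}}$ the clipping threshold beyond which the gradient is cancelled: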
$\operatorname{sign}(x)=\left\{\begin{array}{l}{+1 \text { if } x \geq 0} \\ {-1 \text { otherwise }}\end{array}\right.\tag{1}$

$\begin{array}{c}{\text { Forward: } r_{o}=\operatorname{sign}\left(r_{i}\right)} \\ {\text { Backward: } \frac{\partial c}{\partial r_{i}}=\frac{\partial c}{\partial r_{o}} 1_{\left|r_{i}\right| \leq t_{\text {clip }}}}\end{array}\tag{2}$
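To make Eqs. (1) and (2) concrete, here is a minimal scalar sketch in plain Python. The function names are hypothetical, and $t_{\text{clip}} = 1$ is an assumed value chosen for illustration; the notes do not fix it.

```python
def sign(x):
    """Binarize a real value as in Eq. (1): +1 if x >= 0, else -1."""
    return 1.0 if x >= 0 else -1.0


def ste_backward(r_i, grad_out, t_clip=1.0):
    """Straight-through estimator from Eq. (2): pass the upstream gradient
    dc/dr_o through unchanged where |r_i| <= t_clip, and zero it elsewhere.
    t_clip = 1.0 is an assumed default, not taken from the notes."""
    return grad_out if abs(r_i) <= t_clip else 0.0


inputs = [-1.5, -0.3, 0.0, 0.4, 2.0]
upstream = [1.0] * len(inputs)  # pretend dc/dr_o = 1 for every element

outputs = [sign(r) for r in inputs]
grads = [ste_backward(r, g) for r, g in zip(inputs, upstream)]
print(outputs)  # [-1.0, -1.0, 1.0, 1.0, 1.0]
print(grads)    # [0.0, 1.0, 1.0, 1.0, 0.0]
```

The forward pass discards magnitude entirely. The backward pass treats sign as the identity inside the clipping range and blocks the gradient for inputs that are already far from zero.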

Scaling Methods

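Based on experiments, the authors conclude that a BatchNorm (BN) layer already provides the effect of scaling, so scaling factors + BN perform the same as BN alone. They therefore do not use scaling factors.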
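One way to see why BN makes scaling factors redundant: BN subtracts the batch mean and divides by the batch standard deviation, so multiplying a channel by any positive constant $\alpha$ before BN cancels out, apart from the small $\epsilon$ term. The sketch below checks this numerically on one hypothetical channel. The values and $\alpha$ are made up for illustration. Because convolution is linear, a positive per-output-channel weight scale simply multiplies that channel's output, which is exactly the case tested here.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel over the batch, then apply the learned affine map."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


# Hypothetical pre-BN outputs of one binary-conv channel.
channel = [3.0, -1.0, 5.0, -7.0, 1.0]
alpha = 0.25  # hypothetical positive per-channel scaling factor

plain = batch_norm(channel)
scaled = batch_norm([alpha * x for x in channel])

# The two outputs differ only through eps, so the gap is tiny.
max_gap = max(abs(a - b) for a, b in zip(plain, scaled))
print(max_gap)
```

A negative $\alpha$ would flip the sign of the normalized output, but BN's learnable $\gamma$ can represent that as well.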

Full-Precision Pre-Training

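The authors compared three training setups: training fully from scratch, fine-tuning a full-precision ResNetE18 pre-trained with ReLU, and fine-tuning one pre-trained with clip as the activation function. Pre-training with clip performed worst, and training from scratch was slightly better than fine-tuning the ReLU model. The authors attribute this to the fact that BNNs do not use ReLU, so a model pre-trained with ReLU is a poor starting point.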

Backward Pass of the Sign Function

ApproxSign replaces the constant STE gradient inside the clipping range with a piecewise-linear one:

$\frac{\partial c}{\partial r_{i}}=\frac{\partial c}{\partial r_{o}} 1_{\left|r_{i}\right| \leq t_{\text {clip }}} \cdot\left\{\begin{array}{l}{2-2 r_{i} \text { if } r_{i} \geq 0} \\ {2+2 r_{i} \text { otherwise. }}\end{array}\right.\tag{3}$
This seems to help mainly when fine-tuning. In general it makes little difference.
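The difference between Eq. (2) and Eq. (3) lies only in the local gradient factor multiplied onto $\partial c/\partial r_o$. The sketch below compares the two factors, again assuming $t_{\text{clip}} = 1$. The function names are hypothetical.

```python
def ste_grad(r_i, t_clip=1.0):
    """Local gradient factor of the plain STE, Eq. (2)."""
    return 1.0 if abs(r_i) <= t_clip else 0.0


def approxsign_grad(r_i, t_clip=1.0):
    """Local gradient factor of ApproxSign, Eq. (3): zero outside the clip
    range, otherwise 2 - 2*r_i for r_i >= 0 and 2 + 2*r_i for r_i < 0."""
    if abs(r_i) > t_clip:
        return 0.0
    return 2 - 2 * r_i if r_i >= 0 else 2 + 2 * r_i


points = [-1.5, -0.5, 0.0, 0.5, 1.0]
table = [(r, ste_grad(r), approxsign_grad(r)) for r in points]
for row in table:
    print(row)
```

With $t_{\text{clip}} = 1$, the ApproxSign factor is a triangle peaking at 2 for $r_i = 0$. Its area over $[-1, 1]$ is 2, which matches the jump of sign from $-1$ to $+1$. The STE box has the same area but gives equal weight to every input in the range.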

Proposed Approach

Golden Rules for Training Accurate BNNs

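The core idea is maintaining rich information flow of the network.

Not every real-valued network is suitable for binarization. Compact networks, for example, are not, because the two design philosophies conflict: compact networks aim at eliminating redundancy, whereas BNNs have to focus on compensating information loss.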

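Use bottleneck designs sparingly (in a bottleneck, $1\times 1$ convolutions reduce the number of channels).

To preserve information flow, seriously consider using full-precision downsampling layers.

Shortcut connections are especially important for BNNs.

To overcome information-flow bottlenecks, increase the width and depth of the network appropriately.

Previously used techniques (scaling factors, ApproxSign, and FP pre-training) bring little benefit; the network can simply be trained from scratch.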

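Consider the main weakness of BNNs: in theory, their information density is 32 times lower than that of a full-precision network (1 bit instead of 32 bits per value), so this loss has to be compensated by other means:

1: Use shortcut connections

2: Reduce bottlenecks

3: Keep certain key layers in full precision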
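The factor of 32 is simply the ratio of bits per stored value. As a quick check, the sketch below computes the weight storage of one convolution layer at 32 and 1 bits. The layer shape (3x3, 256 to 256 channels) is hypothetical, and the code ignores biases and BN parameters.

```python
def weight_bytes(out_ch, in_ch, k, bits):
    """Storage in bytes for the weights of a k x k conv layer at a given bit width."""
    n_weights = out_ch * in_ch * k * k
    return n_weights * bits // 8


# Hypothetical layer: 3x3 convolution, 256 -> 256 channels.
fp32 = weight_bytes(256, 256, 3, bits=32)
binary = weight_bytes(256, 256, 3, bits=1)
print(fp32, binary, fp32 // binary)  # 2359296 73728 32
```

The same ratio is what the three rules above try to win back: extra shortcuts, wider layers, and a few full-precision layers spend some of that saving to recover information.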

ResNetE

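Two changes are made to ResNet:
1: The bottleneck is removed: the three filters (kernel sizes 1, 3, 1) are replaced by two $3\times3$ filters (this increases the model size and the number of parameters).
2: A full-precision downsampling convolution layer is used.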
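To see how much the first change costs in parameters, the sketch below counts conv weights for a bottleneck block and for two $3\times3$ convolutions at the same block width. The width of 256 and the inner reduction factor of 4 (the usual ResNet bottleneck ratio) are assumptions for illustration. Biases and BN parameters are ignored.

```python
def bottleneck_params(channels, inner):
    """Weights of a ResNet bottleneck: 1x1 reduce, 3x3, 1x1 expand."""
    return channels * inner + inner * inner * 9 + inner * channels


def two_3x3_params(channels):
    """Weights of two 3x3 convolutions at full block width."""
    return 2 * channels * channels * 9


c = 256          # hypothetical block width
inner = c // 4   # assumed bottleneck reduction factor of 4
print(bottleneck_params(c, inner))  # 69632
print(two_3x3_params(c))            # 1179648
```

The second design has roughly 17 times more weights here. In a BNN, however, each of those weights costs 1 bit, so the memory increase is much smaller than it would be for a full-precision network, and no layer squeezes the information flow through a narrow $1\times1$ projection.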

BinaryDenseNet

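Since ResNet works well, the authors also tried DenseNet, which has even more shortcut connections than ResNet. However, simply removing the bottleneck layers did not work well for DenseNet. The authors attribute this to the limited representation capacity of binary layers. There are two ways to address this: increase the growth rate parameter k, which is the number of newly concatenated features from each layer, or use more blocks.

Another difference between BinaryDenseNet and ResNetE lies in the downsampling layers. Again there are two options. The first is to use full-precision downsampling layers. To reduce computation, $\operatorname{MaxPool}\rightarrow\operatorname{ReLU}\rightarrow 1\times 1\ \operatorname{Conv}$ replaces $1\times 1\ \operatorname{Conv}\rightarrow\operatorname{AvgPool}$: pooling first shrinks the feature map before the $1\times 1$ convolution, so the convolution runs at a lower spatial resolution. The second option is to replace the full-precision layer with a binary downsampling conv-layer with a lower reduction rate, or even no reduction at all.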
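Within a dense block, every layer concatenates k new feature maps onto its input, so the channel count grows linearly with depth. The sketch below shows how the growth rate k controls how many features each binary layer contributes. The starting width of 64, six layers, and the two values of k are hypothetical.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Channel count after each layer of a dense block: every layer
    concatenates growth_rate new feature maps onto its input."""
    counts = [in_channels]
    for _ in range(num_layers):
        counts.append(counts[-1] + growth_rate)
    return counts


small_k = dense_block_channels(64, 32, 6)
large_k = dense_block_channels(64, 64, 6)
print(small_k)  # [64, 96, 128, 160, 192, 224, 256]
print(large_k)  # [64, 128, 192, 256, 320, 384, 448]
```

Doubling k doubles the number of new binary features each layer adds. This is one way to offset the low capacity of individual binary layers. The alternative mentioned above, adding more blocks, instead increases the number of layers that each add k features.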
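The saving from moving the pooling in front of the $1\times1$ convolution can be estimated by counting multiply-accumulate operations (MACs). The sketch assumes a 2x2, stride-2 pooling and a hypothetical transition from a 28x28 map with 256 channels to 128 channels. It counts only the convolution and ignores the cheap pooling and ReLU.

```python
def conv_then_pool_macs(h, w, c_in, c_out):
    """1x1 conv at full resolution; 2x2 stride-2 pooling happens afterwards."""
    return h * w * c_in * c_out


def pool_then_conv_macs(h, w, c_in, c_out):
    """2x2 stride-2 pooling first, so the 1x1 conv sees half the height and width."""
    return (h // 2) * (w // 2) * c_in * c_out


# Hypothetical transition layer: 28x28 feature map, 256 -> 128 channels.
before = conv_then_pool_macs(28, 28, 256, 128)
after = pool_then_conv_macs(28, 28, 256, 128)
print(before, after, before // after)  # 25690112 6422528 4
```

Pooling first cuts the convolution's work by the pooling area (4x here), which matters because this downsampling convolution is kept in full precision.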
