Hard sigmoid activation function

The standard sigmoid is slow to compute because it requires evaluating the exp() function, which is done via fairly complex code (with some hardware assist if the CPU architecture provides it). In many cases the high-precision exp() results aren't needed, and an approximation will suffice. Such is the case in many forms of gradient-descent/optimization neural networks: the exact values aren't as important as the "ballpark" values, insofar as the results are comparable with small error.
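For reference, here is a minimal NumPy sketch of the standard sigmoid; the exp() call is the part the approximations below try to avoid:

```python
import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid: 1 / (1 + exp(-x)).
    # The exp() evaluation dominates the cost.
    return 1.0 / (1.0 + np.exp(-x))
```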

Here’s a plot of the sigmoid, the “ultra-fast” sigmoid and the “hard” sigmoid:
[Figure: sigmoid (blue), ultra-fast sigmoid (green), hard sigmoid (red)]
Note how the sigmoid (blue) is smooth, while the ultra-fast (green) and hard (red) sigmoids are piecewise linear. In fact, these approximations are computed as linear interpolations between pairs of cut points. Note how the green plot touches the blue one at a few points, forming a set of line segments. Computing the results of this approximation is significantly faster than calling a routine that implements the sigmoid via exp() and division: all it requires is determining in which linear segment x lies and doing a simple interpolation. The approximation is just that: approximate, but the errors are low enough that many ANN algorithms run fine with it. A sketch of this kind of piecewise-linear interpolation follows below.
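Here is a minimal sketch of such a piecewise-linear approximation; the cut points chosen here are illustrative, not the exact set used by any particular library:

```python
import numpy as np

# Illustrative cut points (x, sigmoid(x)); real implementations choose
# their own set to trade speed against accuracy.
_CUT_X = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
_CUT_Y = 1.0 / (1.0 + np.exp(-_CUT_X))

def piecewise_sigmoid(x):
    # np.interp finds the segment containing x and interpolates linearly;
    # outside the outermost cut points it saturates at the endpoint values.
    return np.interp(x, _CUT_X, _CUT_Y)
```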

For the hard sigmoid there are fewer cut points: in fact there are only two, so only two comparisons are required to ascertain in which segment the result lies, and only one interpolation is required, for the central segment, since the other two segments are constant 0 and constant 1. In other words: it’s very fast. The error is larger than for the ultra-fast sigmoid, but depending on your particular case it might not significantly change the numerical results. In fact, for classification problems it rarely if ever causes errors (and when it does, some more training tends to correct it; extra training you can afford to run because your training cycles run so much faster than with the standard sigmoid).
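One common form of the hard sigmoid (other variants use a different slope) clips a straight line of slope 0.2 through (0, 0.5) to the range [0, 1]; a minimal NumPy sketch:

```python
import numpy as np

def hard_sigmoid(x):
    # Constant 0 below x = -2.5, constant 1 above x = 2.5, linear between:
    # only the clip (two comparisons) and one multiply-add per element.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)
```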

As an added detail, you can view the piecewise interpolation used to compute the sigmoid as a form of regularization, which in the right circumstances helps a lot with the creation of useful feature detectors. Just don’t use the more extreme approximations (like the hard sigmoid) when your problem is function approximation: your error will decrease slowly, and might plateau before reaching your goal. But, again, if you’re doing classification it’s usually quite OK.
