What is the role of the activation function in a neural network?

Originally posted at: https://www.quora.com/What-is-the-role-of-the-activation-function-in-a-neural-network

The goal of (ordinary least-squares) linear regression is to find the optimal weights that -- when linearly combined with the inputs -- result in a model that minimizes the vertical offsets between the fitted values and the observed targets, but let's not get distracted by model fitting, which is a different topic ;).

So, in linear regression, we compute a linear combination of weights and inputs (let's call this function the "net input function").

net(x) = b + x_1 w_1 + x_2 w_2 + ... + x_n w_n = z
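In code, this is just a dot product plus a bias term -- a minimal NumPy sketch (the feature values and weights below are made up for illustration):

    import numpy as np

    def net_input(x, w, b):
        # The "net input function": a linear combination of the inputs
        # and weights plus a bias, z = b + x_1*w_1 + ... + x_n*w_n
        return b + np.dot(x, w)

    x = np.array([1.5, -2.0])  # made-up feature vector with 2 features
    w = np.array([0.4, 0.6])   # made-up weights
    b = 0.1
    z = net_input(x, w, b)     # 0.1 + 1.5*0.4 + (-2.0)*0.6 = -0.5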

Next, let's consider logistic regression. Here, we put the net input z through a non-linear "activation function" -- the logistic sigmoid function:

σ(z) = 1 / (1 + e^(-z))

Think of it as "squashing" the linear net input through a non-linear function, which has the nice property that it returns the conditional probability P(y=1 | x) (i.e., the probability that a sample x belongs to class 1).
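That squashing step in code, continuing the NumPy sketch from above:

    def sigmoid(z):
        # Logistic sigmoid: squashes any real number into (0, 1),
        # which we can read as P(y=1 | x)
        return 1.0 / (1.0 + np.exp(-z))

    p = sigmoid(z)  # for z = -0.5 this is roughly 0.378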

Now, if we add a step function, for instance,

  • If the sigmoid output is ≥ 0.5, predict class 1, and class 0 otherwise
  • (Equivalently: if the net input z ≥ 0, predict class 1, and class 0 otherwise)

we get a logistic regression classifier:
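Putting the pieces together, a sketch of the resulting classifier (note that thresholding the sigmoid output at 0.5 is the same as thresholding the net input z at 0):

    def predict(x, w, b):
        # Step function on top of the sigmoid: class 1 if P(y=1 | x) >= 0.5,
        # which is equivalent to checking whether the net input z >= 0
        return 1 if sigmoid(net_input(x, w, b)) >= 0.5 else 0

    predict(x, w, b)  # -> 0, since z = -0.5 < 0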

(Maybe see this one for more details: Sebastian Raschka's answer to What is the probabilistic interpretation of regularized logistic regression? )

However, logistic regression (a generalized linear model) still remains a linear classifier in the sense that its decision surface is linear:

If the classes can be linearly separated, this works fine. However, let's consider a trickier case:

Here, a non-linear classifier may be a better choice -- for example, a multi-layer neural network. Below, I trained a simple multi-layer perceptron with 1 hidden layer consisting of 200 of these logistic sigmoid activation functions. Let's see what the decision surface looks like now:

(note that I may be overfitting a bit, but again, that's a discussion for a separate topic ;))

The architecture of this fully connected, feed-forward neural network looks essentially like this:

In this particular case, we have only 3 units in the input layer (x_0 = 1 for the bias unit, and x_1 and x_2 for the 2 features, respectively); there are 200 of these sigmoid activation functions (a_m) in the hidden layer, and 1 sigmoid function in the output layer, which is then squashed through a unit step function (not shown) to produce the predicted output class label ŷ.
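A rough sketch of the forward pass for this architecture, with 2 input features, 200 hidden sigmoid units, and 1 sigmoid output unit (the weights below are random placeholders, not learned ones):

    rng = np.random.default_rng(0)
    W_h = rng.normal(size=(200, 2))  # 2 input features -> 200 hidden units
    b_h = rng.normal(size=200)       # hidden-layer bias terms
    w_o = rng.normal(size=200)       # 200 hidden activations -> 1 output unit
    b_o = 0.0

    def mlp_forward(x):
        # Hidden layer: 200 sigmoid units, each squashing its own
        # linear combination of the inputs
        a = sigmoid(np.dot(W_h, x) + b_h)     # shape (200,)
        # Output layer: one more sigmoid on top of the hidden activations
        return sigmoid(np.dot(w_o, a) + b_o)  # scalar in (0, 1)

    y_hat = 1 if mlp_forward(x) >= 0.5 else 0  # unit step, as described above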

To sum it up, the logistic regression classifier has a non-linear activation function, but the net input that it squashes is still just a linear combination of the weights and inputs, which is why logistic regression is a "generalized" linear model. Now, the role of the activation function in a neural network is to produce a non-linear decision boundary via non-linear combinations of the weighted inputs.
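One way to see why the non-linearity is essential: if we replaced the hidden sigmoids with the identity function, the whole network would collapse back into a single linear model. A quick sanity check, reusing the random weights from the sketch above:

    # With identity "activations", the two layers collapse into one linear model:
    # w_o · (W_h x + b_h) + b_o  ==  (w_o · W_h) x + (w_o · b_h + b_o)
    z_two_layers = np.dot(w_o, np.dot(W_h, x) + b_h) + b_o
    w_eff = np.dot(w_o, W_h)        # effective weights of the collapsed model
    b_eff = np.dot(w_o, b_h) + b_o  # effective bias
    assert np.isclose(z_two_layers, np.dot(w_eff, x) + b_eff)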

(If you are interested, see Sebastian Raschka's answer to What is the best visual explanation for the back propagation algorithm for neural networks? for learning the weights in this case.)

For your convenience, I added a cheat sheet of the most common activation functions below:
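The usual entries on such a cheat sheet include:

  • Linear (identity): f(z) = z
  • Unit step (Heaviside): f(z) = 0 if z < 0, else 1
  • Logistic sigmoid: f(z) = 1 / (1 + e^(-z)), output in (0, 1)
  • Hyperbolic tangent: f(z) = tanh(z), output in (-1, 1)
  • ReLU (rectified linear unit): f(z) = max(0, z)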
