Neural Networks and Deep Learning (Chapter 1, Part 3)

Sigmoid neurons

Learning algorithms sound terrific. But how can we devise such algorithms for a neural network? Suppose we have a network of perceptrons that we’d like to use to learn to solve some problem. For example, the inputs to the network might be the raw pixel data from a scanned, handwritten image of a digit. And we’d like the network to learn weights and biases so that the output from the network correctly classifies the digit. To see how learning might work, suppose we make a small change in some weight (or bias) in the network. What we’d like is for this small change in weight to cause only a small corresponding change in the output from the network. As we’ll see in a moment, this property will make learning possible. Schematically, here’s what we want (obviously this network is too simple to do handwriting recognition!):

If it were true that a small change in a weight (or bias) causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. For example, suppose the network was mistakenly classifying an image as an “8” when it should be a “9”. We could figure out how to make a small change in the weights and biases so the network gets a little closer to classifying the image as a “9”. And then we’d repeat this, changing the weights and biases over and over to produce better and better output. The network would be learning.

The problem is that this isn’t what happens when our network contains perceptrons. In fact, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from 0 to 1. That flip may then cause the behaviour of the rest of the network to completely change in some very complicated way. So while your “9” might now be classified correctly, the behaviour of the network on all the other images is likely to have completely changed in some hard-to-control way. That makes it difficult to see how to gradually modify the weights and biases so that the network gets closer to the desired behaviour. Perhaps there’s some clever way of getting around this problem. But it’s not immediately obvious how we can get a network of perceptrons to learn.

We can overcome this problem by introducing a new type of artificial neuron called a sigmoid neuron. Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output. That’s the crucial fact which will allow a network of sigmoid neurons to learn.

Okay, let me describe the sigmoid neuron. We’ll depict sigmoid neurons in the same way we depicted perceptrons:

Just like a perceptron, the sigmoid neuron has inputs, x1, x2, …. But instead of being just 0 or 1, these inputs can also take on any values between 0 and 1. So, for instance, 0.638 is a valid input for a sigmoid neuron. Also just like a perceptron, the sigmoid neuron has weights for each input, w1, w2, …, and an overall bias, b. But the output is not 0 or 1. Instead, it’s σ(w·x + b), where σ is called the sigmoid function (incidentally, σ is sometimes called the logistic function, and this new class of neurons is sometimes called logistic neurons. It’s useful to remember this terminology, since these terms are used by many people working with neural nets. However, we’ll stick with the sigmoid terminology.), and is defined by:

σ(z) ≡ 1/(1 + e^{−z}).   (3)

To put it all a little more explicitly, the output of a sigmoid neuron with inputs x1, x2, …, weights w1, w2, …, and bias b is

1/(1 + exp(−∑_j w_j x_j − b)).   (4)
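To make Equations (3) and (4) concrete, here is a minimal Python sketch of a single sigmoid neuron. It assumes NumPy, and the function names and the particular input, weight, and bias values are purely illustrative, not taken from the book:

```python
import numpy as np

def sigmoid(z):
    """The sigmoid (logistic) function of Equation (3)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron_output(x, w, b):
    """Output of a single sigmoid neuron, Equation (4): sigma(w.x + b)."""
    return sigmoid(np.dot(w, x) + b)

# Example: three inputs in [0, 1], arbitrary illustrative weights and bias.
x = np.array([0.638, 0.2, 0.9])
w = np.array([0.5, -1.2, 2.0])
b = -0.3
print(sigmoid_neuron_output(x, w, b))   # a real number strictly between 0 and 1
```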

At first sight, sigmoid neurons appear very different to perceptrons. The algebraic form of the sigmoid function may seem opaque and forbidding if you’re not already familiar with it. In fact, there are many similarities between perceptrons and sigmoid neurons, and the algebraic form of the sigmoid function turns out to be more of a technical detail than a true barrier to understanding.

To understand the similarity to the perceptron model, suppose z ≡ w·x + b is a large positive number. Then e^{−z} ≈ 0 and so σ(z) ≈ 1. In other words, when z = w·x + b is large and positive, the output from the sigmoid neuron is approximately 1, just as it would have been for a perceptron. Suppose on the other hand that z = w·x + b is very negative. Then e^{−z} → ∞, and σ(z) ≈ 0. So when z = w·x + b is very negative, the behaviour of a sigmoid neuron also closely approximates a perceptron. It’s only when w·x + b is of modest size that there’s much deviation from the perceptron model.
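A quick numerical check of this limiting behaviour; a small illustrative script, with the particular z values chosen arbitrarily:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_output(z):
    # Perceptron rule on z = w.x + b: output 1 if z > 0, else 0.
    return 1 if z > 0 else 0

for z in [-20.0, -5.0, -0.5, 0.5, 5.0, 20.0]:
    print(z, sigmoid(z), perceptron_output(z))
# For large |z| the two outputs agree to many decimal places;
# only for modest z (e.g. +/-0.5) do they differ noticeably.
```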

What about the algebraic form of σ? How can we understand that? In fact, the exact form of σ isn’t so important - what really matters is the shape of the function when plotted. Here’s the shape:

[Figure: plot of the sigmoid function]

This shape is a smoothed out version of a step function:
[Figure: plot of the step function]

If σ had in fact been a step function, then the sigmoid neuron would be a perceptron, since the output would be 1 or 0 depending on whether w·x + b was positive or negative (actually, when w·x + b = 0 the perceptron outputs 0, while the step function outputs 1, so, strictly speaking, we’d need to modify the step function at that one point. But you get the idea.). By using the actual σ function we get, as already implied above, a smoothed out perceptron. Indeed, it’s the smoothness of the σ function that is the crucial fact, not its detailed form. The smoothness of σ means that small changes Δwj in the weights and Δb in the bias will produce a small change Δoutput in the output from the neuron. In fact, calculus tells us that Δoutput is well approximated by

Δoutput ≈ ∑_j (∂output/∂w_j) Δw_j + (∂output/∂b) Δb,   (5)

where the sum is over all the weights, wj, and ∂output/∂wj and ∂output/∂b denote partial derivatives of the output with respect to wj and b, respectively. Don’t panic if you’re not comfortable with partial derivatives! While the expression above looks complicated, with all the partial derivatives, it’s actually saying something very simple (and which is very good news): Δoutput is a linear function of the changes Δwj and Δb in the weights and bias. This linearity makes it easy to choose small changes in the weights and biases to achieve any desired small change in the output. So while sigmoid neurons have much of the same qualitative behaviour as perceptrons, they make it much easier to figure out how changing the weights and biases will change the output.
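For a single sigmoid neuron the partial derivatives in Equation (5) have a simple closed form, so the linear approximation is easy to check numerically. The following sketch assumes NumPy and uses arbitrary illustrative weights, bias, and perturbations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values (not from the text).
x = np.array([0.4, 0.7, 0.1])
w = np.array([1.5, -2.0, 0.3])
b = 0.2

z = np.dot(w, x) + b
out = sigmoid(z)

# For a single sigmoid neuron the partial derivatives in Equation (5) are
#   d(output)/d(w_j) = sigmoid'(z) * x_j   and   d(output)/d(b) = sigmoid'(z),
# where sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).
sprime = out * (1.0 - out)

dw = np.array([0.001, -0.002, 0.0005])   # small changes in the weights
db = 0.001                               # small change in the bias

predicted = np.dot(sprime * x, dw) + sprime * db     # right-hand side of Equation (5)
actual = sigmoid(np.dot(w + dw, x) + b + db) - out   # true change in the output
print(predicted, actual)   # the two agree closely for small dw, db
```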

If it’s the shape of σ which really matters, and not its exact form, then why use the particular form used for σ in Equation (3)? In fact, later in the book we will occasionally consider neurons where the output is f(w·x + b) for some other activation function f(·). The main thing that changes when we use a different activation function is that the particular values for the partial derivatives in Equation (5) change. It turns out that when we compute those partial derivatives later, using σ will simplify the algebra, simply because exponentials have lovely properties when differentiated. In any case, σ is commonly-used in work on neural nets, and is the activation function we’ll use most often in this book.
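One of those lovely properties is the standard identity σ′(z) = σ(z)(1 − σ(z)), so the derivative comes almost for free once the function value is known. A small sketch; the finite-difference comparison is just an illustrative sanity check:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # The exponential differentiates nicely: sigma'(z) = sigma(z) * (1 - sigma(z)),
    # so the derivative is computed directly from the function value.
    s = sigmoid(z)
    return s * (1.0 - s)

# Quick numerical check against a central finite-difference approximation.
z, eps = 0.7, 1e-6
print(sigmoid_prime(z), (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps))
```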

How should we interpret the output from a sigmoid neuron? Obviously, one big difference between perceptrons and sigmoid neurons is that sigmoid neurons don’t just output 0 or 1. They can have as output any real number between 0 and 1, so values such as 0.173 and 0.689 are legitimate outputs. This can be useful, for example, if we want to use the output value to represent the average intensity of the pixels in an image input to a neural network. But sometimes it can be a nuisance. Suppose we want the output from the network to indicate either “the input image is a 9” or “the input image is not a 9”. Obviously, it’d be easiest to do this if the output was a 0 or a 1, as in a perceptron. But in practice we can set up a convention to deal with this, for example, by deciding to interpret any output of at least 0.5 as indicating a “9”, and any output less than 0.5 as indicating “not a 9”. I’ll always explicitly state when we’re using such a convention, so it shouldn’t cause any confusion.
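As a sketch of that convention (the helper name is made up for illustration):

```python
def classify_as_nine(network_output):
    """Interpret a sigmoid output in [0, 1] using the 0.5 convention from the text:
    at least 0.5 means 'the input image is a 9', below 0.5 means 'not a 9'."""
    return network_output >= 0.5

print(classify_as_nine(0.689))  # True  -> treated as a "9"
print(classify_as_nine(0.173))  # False -> treated as "not a 9"
```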

Exercises

Sigmoid neurons simulating perceptrons, part I

Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant, c > 0. Show that the behaviour of the network doesn’t change.

Sigmoid neurons simulating perceptrons, part II

Suppose we have the same setup as the last problem - a network of perceptrons. Suppose also that the overall input to the network of perceptrons has been chosen. We won’t need the actual input value, we just need the input to have been fixed. Suppose the weights and biases are such that w·x + b ≠ 0 for the input x to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant c > 0. Show that in the limit as c → ∞ the behaviour of this network of sigmoid neurons is exactly the same as the network of perceptrons. How can this fail when w·x + b = 0 for one of the perceptrons?

Source: http://blog.csdn.net/forrestyanyu/article/details/54730408
