Neural Networks and Deep Learning (Chapter 1) (Part 2)

Perceptrons

What is a neural network? To get started, I’ll explain a type of artificial neuron called a perceptron. Perceptrons were developed in the 1950s and 1960s by the scientist Frank Rosenblatt, inspired by earlier work by Warren McCulloch and Walter Pitts. Today, it’s more common to use other models of artificial neurons - in this book, and in much modern work on neural networks, the main neuron model used is one called the sigmoid neuron. We’ll get to sigmoid neurons shortly. But to understand why sigmoid neurons are defined the way they are, it’s worth taking the time to first understand perceptrons. 

So how do perceptrons work? A perceptron takes several binary inputs, $x_1, x_2, \ldots$, and produces a single binary output:

In the example shown the perceptron has three inputs, $x_1, x_2, x_3$. In general it could have more or fewer inputs. Rosenblatt proposed a simple rule to compute the output. He introduced weights, $w_1, w_2, \ldots$, real numbers expressing the importance of the respective inputs to the output. The neuron’s output, 0 or 1, is determined by whether the weighted sum $\sum_j w_j x_j$ is less than or greater than some threshold value. Just like the weights, the threshold is a real number which is a parameter of the neuron. To put it in more precise algebraic terms:

$$
\text{output} =
\begin{cases}
0 & \text{if } \sum_j w_j x_j \leq \text{threshold} \\
1 & \text{if } \sum_j w_j x_j > \text{threshold}
\end{cases}
\tag{1}
$$

That’s all there is to how a perceptron works! 
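To make this rule concrete, here is a minimal sketch in Python of equation (1); the function name perceptron_output and the list-based representation are my own illustrative choices, not code from the book:

    def perceptron_output(weights, inputs, threshold):
        # Weighted sum of the inputs: sum_j w_j * x_j.
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        # Equation (1): output 1 exactly when the sum exceeds the threshold.
        return 1 if weighted_sum > threshold else 0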

That’s the basic mathematical model. A way you can think about the perceptron is that it’s a device that makes decisions by weighing up evidence. Let me give an example. It’s not a very realistic example, but it’s easy to understand, and we’ll soon get to more realistic examples. Suppose the weekend is coming up, and you’ve heard that there’s going to be a cheese festival in your city. You like cheese, and are trying to decide whether or not to go to the festival. You might make your decision by weighing up three factors: 

  • Is the weather good?
  • Does your boyfriend or girlfriend want to accompany you?
  • Is the festival near public transit? (You don’t own a car).

We can represent these three factors by corresponding binary variables $x_1, x_2$, and $x_3$. For instance, we’d have $x_1 = 1$ if the weather is good, and $x_1 = 0$ if the weather is bad. Similarly, $x_2 = 1$ if your boyfriend or girlfriend wants to go, and $x_2 = 0$ if not. And similarly again for $x_3$ and public transit.

Now, suppose you absolutely adore cheese, so much so that you’re happy to go to the festival even if your boyfriend or girlfriend is uninterested and the festival is hard to get to. But perhaps you really loathe bad weather, and there’s no way you’d go to the festival if the weather is bad. You can use perceptrons to model this kind of decision-making. One way to do this is to choose a weight $w_1 = 6$ for the weather, and $w_2 = 2$ and $w_3 = 2$ for the other conditions. The larger value of $w_1$ indicates that the weather matters a lot to you, much more than whether your boyfriend or girlfriend joins you, or the nearness of public transit. Finally, suppose you choose a threshold of 5 for the perceptron. With these choices, the perceptron implements the desired decision-making model, outputting 1 whenever the weather is good, and 0 whenever the weather is bad. It makes no difference to the output whether your boyfriend or girlfriend wants to go, or whether public transit is nearby.

By varying the weights and the threshold, we can get different models of decision-making. For example, suppose we instead chose a threshold of 3. Then the perceptron would decide that you should go to the festival whenever the weather was good or when both the festival was near public transit and your boyfriend or girlfriend was willing to join you. In other words, it’d be a different model of decision-making. Dropping the threshold means you’re more willing to go to the festival. 
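As a quick check of both decision models, the sketch below reuses the hypothetical perceptron_output function from earlier; the particular weekend described by the inputs is my own choice for illustration:

    # x1 = good weather, x2 = partner will come, x3 = near public transit.
    weights = [6, 2, 2]
    inputs = [0, 1, 1]   # bad weather, but the other two factors are favorable

    # The weighted sum is 6*0 + 2*1 + 2*1 = 4.
    print(perceptron_output(weights, inputs, 5))  # 0: with threshold 5, stay home
    print(perceptron_output(weights, inputs, 3))  # 1: with threshold 3, go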

Obviously, the perceptron isn’t a complete model of human decision-making! But what the example illustrates is how a perceptron can weigh up different kinds of evidence in order to make decisions. And it should seem plausible that a complex network of perceptrons could make quite subtle decisions: 

In this network, the first column of perceptrons - what we’ll call the first layer of perceptrons - is making three very simple decisions, by weighing the input evidence. What about the perceptrons in the second layer? Each of those perceptrons is making a decision by weighing up the results from the first layer of decision-making. In this way a perceptron in the second layer can make a decision at a more complex and more abstract level than perceptrons in the first layer. And even more complex decisions can be made by the perceptron in the third layer. In this way, a many-layer network of perceptrons can engage in sophisticated decision making. 

Incidentally, when I defined perceptrons I said that a perceptron has just a single output. In the network above the perceptrons look like they have multiple outputs. In fact, they’re still single output. The multiple output arrows are merely a useful way of indicating that the output from a perceptron is being used as the input to several other perceptrons. It’s less unwieldy than drawing a single output line which then splits. 

Let’s simplify the way we describe perceptrons. The condition $\sum_j w_j x_j > \text{threshold}$ is cumbersome, and we can make two notational changes to simplify it. The first change is to write $\sum_j w_j x_j$ as a dot product, $w \cdot x \equiv \sum_j w_j x_j$, where $w$ and $x$ are vectors whose components are the weights and inputs, respectively. The second change is to move the threshold to the other side of the inequality, and to replace it by what’s known as the perceptron’s bias, $b \equiv -\text{threshold}$. Using the bias instead of the threshold, the perceptron rule can be rewritten:

$$
\text{output} =
\begin{cases}
0 & \text{if } w \cdot x + b \leq 0 \\
1 & \text{if } w \cdot x + b > 0
\end{cases}
\tag{2}
$$

You can think of the bias as a measure of how easy it is to get the perceptron to output a 1. Or to put it in more biological terms, the bias is a measure of how easy it is to get the perceptron to fire. For a perceptron with a really big bias, it’s extremely easy for the perceptron to output a 1. But if the bias is very negative, then it’s difficult for the perceptron to output a 1. Obviously, introducing the bias is only a small change in how we describe perceptrons, but we’ll see later that it leads to further notational simplifications. Because of this, in the remainder of the book we won’t use the threshold, we’ll always use the bias. 
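In the bias form, the rule of equation (2) becomes a one-line comparison. A minimal sketch, assuming NumPy for the dot product (the function name perceptron is mine):

    import numpy as np

    def perceptron(w, x, b):
        # Equation (2): fire (output 1) exactly when w . x + b > 0.
        return 1 if np.dot(w, x) + b > 0 else 0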

I’ve described perceptrons as a method for weighing evidence to make decisions. Another way perceptrons can be used is to compute the elementary logical functions we usually think of as underlying computation, functions such as AND, OR, and NAND. For example, suppose we have a perceptron with two inputs, each with weight -2, and an overall bias of 3. Here’s our perceptron:

Then we see that input 00 produces output 1, since $(-2)*0 + (-2)*0 + 3 = 3$ is positive. Here, I’ve introduced the $*$ symbol to make the multiplications explicit. Similar calculations show that the inputs 01 and 10 produce output 1. But the input 11 produces output 0, since $(-2)*1 + (-2)*1 + 3 = -1$ is negative. And so our perceptron implements a NAND gate!
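To verify the whole truth table, a short sketch that runs the hypothetical perceptron function from above, with weights -2, -2 and bias 3, over all four input pairs:

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, '->', perceptron([-2, -2], [x1, x2], 3))
    # Prints 1 for inputs 00, 01, and 10, and 0 for input 11: a NAND gate.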

The NAND example shows that we can use perceptrons to compute simple logical functions. In fact, we can use networks of perceptrons to compute any logical function at all. The reason is that the NAND gate is universal for computation, that is, we can build any computation up out of NAND gates. For example, we can use NAND gates to build a circuit which adds two bits, $x_1$ and $x_2$. This requires computing the bitwise sum, $x_1 \oplus x_2$, as well as a carry bit which is set to 1 when both $x_1$ and $x_2$ are 1, i.e., the carry bit is just the bitwise product $x_1 x_2$.

To get an equivalent network of perceptrons we replace all the NAND gates by perceptrons with two inputs, each with weight -2, and an overall bias of 3. Here’s the resulting network. Note that I’ve moved the perceptron corresponding to the bottom right NAND gate a little, just to make it easier to draw the arrows on the diagram:
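Since the diagram itself isn't reproduced here, the sketch below simulates one standard NAND wiring of this adder (my reading of the circuit, not a transcription of the figure), with every gate being the NAND perceptron from before:

    def nand(x1, x2):
        # The NAND perceptron: weights -2, -2, overall bias 3.
        return 1 if (-2) * x1 + (-2) * x2 + 3 > 0 else 0

    def half_adder(x1, x2):
        m = nand(x1, x2)                     # leftmost perceptron
        s = nand(nand(x1, m), nand(x2, m))   # bitwise sum, x1 XOR x2
        c = nand(m, m)                       # carry bit, x1 AND x2 (m used twice)
        return s, c

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, '->', half_adder(x1, x2))
    # Only input (1, 1) produces carry 1; its bitwise sum is 0.

Note that the carry computation feeds the output of the leftmost perceptron into the same gate twice, which is exactly the double connection discussed next.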

One notable aspect of this network of perceptrons is that the output from the leftmost perceptron is used twice as input to the bottommost perceptron. When I defined the perceptron model I didn’t say whether this kind of double-output-to-the-same-place was allowed. Actually, it doesn’t much matter. If we don’t want to allow this kind of thing, then it’s possible to simply merge the two lines into a single connection with a weight of -4 instead of two connections with -2 weights. (If you don’t find this obvious, you should stop and prove to yourself that this is equivalent.) With that change, the network looks as follows, with all unmarked weights equal to -2, all biases equal to 3, and a single weight of -4, as marked:

Up to now I’ve been drawing inputs like x1 and x2 as variables floating to the left of the network of perceptrons. In fact, it’s conventional to draw an extra layer of perceptrons - the input layer - to encode the inputs: 

This notation for input perceptrons, in which we have an output, but no inputs,

is a shorthand. It doesn’t actually mean a perceptron with no inputs. To see this, suppose we did have a perceptron with no inputs. Then the weighted sum $\sum_j w_j x_j$ would always be zero, and so the perceptron would output 1 if $b > 0$, and 0 if $b \leq 0$. That is, the perceptron would simply output a fixed value, not the desired value ($x_1$, in the example above). It’s better to think of the input perceptrons as not really being perceptrons at all, but rather special units which are simply defined to output the desired values, $x_1, x_2, \ldots$

The adder example demonstrates how a network of perceptrons can be used to simulate a circuit containing many NAND gates. And because NAND gates are universal for computation, it follows that perceptrons are also universal for computation.

The computational universality of perceptrons is simultaneously reassuring and disappointing. It’s reassuring because it tells us that networks of perceptrons can be as powerful as any other computing device. But it’s also disappointing, because it makes it seem as though perceptrons are merely a new type of NAND gate. That’s hardly big news! 

However, the situation is better than this view suggests. It turns out that we can devise learning algorithms which can automatically tune the weights and biases of a network of artificial neurons. This tuning happens in response to external stimuli, without direct intervention by a programmer. These learning algorithms enable us to use artificial neurons in a way which is radically different to conventional logic gates. Instead of explicitly laying out a circuit of NAND and other gates, our neural networks can simply learn to solve problems, sometimes problems where it would be extremely difficult to directly design a conventional circuit. 

Source: http://blog.csdn.net/forrestyanyu/article/details/54705714
