What are logits, softmax and softmax_cross_entropy_with_logits?

This article is translated from: What is logits, softmax and softmax_cross_entropy_with_logits?

I was going through the tensorflow API docs here. In the tensorflow documentation, they use a keyword called logits. What is it? In a lot of methods in the API docs it is written like

tf.nn.softmax(logits, name=None)

If those logits are only Tensors, why keep a different name like logits?

Another thing is that there are two methods I could not differentiate. They were:

tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.


#1

Reference: https://stackoom.com/question/2JfZP/什麼是logits-softmax和softmax-cross-entropy-with-logits


#2

Logits simply means that the function operates on the unscaled output of earlier layers, and that the relative scale used to interpret the units is linear. In particular, it means the sum of the inputs may not equal 1 and the values are not probabilities (you might have an input of 5).
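
As a hedged, illustrative sketch (the input, weights, and bias below are made up for this example, not from the original answer), the raw scores coming out of a linear layer are logits: they can be negative, larger than 1, and need not sum to 1.

import numpy as np

# Hypothetical input features and weights, purely for illustration.
x = np.array([1.0, -2.0, 0.5])
W = np.array([[2.0, -1.0],
              [0.5,  3.0],
              [1.0,  1.0]])
b = np.array([0.1, -0.3])

logits = x.dot(W) + b        # raw, unscaled class scores
print(logits)                # [ 1.6 -6.8]  -- not probabilities
print(logits.sum())          # -5.2         -- does not sum to 1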

tf.nn.softmax produces just the result of applying the softmax function to an input tensor. The softmax "squishes" the inputs so that sum(input) = 1: it is a way of normalizing. The shape of the output of a softmax is the same as the input: it just normalizes the values. The outputs of softmax can be interpreted as probabilities.

a = tf.constant(np.array([[.1, .3, .5, .9]]))
print(sess.run(tf.nn.softmax(a)))   # assuming numpy imported as np and sess = tf.Session()
# [[ 0.16838508  0.205666    0.25120102  0.37474789]]

In contrast, tf.nn.softmax_cross_entropy_with_logits computes the cross entropy of the result after applying the softmax function (but it does it all together in a more mathematically careful way). It is similar to the result of:

sm = tf.nn.softmax(x)
ce = cross_entropy(sm)   # pseudo-code: cross_entropy stands for the cross-entropy computation, not an actual TF function

The cross entropy is a summary metric: it sums across the elements. The output of tf.nn.softmax_cross_entropy_with_logits on a shape [2,5] tensor is of shape [2], one loss value per example (the first dimension is treated as the batch).
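
As a quick hedged check (assuming sess = tf.Session(), numpy imported as np, and the older positional argument order used throughout this post; newer releases require the labels= and logits= keywords), the fused op returns one scalar loss per example:

logits = tf.constant(np.random.randn(2, 5).astype(np.float32))
labels = tf.constant(np.eye(5, dtype=np.float32)[[0, 3]])   # one-hot labels for 2 examples
losses = tf.nn.softmax_cross_entropy_with_logits(logits, labels)
print(losses.get_shape())   # (2,) -- one loss value per example in the batch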

If you want to do optimization to minimize the cross entropy AND you are softmaxing after your last layer, you should use tf.nn.softmax_cross_entropy_with_logits instead of doing it yourself, because it covers numerically unstable corner cases in the mathematically right way. Otherwise, you will end up hacking it by adding little epsilons here and there.
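
To see why the naive route is numerically fragile, here is a hedged numpy-only sketch (not from the original answer) of the kind of log-sum-exp rearrangement that a numerically careful implementation uses:

import numpy as np

logits = np.array([1000.0, 0.0, -1000.0])
labels = np.array([1.0, 0.0, 0.0])

# Naive route: softmax first, then log. exp(1000) overflows to inf,
# so the probabilities become nan and the loss would be nan.
naive_softmax = np.exp(logits) / np.sum(np.exp(logits))
# -np.sum(labels * np.log(naive_softmax))  -> nan

# Stable route: work in log space and subtract the max logit first.
shifted = logits - np.max(logits)
log_softmax = shifted - np.log(np.sum(np.exp(shifted)))
print(-np.sum(labels * log_softmax))   # -0.0, i.e. essentially zero loss, as expected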

Edited 2016-02-07: If you have single-class labels, where an object can only belong to one class, you might now consider using tf.nn.sparse_softmax_cross_entropy_with_logits so that you do not have to convert your labels to a dense one-hot array. This function was added after release 0.6.0.
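
A hedged sketch of the difference (the logits below are illustrative, and the call assumes sess = tf.Session() plus the same older positional argument order as the rest of this post):

logits = tf.constant([[2.0, 0.5, 1.0],
                      [0.1, 3.0, 0.2]])
class_ids = tf.constant([0, 1])   # integer labels, one per example, no one-hot needed

sparse_losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, class_ids)
print(sess.run(sparse_losses))    # array of 2 per-example losses
# Equivalent (up to floating point) to one-hot encoding class_ids and calling
# tf.nn.softmax_cross_entropy_with_logits on the same logits.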


#3

tf.nn.softmax computes the forward propagation through a softmax layer. You use it during evaluation of the model, when you compute the probabilities that the model outputs.

tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training.

The logits are the unnormalized log probabilities output by the model (the values output before the softmax normalization is applied to them).
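
To make that concrete, here is a hedged sketch (graph-mode style; the shapes and variable names are illustrative assumptions, not from the original answer, and the loss call uses the same older positional argument order as the rest of this post) of where each op typically sits in a model:

import numpy as np
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 4])          # hypothetical: 4 input features
y_true = tf.placeholder(tf.float32, [None, 3])     # one-hot labels for 3 classes
W = tf.Variable(tf.zeros([4, 3]))
b = tf.Variable(tf.zeros([3]))

logits = tf.matmul(X, W) + b                       # unnormalized scores: the logits
probs = tf.nn.softmax(logits)                      # evaluation: turn logits into probabilities
loss = tf.reduce_mean(                             # training: the cost to minimize
    tf.nn.softmax_cross_entropy_with_logits(logits, y_true))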


#4

Short version:

Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x + b) and y_true contains one-hot encoded true labels.

y_hat  = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded

If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.

Additionally, the total cross-entropy loss computed in this manner:

y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))

is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():

total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))

Long version:

In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I've created y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.

import tensorflow as tf
import numpy as np

sess = tf.Session()

# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5,  1.5,  0.1],
#        [ 2.2,  1.3,  1.7]])

Note that the values are not normalized (i.e. the rows do not add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.

y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])

It's important to fully understand what the softmax output is saying. Below I've shown a table that more clearly represents the output above. It can be seen that, for example, the probability of training instance 1 being "Class 2" is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.

                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182

So now we have class probabilities for each training instance, and we can take the argmax() of each row to generate a final classification. From the table above, we see that training instance 1 is predicted to belong to "Class 2" and training instance 2 to "Class 1".
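
As a small hedged check (reusing the sess and y_hat_softmax defined above), taking the argmax of each row gives exactly those predicted class indices:

predictions = tf.argmax(y_hat_softmax, 1)
sess.run(predictions)
# array([1, 0])   # instance 1 predicted "Class 2", instance 2 predicted "Class 1"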

Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and the columns are classes. Below I've created an example y_true one-hot array where the true label for training instance 1 is "Class 2" and the true label for training instance 2 is "Class 3".

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]])

Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use the cross-entropy loss to measure the error.

Cross-entropy loss for a single instance: loss(y_true, y_hat_softmax) = -sum_j( y_true[j] * log(y_hat_softmax[j]) )

We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1's highest probability was for "Class 2", which matches training instance 1 in y_true; however, the prediction for training instance 2 showed the highest probability for "Class 1", which does not match the true class "Class 3".

loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 ,  1.19967598])

What we really want is the total loss over all the training instances. So we can compute:

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944

Using softmax_cross_entropy_with_logits()

We can instead compute the total cross-entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below.

loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true)
sess.run(loss_per_instance_2)
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
sess.run(total_loss_2)
# 0.83934333897877922

Note that total_loss_1 and total_loss_2 produce essentially equivalent results, with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error, because the softmax is done for you inside of softmax_cross_entropy_with_logits().
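
As a final hedged sanity check (reusing the tensors and session defined above), the per-instance losses from the two routes agree up to floating-point error:

manual, fused = sess.run([loss_per_instance_1, loss_per_instance_2])
np.allclose(manual, fused)
# True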


#5

The answers above provide enough description for the question asked.

Adding to that, TensorFlow has optimised the fused operation of applying the activation function and then calculating the cost, compared with applying the activation and a separate cost function yourself. Hence it is good practice to use tf.nn.softmax_cross_entropy_with_logits() rather than tf.nn.softmax() followed by your own cross-entropy computation.

You will notice a prominent difference between them in a resource-intensive model.


#6

Whatever goes into a softmax is a logit; this is something J. Hinton repeats over and over in his videos.
