How to implement a neural network: Intermezzo 2
This page is part of a 5 (+2) part tutorial on how to implement a simple neural network model.
Softmax classification function
This intermezzo will cover:
- The softmax function
- The cross-entropy cost function
The previous intermezzo described how to do a classification of two classes with the help of the logistic function. For multiclass classification there exists an extension of this logistic function, called the softmax function, which is used in multinomial logistic regression. The following section will explain the softmax function and how to derive it.
Softmax function
The logistic output function described in the previous intermezzo can only be used for the classification between two target classes $t=1$ and $t=0$. This logistic function can be generalized to output a multiclass categorical probability distribution by the softmax function. The softmax function $\varsigma$ takes as input a $C$-dimensional vector $\mathbf{z}$ and outputs a $C$-dimensional vector $\mathbf{y}$ of real values between $0$ and $1$. This function is a normalized exponential and is defined as:

$$y_c = \varsigma(\mathbf{z})_c = \frac{e^{z_c}}{\sum_{d=1}^C e^{z_d}} \quad \text{for } c = 1, \ldots, C$$

The denominator $\sum_{d=1}^C e^{z_d}$ acts as a normalizer that makes sure the outputs sum to one: $\sum_{c=1}^C y_c = 1$.

We can write the probabilities that the class is $t=c$ for $c = 1, \ldots, C$ given the input $\mathbf{z}$ as:

$$P(t=c \mid \mathbf{z}) = \varsigma(\mathbf{z})_c = \frac{e^{z_c}}{\sum_{d=1}^C e^{z_d}}$$

Where $P(t=c \mid \mathbf{z})$ is the probability that the class is $c$ given the input $\mathbf{z}$. These output probabilities can be computed with the softmax function defined below:
import numpy as np

# Define the softmax function
def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))
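For example (a hypothetical input vector, not from the original notebook), the softmax turns arbitrary scores into values between $0$ and $1$ that sum to one:

# Example usage: the outputs form a probability distribution
z = np.asarray([0.5, 1.2, -0.9])
y = softmax(z)
print(y)          # approximately [0.31, 0.62, 0.08]
print(np.sum(y))  # 1.0 (up to floating point precision)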
Derivative of the softmax function
To use the softmax function in neural networks, we need to compute its derivative. If we define $\Sigma_C = \sum_{d=1}^C e^{z_d}$ so that $y_c = e^{z_c} / \Sigma_C$, then the derivative ${\partial y_i}/{\partial z_j}$ of the output $\mathbf{y}$ of the softmax function with respect to its input $\mathbf{z}$ can be calculated with the quotient rule. For $i = j$:

$$\frac{\partial y_i}{\partial z_i} = \frac{e^{z_i} \Sigma_C - e^{z_i} e^{z_i}}{\Sigma_C^2} = \frac{e^{z_i}}{\Sigma_C} - \left(\frac{e^{z_i}}{\Sigma_C}\right)^2 = y_i (1 - y_i)$$

And for $i \neq j$:

$$\frac{\partial y_i}{\partial z_j} = \frac{0 - e^{z_i} e^{z_j}}{\Sigma_C^2} = -y_i y_j$$

Note that if $i = j$ this derivative is similar to the derivative of the logistic function.
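The two cases combine into the Jacobian matrix ${\partial y_i}/{\partial z_j} = y_i(\delta_{ij} - y_j)$, with $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise. As a minimal sketch (the helper softmax_jacobian and the finite-difference check are illustrations, not part of the original notebook):

# Illustrative sketch: Jacobian of the softmax, dy_i/dz_j = y_i * (delta_ij - y_j)
def softmax_jacobian(z):
    y = softmax(z)
    return np.diag(y) - np.outer(y, y)

# Finite-difference check of the derivative on a small example
z = np.asarray([0.5, 1.2, -0.9])
eps = 1e-6
numeric = np.empty((3, 3))
for j in range(3):
    dz = np.zeros(3)
    dz[j] = eps
    numeric[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)
print(np.allclose(numeric, softmax_jacobian(z)))  # True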
Cross-entropy cost function for the softmax function
To derive the cost function for the softmax function we start out from the likelihood function that a given set of parameters $\theta$ of the model can result in a prediction of the correct class of each input sample, as in the derivation for the logistic cost function. The maximization of this likelihood can be written as:

$$\underset{\theta}{\text{argmax}}\; \mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z})$$

The likelihood $\mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z})$ can be rewritten as the joint probability of generating $\mathbf{t}$ and $\mathbf{z}$ given the parameters $\theta$: $P(\mathbf{t}, \mathbf{z} \mid \theta)$. This joint probability can be decomposed into a conditional distribution:

$$P(\mathbf{t}, \mathbf{z} \mid \theta) = P(\mathbf{t} \mid \mathbf{z}, \theta) P(\mathbf{z} \mid \theta)$$

Since we are not interested in the probability of $\mathbf{z}$ we can reduce this to: $\mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z}) = P(\mathbf{t} \mid \mathbf{z}, \theta)$. Because each $t_c$ depends on the full $\mathbf{z}$, and only one class can be activated in $\mathbf{t}$, this can be written as:

$$P(\mathbf{t} \mid \mathbf{z}) = \prod_{c=1}^{C} P(t_c \mid \mathbf{z})^{t_c} = \prod_{c=1}^{C} \varsigma(\mathbf{z})_c^{t_c} = \prod_{c=1}^{C} y_c^{t_c}$$
As was noted during the derivation of the cost function of the logistic function, maximizing this likelihood can also be done by minimizing the negative log-likelihood:
$$-\log \mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z}) = \xi(\mathbf{t}, \mathbf{z}) = -\log \prod_{c=1}^{C} y_c^{t_c} = -\sum_{c=1}^{C} t_c \cdot \log(y_c)$$

Which is the cross-entropy error function $\xi$. Note that for a two-class system $t_2 = 1 - t_1$, and this reduces to the same error function as for logistic regression: $\xi(t, y) = -t \log(y) - (1 - t) \log(1 - y)$.
The cross-entropy error function over a batch of multiple samples of size $n$ can be calculated as:

$$\xi(T, Y) = \sum_{i=1}^{n} \xi(\mathbf{t}_i, \mathbf{y}_i) = -\sum_{i=1}^{n} \sum_{c=1}^{C} t_{ic} \cdot \log(y_{ic})$$

Where $t_{ic}$ is $1$ if and only if sample $i$ belongs to class $c$, and $y_{ic}$ is the output probability that sample $i$ belongs to class $c$.
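A minimal sketch of this batch cost in code (the function name cross_entropy and the example batch are assumptions for illustration, not from the original notebook):

# Illustrative sketch: cross-entropy cost xi(T, Y) over a batch of n samples
def cross_entropy(Y, T):
    # Y holds the predicted probabilities y_ic, T the one-hot targets t_ic
    return -np.sum(T * np.log(Y))

# Hypothetical batch of n = 2 samples with C = 3 classes
T = np.asarray([[0, 1, 0],
                [1, 0, 0]])
Y = np.asarray([[0.2, 0.7, 0.1],
                [0.6, 0.3, 0.1]])
print(cross_entropy(Y, T))  # -(log(0.7) + log(0.6)), approximately 0.867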
Derivative of the cross-entropy cost function for the softmax function
The derivative ${\partial \xi}/{\partial z_i}$ of the cost function with respect to the softmax input $z_i$ can be calculated as:

$$\frac{\partial \xi}{\partial z_i} = -\sum_{j=1}^{C} \frac{\partial \left( t_j \log(y_j) \right)}{\partial z_i} = -\sum_{j=1}^{C} t_j \frac{1}{y_j} \frac{\partial y_j}{\partial z_i}$$

Note that we already derived ${\partial y_j}/{\partial z_i}$ above for the cases $i = j$ and $i \neq j$. Splitting the sum into these two cases, and using that $\sum_{j=1}^{C} t_j = 1$ since only one class can be active, gives:

$$\frac{\partial \xi}{\partial z_i} = -\frac{t_i}{y_i} y_i (1 - y_i) - \sum_{j \neq i} \frac{t_j}{y_j} (-y_j y_i) = -t_i + t_i y_i + \sum_{j \neq i} t_j y_i = -t_i + y_i \sum_{j=1}^{C} t_j = y_i - t_i$$

The result that ${\partial \xi}/{\partial z_i} = y_i - t_i$ for all $i \in C$ is the same as the derivative of the cross-entropy for the logistic function, which had only one output node.
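This result can be verified numerically. A minimal sketch, reusing the softmax and cross_entropy helpers defined above (the specific input and target vectors are hypothetical):

# Illustrative check: the analytic gradient y - t matches a finite-difference estimate
z = np.asarray([0.5, 1.2, -0.9])
t = np.asarray([0.0, 1.0, 0.0])  # one-hot target
analytic = softmax(z) - t

eps = 1e-6
numeric = np.empty(3)
for i in range(3):
    dz = np.zeros(3)
    dz[i] = eps
    numeric[i] = (cross_entropy(softmax(z + dz), t)
                  - cross_entropy(softmax(z - dz), t)) / (2 * eps)
print(np.allclose(analytic, numeric))  # True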
This post at peterroelants.github.io is generated from an IPython notebook file.