This section builds a handwritten-digit classifier using softmax regression; the main difficulty lies in writing the cost function and gradient in vectorized form.
STEP 2: Implement softmaxCost
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)
% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
% a single test set
% labels - an M x 1 matrix containing the labels corresponding for the input data
%
% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize);
numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));
cost = 0;
thetagrad = zeros(numClasses, inputSize);
%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute the cost and gradient for softmax regression.
% You need to compute thetagrad and cost.
% The groundTruth matrix might come in handy.
m = theta * data;                      % class scores, numClasses x numCases
m = bsxfun(@minus, m, max(m,[],1));    % subtract each column's max for numerical stability
m = exp(m);
m = bsxfun(@rdivide, m, sum(m));       % normalize each column: softmax probabilities
cost = -sum(sum(groundTruth .* log(m)))/size(data,2) + lambda/2*sum(sum(theta.^2));  % cross-entropy + weight decay
thetagrad = -(groundTruth - m)*data'/size(data,2) + lambda*theta;
% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end
Here groundTruth is a numClasses x numCases matrix whose rows index classes and whose columns index samples: if sample j belongs to class i, then groundTruth(i,j) = 1 and groundTruth(x,j) = 0 for every x ≠ i.
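The Matlab one-liner `full(sparse(labels, 1:numCases, 1))` builds this one-hot matrix in a single step. As a sketch, here is a NumPy transcription of the same construction (the function name `ground_truth` is mine, not part of the exercise):

```python
import numpy as np

def ground_truth(labels, num_classes):
    """NumPy equivalent of full(sparse(labels, 1:numCases, 1)).
    labels: 1-based class labels, shape (numCases,)."""
    num_cases = labels.shape[0]
    G = np.zeros((num_classes, num_cases))
    # Row (labels-1) of column j is set to 1: shift 1-based labels to 0-based rows.
    G[labels - 1, np.arange(num_cases)] = 1
    return G

labels = np.array([2, 1, 3, 2])
G = ground_truth(labels, 3)  # one 1 per column, in the row of that sample's class
```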
From the cost function

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=1}^{n}\theta_{ij}^2$$

it is easy to see that the entry (j, i) of groundTruth .* log(m) is exactly $1\{y^{(i)}=j\}\,\log p(y^{(i)}=j \mid x^{(i)};\theta)$, so summing all entries and negating gives the data term of the cost. As for thetagrad, the gradient formula is

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[x^{(i)}\big(1\{y^{(i)}=j\} - p(y^{(i)}=j \mid x^{(i)};\theta)\big)\Big] + \lambda\theta_j.$$

The product -(groundTruth - m)*data' yields a matrix of the same size as theta, and its j-th row is this gradient (as a row vector, before dividing by the number of samples and adding the weight-decay term). Since groundTruth - m has size numClasses x numCases and data has size inputSize x numCases, multiplying by data' automatically sums over all samples (picture how matrix multiplication works and this becomes clear).
STEP 3: Gradient checking
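Gradient checking compares the analytic gradient against a centered finite-difference approximation, $\frac{f(\theta + \epsilon e_i) - f(\theta - \epsilon e_i)}{2\epsilon}$. A generic sketch of the idea (the function name `numerical_gradient` is illustrative, not the exercise's `computeNumericalGradient`):

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-4):
    """Centered-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        old = theta[idx]
        theta[idx] = old + eps
        fp = f(theta)                  # f evaluated with one entry nudged up
        theta[idx] = old - eps
        fm = f(theta)                  # ... and nudged down
        theta[idx] = old               # restore the original value
        grad[idx] = (fp - fm) / (2 * eps)
    return grad

# Sanity check on f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([[1.0, -2.0], [0.5, 3.0]])
g = numerical_gradient(lambda t: 0.5 * np.sum(t ** 2), theta)
```

If the analytic and numerical gradients differ by more than roughly 1e-6 per entry, there is usually a bug in the cost or gradient code.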
STEP 4 & 5: Learning parameters && Testing
softmaxPredict.m
function [pred] = softmaxPredict(softmaxModel, data)
% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
% a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).
% Unroll the parameters from theta
theta = softmaxModel.optTheta; % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));
%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute pred using theta assuming that the labels start
% from 1.
m = theta * data;    % class scores; exponentiation and normalization preserve the argmax
[~,pred] = max(m);   % row index of each column's maximum is the 1-based predicted label
% ---------------------------------------------------------------------
end
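Prediction needs no softmax normalization at all: the exponential is monotonic and the column sums are shared across classes, so the argmax of the raw scores `theta * data` already gives the label. A NumPy sketch of the same idea (the function name `softmax_predict` is mine), with `+ 1` to match Matlab's 1-based labels:

```python
import numpy as np

def softmax_predict(theta, data):
    """theta: (numClasses, inputSize); data: (inputSize, numCases).
    Returns 1-based predicted labels, matching the Matlab convention."""
    scores = theta @ data              # softmax is monotonic in the scores,
    return scores.argmax(axis=0) + 1   # so the per-column argmax suffices
```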
Finally, run the provided training and testing code; remember to set DEBUG to false.