Some of this material comes from http://www.hankcs.com/nlp/, which explains things more concisely and has plenty of study resources.

Part 1: Softmax

Question (a) is a derivation: show that the softmax function is invariant to adding a constant to every element of its input.
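The argument can be sketched as follows, assuming the usual definition of softmax over the elements of a vector x and writing c for the added constant (the index notation here is an assumption, not taken from the assignment text):

```latex
\begin{aligned}
\operatorname{softmax}(x + c)_i
  &= \frac{e^{x_i + c}}{\sum_j e^{x_j + c}}
   = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}} \\
  &= \frac{e^{x_i}}{\sum_j e^{x_j}}
   = \operatorname{softmax}(x)_i
\end{aligned}
```

The factor e^c appears in both the numerator and the denominator, so it cancels for any choice of c.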

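Question (b) is the implementation, which must handle both vectors and matrices (a matrix is treated as a collection of independent row vectors).

Going straight from the formula, it is tempting to write the following (which is what I did at first):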

import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exp_x = np.exp(x)
    softmax_x = exp_x / np.sum(exp_x)
    return softmax_x

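In practice, however, this breaks as soon as the vector contains large values.

The cause is the limited range of floating-point numbers in numpy. With a large input, np.exp overflows to inf, and the division then yields nan instead of a probability.

The constant invariance proved above is what fixes this. In practice the usual choice is c = -max(x), so the largest exponent is 0.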
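To make the overflow concrete, here is a minimal sketch comparing the naive formula with the shifted one. The names naive_softmax and shifted_softmax are made up for this illustration and are not part of the assignment code.

```python
import numpy as np


def naive_softmax(x):
    # Direct translation of the formula; np.exp overflows for large inputs.
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x)


def shifted_softmax(x):
    # Same formula after adding c = -max(x), so the largest exponent is 0.
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)


x = np.array([1001.0, 1002.0])

# Silence the overflow warnings so the nan result is easy to see.
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(x)      # exp(1001) is inf, and inf / inf is nan

stable = shifted_softmax(x)       # same values as softmax([1, 2])

print(naive)
print(stable)
```

The shifted version returns the same probabilities as for the input [1, 2], which is exactly what the invariance property predicts.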

Below are the matrix-based and vector-based implementations.

Matrix (rows are samples, columns are labels)

import numpy as np


def softmax(x):

    orig_shape = x.shape

    if len(x.shape) > 1:
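        # Matrix
        ### YOUR CODE HERE

        # Subtract each row's maximum (axis=1 runs along a row; keepdims keeps shape (N, 1))
        x = x - np.max(x, axis=1, keepdims=True)
        # Normalize each row into probabilities; keepdims=True lets the (N, 1) row sums broadcast
        x = np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)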
        ### END YOUR CODE
    else:
        # Vector
        ### YOUR CODE HERE
        x_max = np.max(x, axis= 0 ,keepdims=True)
        x = x - x_max
        x = np.exp(x) / np.sum(np.exp(x),axis=0)
        
        ### END YOUR CODE

    assert x.shape == orig_shape
    return x
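One detail in the matrix branch is easy to miss: the row sums need shape (N, 1), not (N,), or the division does not broadcast row by row. The sketch below illustrates this on a non-square input; row_softmax is a hypothetical helper written for this example only.

```python
import numpy as np


def row_softmax(x):
    # Hypothetical helper: softmax over each row of a 2-D array.
    shifted = x - np.max(x, axis=1, keepdims=True)        # (N, D) - (N, 1)
    exp_x = np.exp(shifted)
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)   # (N, D) / (N, 1)


x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])   # N = 2 samples, D = 3 labels

row_sums = np.sum(np.exp(x), axis=1)  # shape (2,): the row axis is dropped

broadcast_failed = False
try:
    np.exp(x) / row_sums              # (2, 3) / (2,) cannot be broadcast
except ValueError:
    broadcast_failed = True

probs = row_softmax(x)
print(broadcast_failed)
print(probs.sum(axis=1))              # each row should sum to 1
```

On a square input such as the 2 x 2 test case, the division without keepdims runs without error but divides each column by a row sum, so the mistake can go unnoticed when all row sums happen to be equal.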

from q1_softmax import softmax
# return 20 if softmax(test input) == expected output else 0


def test_softmax_basic():
    """
    Some simple tests to get you started.
    Warning: these are not exhaustive.
    """
    print ("Running basic tests...")
    test1 = softmax(np.array([1,2]))
    print (test1)
    ans1 = np.array([0.26894142,  0.73105858])
    assert np.allclose(test1, ans1, rtol=1e-05, atol=1e-06)

    test2 = softmax(np.array([[1001,1002],[3,4]]))
    print (test2)
    ans2 = np.array([
        [0.26894142, 0.73105858],
        [0.26894142, 0.73105858]])
    assert np.allclose(test2, ans2, rtol=1e-05, atol=1e-06)

    test3 = softmax(np.array([[-1001,-1002]]))
    print (test3)
    ans3 = np.array([0.73105858, 0.26894142])
    assert np.allclose(test3, ans3, rtol=1e-05, atol=1e-06)

    print ("You should be able to verify these results by hand!\n")


def test_softmax():
    """
    Use this space to test your softmax implementation by running:
        python q1_softmax.py
    This function will not be called by the autograder, nor will
    your tests be graded.
    """
    print ("Running your tests...")
    ### YOUR CODE HERE
    raise NotImplementedError
    ### END YOUR CODE


if __name__ == "__main__":
    test_softmax_basic()
    # test_softmax()

Another implementation that produces the same results:

import numpy as np


def softmax(x):
    """Compute the softmax function for each row of the input x.

    It is crucial that this function is optimized for speed because
    it will be used frequently in later code. You might find numpy
    functions np.exp, np.sum, np.reshape, np.max, and numpy
    broadcasting useful for this task.

    Numpy broadcasting documentation:
    http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

    You should also make sure that your code works for a single
    D-dimensional vector (treat the vector as a single row) and
    for N x D matrices. This may be useful for testing later. Also,
    make sure that the dimensions of the output match the input.

    You must implement the optimization in problem 1(a) of the
    written assignment!

    Arguments:
    x -- A D dimensional vector or N x D dimensional numpy matrix.

    Return:
    x -- You are allowed to modify x in-place
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix
        ### YOUR CODE HERE
        exp_minmax = lambda x: np.exp(x - np.max(x))
        denom = lambda x: 1.0 / np.sum(x)
        
        x = np.apply_along_axis(exp_minmax, 1, x)
        denominator = np.apply_along_axis(denom, 1, x)
        if len(denominator.shape) == 1:
            denominator = denominator.reshape((denominator.shape[0], 1))
        x = x * denominator
        ### END YOUR CODE
    else:
        # Vector
        ### YOUR CODE HERE
        x_max = np.max(x)
        x = x - x_max
        numerator = np.exp(x)
        denominator = 1.0 / np.sum(numerator)
        
        x = numerator * denominator  # scale by the reciprocal of the sum
        ### END YOUR CODE

    assert x.shape == orig_shape
    return x


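Softmax summary

1. The exponential maps every score to a positive number and accentuates the larger ones.

2. Normalization turns the values into an approximate probability distribution.

3. Constant invariance is used to prevent overflow.

4. Be clear about what each dimension represents (here, rows are samples and columns are labels).

5. Pay attention to axis = 0 versus axis = 1.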
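As a quick reminder of point 5, the following sketch shows what the axis argument does to a small matrix. The array scores is made-up data for illustration.

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])   # shape (2, 3): 2 samples, 3 labels

# axis=0 collapses the rows: one value per column (per label).
col_sums = np.sum(scores, axis=0)      # values 5, 7, 9

# axis=1 collapses the columns: one value per row (per sample).
row_sums = np.sum(scores, axis=1)      # values 6, 15

print(col_sums.shape, row_sums.shape)
```

Since each row is one sample, the softmax normalization runs along axis=1.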

Part 2: Neural network basics

Question 1: derive the derivative of the sigmoid function.
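A short sketch of that derivation, writing σ for the sigmoid function (the symbol is an assumption made here for readability):

```latex
\begin{aligned}
\sigma(x) &= \frac{1}{1 + e^{-x}} \\
\sigma'(x) &= \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
            = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
           &= \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{aligned}
```

The last step uses the fact that e^{-x} / (1 + e^{-x}) equals 1 - σ(x), so the derivative can be written purely in terms of the function value.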

Question 2: derive the gradient of the cross-entropy loss when softmax is used to produce the predictions.
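The result can be sketched as below. The notation is assumed for this illustration: θ is the input vector to softmax, ŷ = softmax(θ) is the prediction, and y is a one-hot label vector, so its entries sum to 1.

```latex
\begin{aligned}
CE(y, \hat{y}) &= -\sum_i y_i \log \hat{y}_i
  = -\sum_i y_i \theta_i + \log \sum_j e^{\theta_j} \\
\frac{\partial CE}{\partial \theta_k}
  &= -y_k + \frac{e^{\theta_k}}{\sum_j e^{\theta_j}}
   = \hat{y}_k - y_k
\end{aligned}
```

In vector form the gradient is simply ŷ - y, the difference between the predicted distribution and the label.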

Implementing the sigmoid function

#!/usr/bin/env python

import numpy as np


def sigmoid(x):
    """
    Compute the sigmoid function for the input here.

    Arguments:
    x -- A scalar or numpy array.

    Return:
    s -- sigmoid(x)
    """

    ### YOUR CODE HERE
    s = 1.0 / (1 + np.exp(-x))
    ### END YOUR CODE

    return s


def sigmoid_grad(s):
    """
    Compute the gradient for the sigmoid function here. Note that
    for this implementation, the input s should be the sigmoid
    function value of your original input x.

    Arguments:
    s -- A scalar or numpy array.

    Return:
    ds -- Your computed gradient.
    """

    ### YOUR CODE HERE
    ds = s * (1 - s)
    ### END YOUR CODE

    return ds
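One way to sanity-check sigmoid_grad is to compare it against a central finite difference. The sketch below repeats the two functions so that it runs on its own; the step size h and the sample points are arbitrary choices for this illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(s):
    # s is the sigmoid output, not the original input x.
    return s * (1.0 - s)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
h = 1e-5  # finite-difference step (arbitrary small value)

# Central difference approximation of d(sigmoid)/dx.
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# Analytic gradient, computed from the sigmoid output.
analytic = sigmoid_grad(sigmoid(x))

max_diff = np.max(np.abs(numeric - analytic))
print(max_diff)
```

Note that sigmoid_grad takes the sigmoid output as its argument, so the call is sigmoid_grad(sigmoid(x)) rather than sigmoid_grad(x).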

Stanford CS224n Assignment 1
