When we start learning to program, the first thing we usually do is write a program that prints "Hello World". Just as programming has Hello World as its entry point, machine learning has MNIST.
Main steps:
- Get the data
- Build the model
- Define tensors and variables: X, W, b
- Define the loss function and optimizer: cross-entropy, gradient descent
- Train the model: loop, batches
- Evaluate: accuracy
I. Get the MNIST dataset
- The data comes from http://yann.lecun.com/exdb/mnist/
- The data is split into three parts: train, validation, and test
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist)
## Output:
Datasets(train=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x7f02cbc0cd30>, validation=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x7f02df872518>, test=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x7f02e1188a90>)
print(np.shape(mnist.test.images))
print(np.shape(mnist.test.labels))
print(np.shape(mnist.train.images))
print(np.shape(mnist.train.labels))
print(np.shape(mnist.validation.images))
print(np.shape(mnist.validation.labels))
### The corresponding shapes:
(10000, 784)
(10000, 10)
(55000, 784)
(55000, 10)
(5000, 784)
(5000, 10)
- As you can see, the test images have shape (10000, 784) and the corresponding labels (10000, 10); likewise, the training set holds 55000 examples and the validation set 5000.
- Each image has 28*28 pixels, so we represent it as an array of numbers, flattened into a vector of length 28x28 = 784.
- The corresponding MNIST labels are digits between 0 and 9, stored as "one-hot vectors": a one-hot vector has a 1 in exactly one dimension and 0 everywhere else. For example, the label 0 is represented as [1,0,0,0,0,0,0,0,0,0].
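As a quick sanity check, here is a minimal sketch (assuming matplotlib is installed and the mnist object loaded above) that reshapes one flattened row back into its 28x28 image and recovers the digit from its one-hot label:

import numpy as np
import matplotlib.pyplot as plt

img = mnist.train.images[0]                    # flat vector of length 784
label = mnist.train.labels[0]                  # one-hot vector of length 10

plt.imshow(img.reshape(28, 28), cmap="gray")   # restore the 28x28 pixel grid
plt.title("label = %d" % np.argmax(label))     # argmax turns one-hot back into a digit
plt.show()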
Goal: given an input X, predict which of the 10 classes (0 through 9) its label belongs to.
To decide which of several classes an input belongs to, the first tool that comes to mind is softmax.
II. Build the model
Softmax regression has two steps:
- convert the input into evidence for each class
- convert the evidence into probabilities
1. Convert the input into evidence for each class
- The evidence for class $i$ is a weighted sum of the pixel intensities plus a bias for that class: $\text{evidence}_i = \sum_j W_{i,j} x_j + b_i$.
- If a pixel serves as evidence that the image does not belong to a class, its weight is negative; otherwise its weight is positive. In such a weight visualization, red marks negative values and blue marks positive values.
2. Convert the evidence into probabilities
Put simply, softmax exponentiates the input and then normalizes it:
- Why normalize: straightforwardly, it gives the outputs the properties of a probability distribution (non-negative, summing to 1).
- Why exponentiate: see 《常用激活函數比較》 (Comparison of Common Activation Functions), http://www.jianshu.com/p/22d9720dbf1a. The first reason is to mimic the behavior of max, making large values even larger; the second is that we need a differentiable function.
Expressed as a formula:
$\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$
In code:
## Implement the regression model
y = tf.nn.softmax(tf.matmul(x,w) + b)
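For intuition, here is the same two-step computation as a small numpy sketch (an illustration of softmax itself, not TensorFlow's internal implementation):

import numpy as np

def softmax(z):
    z = z - np.max(z)     # standard stability trick; shifting by a constant does not change the result
    e = np.exp(z)         # step 1: exponentiate, so larger evidence grows disproportionately
    return e / e.sum()    # step 2: normalize, so the outputs sum to 1

print(softmax(np.array([2.0, 1.0, 0.1])))
# -> approximately [0.659 0.242 0.099]; the largest input dominates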
III. Define tensors and variables:
x = tf.placeholder(tf.float32, [None, 784])  # input images; None means any batch size
w = tf.Variable(tf.zeros([784, 10]))         # weights, initialized to zeros
b = tf.Variable(tf.zeros([10]))              # biases, one per class
IV. Define the loss function and optimizer
The cost function used here is cross-entropy: $H_{y'}(y) = -\sum_i y'_i \log(y_i)$.
Here y is the predicted probability distribution and y' (written y_ in the code) is the true distribution, the one-hot vectors we feed in.
## Train the model
y_ = tf.placeholder("float", [None, 10])        # placeholder for the true one-hot labels
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))  # cross-entropy, summed over the batch
Then we train the model with backpropagation, using gradient descent as the optimizer, to drive the loss to a minimum:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
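One caveat: tf.log(y) can hit log(0) when a predicted probability underflows to zero. A more numerically stable variant (my sketch, not part of the original post) feeds the pre-softmax logits to TensorFlow's combined op:

# Stable alternative: let TensorFlow fuse softmax and cross-entropy into one op.
logits = tf.matmul(x, w) + b   # the pre-softmax "evidence"
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
# reduce_mean averages over the batch instead of summing, so a larger
# learning rate such as 0.5 is typically paired with it.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)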
V. Train the model
# Assumes the variables have been initialized and a Session created (see the complete code below).
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)  # random mini-batch of 100 examples
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
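Each step trains on a random batch of 100 examples (stochastic gradient descent), so 1000 steps cover about 100,000 examples, roughly two passes over the 55,000 training images. To confirm the loss is actually falling, the same loop can print it periodically (a sketch assuming the cross_entropy op defined above):

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    _, loss = sess.run([train_step, cross_entropy],
                       feed_dict={x: batch_xs, y_: batch_ys})
    if i % 100 == 0:
        print("step %d, batch loss %.3f" % (i, loss))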
VI. Evaluate the model
### Evaluate the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))  # does the predicted class match the true class?
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))   # fraction of correct predictions
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
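To see what these three lines compute, here is the same logic in plain numpy, on hypothetical toy values:

import numpy as np

pred = np.array([[0.1, 0.8, 0.1],   # predicted distributions, one row per example
                 [0.3, 0.3, 0.4]])
true = np.array([[0, 1, 0],         # one-hot ground truth
                 [1, 0, 0]])

correct = np.argmax(pred, 1) == np.argmax(true, 1)  # [True, False]
print(correct.astype(np.float32).mean())            # 0.5, i.e. the accuracy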
Complete code:
import tensorflow as tf
## Load the data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
## Implement the regression model
x = tf.placeholder(tf.float32, [None, 784])  # input images, flattened to 784 values
w = tf.Variable(tf.zeros([784, 10]))         # weights
b = tf.Variable(tf.zeros([10]))              # biases
y = tf.nn.softmax(tf.matmul(x, w) + b)       # predicted class probabilities
## Train the model
y_ = tf.placeholder("float", [None, 10])  # true one-hot labels
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
init = tf.global_variables_initializer()  # tf.initialize_all_variables() is the deprecated spelling
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
### Evaluate the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
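If everything runs, the printed test accuracy should land around 0.91 to 0.92 for this simple softmax model, which is the ballpark figure the original TensorFlow tutorial reports.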