Linear Regression with TensorFlow


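Beyond the usual deep learning algorithms (the various *NN models), TensorFlow can of course also be used for the simplest model of all, linear regression. After all, everything we will meet later, such as Logistic Regression and Neural Networks, builds on basic Linear Regression. This post uses a simple linear regression to walk through the general process of building a model with TensorFlow.

As mentioned in the previous post, TensorFlow's most distinctive feature is that it separates the definition of the Tensor graph from its execution: variables and functions are always defined first and executed later. This Lisp-like style is highly extensible, lets you combine Tensor nodes freely, and is well suited to exploring new models and algorithms. For a Machine Learning algorithm, the define-then-execute process breaks down into finer steps: reading the data, visualizing the data and choosing a model, defining placeholders, defining Variables, building the model, defining the loss function, defining the Optimizer, initializing variables, training the model, outputting the model parameters, and predicting. Below we continue with the dataset from Stanford's CS20Si course to show the whole process of doing linear regression with TensorFlow, describing each step in as much detail as possible to pave the way for later posts on LR, CNN, and so on.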
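To make the define-then-execute idea concrete, here is a toy sketch in plain Python. The classes `Placeholder`, `Constant`, `Add` and `Mul` are hypothetical names invented for this illustration; they are not TensorFlow's API and only mimic the idea that building an expression and evaluating it are two separate phases.

```python
# Toy illustration only: these classes are hypothetical, NOT TensorFlow's API.
class Placeholder:
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        # The value is supplied only at run time, via the feed dictionary.
        return feed[self.name]


class Constant:
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed):
        return self.value


class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


class Mul:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) * self.right.evaluate(feed)


# Definition phase: build the graph for x * 2 + 1; nothing is computed yet.
x = Placeholder("x")
y_predict = Add(Mul(x, Constant(2.0)), Constant(1.0))

# Execution phase: the same graph is evaluated with different inputs.
result_a = y_predict.evaluate({"x": 3.0})
result_b = y_predict.evaluate({"x": 10.0})
print(result_a, result_b)
```

The graph object `y_predict` is built once and reused for every input, which is roughly the role the Session and `feed_dict` play in the steps that follow.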

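1. Getting and preprocessing the dataset

The dataset is the Chicago metropolitan area residential theft and fire statistics provided by Cengage Learning. It is described as follows: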

In the following data pairs
X = fires per 1000 housing units
Y = thefts per 1000 population
within the same Zip code in the Chicago metro area
Reference: U.S. Commission on Civil Rights

After downloading the Excel version of the dataset locally, we first read the data into a NumPy array for use with TensorFlow:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import xlrd

# local data file directory
DATA_FILE = "data/fire_theft.xls"

# STEP 0: read in data from xls file
book = xlrd.open_workbook(DATA_FILE, encoding_override="utf-8")
sheet = book.sheet_by_index(0)
lst = [sheet.row_values(i) for i in range(1, sheet.nrows)]
data = np.asarray(lst)
n_samples = sheet.nrows - 1

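2. Sampling and visualizing the dataset, and choosing a model

Choosing the model and algorithm is always a crucial step, and getting an intuitive feel for the dataset matters even more. Intuition is especially important when selecting model features, and effective data visualization often makes the subsequent work far easier. The dataset here is just two-dimensional, so visualizing it is straightforward. High-dimensional data usually needs a dimensionality-reduction step first, such as basic PCA or t-SNE (co-developed by Hinton), which gives very good visualizations of the high-dimensional MNIST dataset.
I also recommend Colah's Blog, an excellent blog on data visualization in Machine Learning. For this dataset we can draw a simple two-dimensional scatter plot: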

# STEP 1: plot the data
X_axis, Y_axis = data.T[0], data.T[1]
plt.plot(X_axis, Y_axis, 'bo', label='Real data')
plt.ylim([0, 150])
plt.xlim([0, 45])
plt.legend()
plt.show()

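The distribution looks like this:

Roughly speaking, the data could be fitted with a simple linear regression, or modeled with a quadratic function. Since the goal here is to demonstrate linear regression, we use linear regression and choose the model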

Y = weight * X + bias
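As a sanity check on this model form, a single-feature line can also be fitted in closed form with ordinary least squares. The sketch below is plain Python, uses a hypothetical helper name `fit_line`, and runs on synthetic points rather than the fire/theft data; it is only meant to show what "weight" and "bias" represent, not to replace the TensorFlow workflow described here.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = weight * x + bias (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # weight = covariance(x, y) / variance(x)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    weight = covariance / variance
    # The fitted line always passes through the point (mean_x, mean_y).
    bias = mean_y - weight * mean_x
    return weight, bias


# Synthetic points lying exactly on y = 2x + 1 (not the real dataset).
weight, bias = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(weight, bias)
```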

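3. Defining placeholders

With the model chosen, we can define its graph structure in TensorFlow terms. A placeholder is a node declared up front that is later filled with data. For linear regression the placeholders are X and Y, which come from the training dataset. They need no initialization and are not updated by training; we simply feed in the real data when calling session run.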

# STEP 2: create placeholder for input X(number of fire) and label Y(number of theft)
X = tf.placeholder(tf.float32, name="X")
Y = tf.placeholder(tf.float32, name="Y")

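4. Defining Variables

Variables differ somewhat from placeholders. The previous post covered some of the operations and attributes of Variable separately. A simple comparison with placeholder:

Name	placeholder	Variable
Type	function	Class
Initialization	NO	YES
Updated during training	NO	YES
Storage	data cluster	parameter server (PS)

There is a Stack Overflow discussion on the difference between the two that is worth reading.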

# STEP 3: create Variables(weight and bias here), initialize to 0.
w = tf.Variable(0.0, name="weight")
b = tf.Variable(0.0, name="bias")

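5. Building the model

This step is simple: take the model chosen in step 2 and express it in Python. Note that these are matrix operations, so if X has high-dimensional features you need to pay some attention to how the elements line up, that is, the shapes and order of the operands.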

# STEP 4: construct model to predict Y
Y_Predict = X * w + b
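For a sample with several features, the scalar product X * w becomes a dot product, and each feature must be paired with the weight in the same position. A minimal plain-Python sketch of that alignment follows; the helper `predict` and the numbers are hypothetical and only illustrate the shape concern mentioned above.

```python
def predict(features, weights, bias):
    """Return sum(features[i] * weights[i]) + bias for one sample."""
    if len(features) != len(weights):
        # Mismatched lengths are the plain-Python analogue of a shape error.
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical two-feature sample and parameters, for illustration only.
sample = [1.0, 2.0]
weights = [3.0, 4.0]
prediction = predict(sample, weights, bias=0.5)  # 1*3 + 2*4 + 0.5
print(prediction)
```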

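6. Defining the loss function

The choice of loss function directly affects how hard the model is to train and how well it generalizes. For linear regression the most common choice is the mean squared error loss. It is simple to use, but because the loss is accumulated over all samples and each error is squared, a few outliers can strongly affect the whole model. A commonly contrasted alternative is the absolute loss. As Wikipedia puts it:

"The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it has the tendency to be dominated by outliers; when summing over a set of samples, the sample mean is influenced too much by a few particularly large a-values when the distribution is heavy tailed: in terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions." (Wikipedia)

Another option is the Huber loss. Compared with the squared error, it penalizes points with large residuals only linearly (quadratic for small errors, linear for large ones), so it works well on datasets with many noisy samples, such as points far from the mean. Plotting it against the squared loss makes the difference clear at a glance.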

# STEP 5: define loss function(use square error here)
# For Huber loss instead, use: loss = tf.losses.huber_loss(Y, Y_Predict)
loss = tf.square(Y - Y_Predict, name="loss")
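To see why the Huber loss is less sensitive to outliers than the squared error, the two can be compared on a few residuals. This is a minimal plain-Python sketch using the standard Huber definition with a threshold `delta` (assumed to be 1.0 here); the function names are illustrative, not TensorFlow calls.

```python
def squared_loss(residual):
    """Squared error for a single residual (y - y_predict)."""
    return residual ** 2


def huber_loss(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it."""
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


# Small residuals behave similarly; the large one is penalized far less by Huber.
for r in [0.5, 1.0, 10.0]:
    print(r, squared_loss(r), huber_loss(r))
```

For a residual of 10 the squared loss is 100 while the Huber loss is 9.5, so a single outlier pulls the fitted line much less.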

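7. Defining the Optimizer

Whether a model can ultimately be turned into usable industrial practice depends largely on how good the optimization algorithm is under the available computing power. Many machine learning models are non-deterministic, and the loss function may be non-convex, so training can get stuck in local optima; this is where the optimization method becomes crucial. TensorFlow provides a large collection of Optimizers.

Each of these algorithms deserves several chapters of its own, and I will look at them one by one when time allows; there is a summary post worth consulting. In short, gradient descent has two problems: 1) it may get stuck in a local optimum, and 2) it depends on the learning rate setting. On certain datasets (such as sparse stock market data), these two issues can prevent the model from finding the optimum at all, or from finding it within a given number of epochs. For the simple mean squared loss used here, plain gradient descent is generally enough.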

# STEP 6: define optimizer(here we use Gradient Descent with learning rate of 0.001)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
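As a sketch of what one plain gradient descent step computes for the per-sample squared loss defined in step 5, write the learning rate as η (a symbol introduced here for illustration). For a single sample (x, y):

```latex
\begin{aligned}
L(w, b) &= \bigl(y - (w x + b)\bigr)^2 \\
\frac{\partial L}{\partial w} &= -2x\,\bigl(y - (w x + b)\bigr) \\
\frac{\partial L}{\partial b} &= -2\,\bigl(y - (w x + b)\bigr) \\
w &\leftarrow w - \eta\,\frac{\partial L}{\partial w}, \qquad
b \leftarrow b - \eta\,\frac{\partial L}{\partial b}
\end{aligned}
```

Because the gradient with respect to w is scaled by x, samples with large x take proportionally larger steps, which is one reason the result depends on the learning rate.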

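8. Initialization and model training

At this point the basic model definition is complete, and we need to set a few constants and initialize the Variables defined earlier. The previous post covered several initialization methods at length, so I will not repeat them here. Once initialization is done, training can begin. With the model defined, executing it takes essentially one line of code; the only thing left to do is feed the data when calling run inside the Session.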

# define the number of epochs
epoch = 200
# initialize_all_variables() is deprecated; use global_variables_initializer()
init = tf.global_variables_initializer()
# execute the model
with tf.Session() as sess:
    # STEP 7: initialize the necessary variables, in this case w and b.
    sess.run(init)

    # STEP 8: train the model, feeding one (x, y) sample per step
    for i in range(epoch):
        for x, y in data:
            sess.run(optimizer, feed_dict={X: x, Y: y})

    # STEP 9: output the values of w and b.
    w_value, b_value = sess.run([w, b])
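The loop above feeds one sample at a time, so each `sess.run(optimizer, ...)` call is one stochastic gradient descent step. A rough plain-Python equivalent is sketched below. The function name `train_sgd`, the synthetic data, and the learning rate are assumptions chosen for this illustration; this is not what TensorFlow executes internally.

```python
def train_sgd(data, learning_rate, epochs):
    """Fit y = w * x + b by per-sample gradient descent on squared error."""
    w, b = 0.0, 0.0  # start from zero, like the Variables in step 3
    for _ in range(epochs):
        for x, y in data:
            error = y - (w * x + b)
            # Gradients of (y - (w*x + b))**2 with respect to w and b.
            grad_w = -2.0 * x * error
            grad_b = -2.0 * error
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic, noise-free points on y = 2x + 1 (not the fire/theft data).
toy_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w_toy, b_toy = train_sgd(toy_data, learning_rate=0.01, epochs=2000)
print(w_toy, b_toy)
```

On this toy data the parameters approach w = 2 and b = 1; on real, noisy data the per-sample updates keep fluctuating around the best fit instead of settling exactly.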

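9. Evaluation and prediction

The trained model fitted to the original dataset is shown below. The first figure is the line obtained with squared error, and the second is the line obtained with Huber loss; you can see that the latter is much less affected by the outliers.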

# plot the predict line.
plt.plot(X_axis, X_axis * w_value + b_value, 'r', label='Predicted data')
plt.ylim([0, 150])
plt.xlim([0, 45])
plt.legend()
plt.show()
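Besides plotting, the fit can be summarized with a number and used for prediction. The sketch below is plain Python with hypothetical helper names and made-up parameter values; with the real model you would pass `w_value`, `b_value` and the rows of `data` instead.

```python
def predict(x, w, b):
    """Predicted Y for a single X under the linear model."""
    return w * x + b


def mean_squared_error(data, w, b):
    """Average squared difference between labels and predictions."""
    errors = [(y - predict(x, w, b)) ** 2 for x, y in data]
    return sum(errors) / len(errors)


# Hypothetical points and parameters, for illustration only.
toy_data = [(1.0, 3.0), (2.0, 5.0)]
mse_exact = mean_squared_error(toy_data, w=2.0, b=1.0)  # line passes through both points
mse_shifted = mean_squared_error(toy_data, w=2.0, b=0.0)  # every prediction is off by 1
print(mse_exact, mse_shifted, predict(4.0, 2.0, 1.0))
```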

References

[2] Tensorflow for Deep Learning Research.
