1.4.3 Unsupervised Learning

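A classic unsupervised learning task is to find the "best" representation of the data. "Best" can mean different things. Generally it means a representation that preserves as much information about x as possible while obeying some penalty or constraint. That penalty or constraint keeps the representation simpler or more accessible than x itself.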

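There are many ways to define a simpler representation. Three common ones are low-dimensional, sparse and independent representations:

- A low-dimensional representation compresses as much information about x as possible into fewer dimensions.
- A sparse representation embeds the data into a representation whose entries are mostly zero for most inputs.
- An independent representation tries to disentangle the sources of variation in the data, so that the dimensions of the representation are statistically independent.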

Principal Component Analysis (PCA)

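As covered in the linear algebra chapter, PCA is a dimensionality-reduction method. It learns an orthogonal linear projection that maps x to a lower-dimensional representation whose elements are mutually uncorrelated.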
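A minimal NumPy sketch of PCA, assuming it is computed from the singular value decomposition of the centered data matrix. The function name pca and the synthetic data are illustrative only. The sketch shows the two properties described above: the projected representation has fewer dimensions, and its components are uncorrelated.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    Returns (Z, W, mean): Z is the (n, k) low-dimensional representation,
    W the (d, k) matrix of orthonormal principal directions, mean the
    per-feature mean that was subtracted before projecting.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                      # PCA operates on centered data
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                       # (d, k) orthonormal projection matrix
    Z = Xc @ W                         # low-dimensional representation of each row
    return Z, W, mean

rng = np.random.default_rng(0)
# Synthetic 3-D data where the third feature is almost a copy of the first,
# so two components capture nearly all of the variance.
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=200)

Z, W, mean = pca(X, 2)
print(Z.shape)                                  # (200, 2)
print(np.round(np.cov(Z, rowvar=False), 3))     # off-diagonal entries are ~0
```

Mapping back with Z @ W.T + mean reconstructs X almost exactly here, because the discarded direction carries very little variance.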

k-means clustering (k-means)

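The k-means clustering algorithm divides the training set into k different clusters of examples that are near each other. The algorithm can therefore be seen as providing a k-dimensional one-hot code vector h to represent an input x. When x belongs to cluster i, h_i = 1 and all other entries of h are zero.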
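A short NumPy sketch of this encoding. The helper name one_hot_code is hypothetical. Every row has exactly one nonzero entry, so this code is also an extreme case of the sparse representations mentioned earlier.

```python
import numpy as np

def one_hot_code(cluster_idx, k):
    """Return the k-dimensional one-hot codes h for a sequence of cluster indices.

    Row j has a 1 in column i = cluster_idx[j] (h_i = 1) and 0 everywhere else.
    """
    return np.eye(k, dtype=int)[np.asarray(cluster_idx)]

H = one_hot_code([2, 0, 1, 2], k=3)
print(H)
# [[0 0 1]
#  [1 0 0]
#  [0 1 0]
#  [0 0 1]]
```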

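k-means maintains k different centroids μ(1), ..., μ(k), initialized to different values. It then alternates between two steps until convergence: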

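  1. Each training example is assigned to cluster i, where i is the index of the nearest centroid μ(i).
  2. Each centroid μ(i) is updated to the mean of all training examples x assigned to cluster i.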
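The two steps above can be sketched directly in NumPy. This is a minimal version of plain k-means (Lloyd's algorithm) that makes several assumptions:

- It uses Euclidean distance.
- It initializes the centroids with k distinct training examples chosen at random.
- It stops when the assignments no longer change.

The function name kmeans and its parameters are illustrative. This is not the TensorFlow estimator used in the script that follows.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on the rows of X.

    Returns (centroids, labels). Centroids start as k distinct training
    examples picked at random (an assumption; k-means++ is a common alternative).
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each example to the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Step 2: move each centroid to the mean of the examples assigned to it.
        for i in range(k):
            members = X[labels == i]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[i] = members.mean(axis=0)
    return centroids, labels

# Two well-separated synthetic blobs around (0, 0) and (10, 10).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(10, 1, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids, 1))
```

k-means only finds a local optimum, and the result can depend on the initial centroids. On overlapping data it is common to run it from several seeds and keep the solution with the smallest within-cluster distance.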
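The script below clusters the iris dataset into three groups with the tf.contrib.factorization.KMeansClustering estimator (TensorFlow 1.x). It then plots the result next to the true labels.

import numpy as np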
from sklearn import datasets
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import tensorflow as tf

def loadData():
    iris = datasets.load_iris()
    X=iris.data
    y=iris.target
    return X,y

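def kmeansCluster(X, numClusters, numIterations=10):
    # Requires TensorFlow 1.x: tf.contrib was removed in TensorFlow 2.
    # limit_epochs feeds the whole dataset once and then signals end of input,
    # so every train() call runs exactly one step. A single call with
    # steps=2000 therefore stops after one step (global step 1).
    get_inputs = lambda: tf.train.limit_epochs(
        tf.convert_to_tensor(X, dtype=tf.float32), num_epochs=1)
    # Build the model: k-means++ initialization, full-batch updates
    cluster = tf.contrib.factorization.KMeansClustering(
        num_clusters=numClusters,
        initial_clusters=tf.contrib.factorization.KMeansClustering.KMEANS_PLUS_PLUS_INIT,
        use_mini_batch=False)
    # Train: one assign/update round per call; numIterations=10 is an illustrative value
    for _ in range(numIterations):
        cluster.train(input_fn=get_inputs)
    # Predict the cluster index of every example
    y_pred = cluster.predict_cluster_index(input_fn=get_inputs)
    return np.asarray(list(y_pred))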

def plotFigure(fignum, title, X, y):
    fig = plt.figure(fignum, figsize=(8, 6))
    # Newer Matplotlib versions no longer attach Axes3D(fig) to the figure automatically
    ax = fig.add_subplot(111, projection='3d', elev=48, azim=134)
    ax.scatter(X[:, 3], X[:, 0], X[:, 2],
               c=y.astype(float), edgecolor='k')  # np.float was removed in NumPy 1.24
    ax.xaxis.set_ticklabels([])
    ax.yaxis.set_ticklabels([])
    ax.zaxis.set_ticklabels([])
    ax.set_xlabel('Petal width')
    ax.set_ylabel('Sepal length')
    ax.set_zlabel('Petal length')
    ax.set_title(title)

if __name__ == '__main__':
    X, y = loadData()
    y_pred = kmeansCluster(X, 3)
    plotFigure(1, "3 clusters", X, y_pred)
    plotFigure(2, "Ground Truth", X, y)
    plt.show()  # display both figures and block until they are closed
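k-means numbers its clusters arbitrarily. The colors in the "3 clusters" figure therefore need not line up with the "Ground Truth" figure, even when the groupings agree.

Below is a minimal sketch for comparing the two label vectors numerically. It assumes agreement is scored as the best accuracy over all relabelings of the cluster indices. best_match_accuracy is a hypothetical helper. Its brute-force search over all k! permutations is only practical for small k, such as k = 3 here.

```python
import itertools
import numpy as np

def best_match_accuracy(y_true, y_pred, k):
    """Fraction of examples whose cluster matches the true class,
    maximized over all relabelings of the k cluster indices."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # counts[i, c] = number of examples put in cluster i whose true class is c
    counts = np.zeros((k, k), dtype=int)
    np.add.at(counts, (y_pred, y_true), 1)
    # Try every mapping cluster i -> class perm[i] and keep the best one.
    best = max(sum(counts[i, perm[i]] for i in range(k))
               for perm in itertools.permutations(range(k)))
    return best / len(y_true)

# Same grouping, different cluster numbering: a perfect match.
print(best_match_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1], k=3))  # 1.0
```

In the script above, best_match_accuracy(y, y_pred, 3) would measure how well the k-means assignment recovers the iris species.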

