TSNE() parameter explanation + usage + Morvan's TensorFlow CNN/TSNE visualization

TSNE stands for t-distributed Stochastic Neighbor Embedding. Usage:

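from sklearn.manifold import TSNE
tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
plot_only = 500  # only plot the first 500 points
# Reduce the intermediate-layer output to two dimensions with t-SNE
# (flat_representation is the CNN flatten-layer output computed in the training loop below)
low_dim_embs = tsne.fit_transform(flat_representation[:plot_only, :])
# After t-SNE the data is two-dimensional

# For plotting, pass the 2-D data together with the true class labels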
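The snippet above needs `flat_representation` from the CNN further down. As a self-contained sketch, the same steps can be run on scikit-learn's bundled 8x8 digits dataset. That dataset is only an assumed stand-in for the CNN's flattened features and is not part of the original. The plot is written to a PNG file rather than shown in an interactive window, and `n_iter` is left out because newer scikit-learn versions renamed it.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Stand-in for the CNN's flatten-layer output: 64-dimensional digit images.
digits = load_digits()
features, labels = digits.data, digits.target

plot_only = 500  # only embed and plot the first 500 points
tsne = TSNE(perplexity=30, n_components=2, init='pca', random_state=0)
low_dim_embs = tsne.fit_transform(features[:plot_only, :])  # shape (500, 2)

# Color each 2-D point by its true class label.
fig, ax = plt.subplots(figsize=(6, 6))
sc = ax.scatter(low_dim_embs[:, 0], low_dim_embs[:, 1],
                c=labels[:plot_only], cmap='rainbow', s=8)
fig.colorbar(sc, ax=ax, label='true class')
out_path = os.path.join(tempfile.gettempdir(), 'tsne_digits.png')
fig.savefig(out_path)
plt.close(fig)
print(low_dim_embs.shape, out_path)
```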

################################ Parameter explanation ###############################################

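The sklearn.manifold.TSNE class is defined as follows. This signature and the defaults below come from an older scikit-learn release (around 0.18). Later releases changed several defaults, e.g. early_exaggeration, learning_rate and init:
class sklearn.manifold.TSNE(n_components=2, perplexity=30.0, early_exaggeration=4.0, learning_rate=1000.0, n_iter=1000, n_iter_without_progress=30, min_grad_norm=1e-07, metric='euclidean', init='random', verbose=0, random_state=None, method='barnes_hut', angle=0.5)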
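Because these defaults depend on the scikit-learn version, one quick way to see what the installed version actually uses is to ask a default-constructed estimator for its parameters. `get_params()` is provided by every scikit-learn estimator. A minimal sketch:

```python
import sklearn
from sklearn.manifold import TSNE

# get_params() returns this estimator's constructor arguments as a dict.
params = TSNE().get_params()
print('scikit-learn', sklearn.__version__)
for name in sorted(params):
    print(f'{name} = {params[name]!r}')
```

Parameter names that appear or disappear in this listing, such as n_iter versus max_iter, also show which spelling the installed version expects.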
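Parameters:
n_components: int, optional (default: 2). Dimension of the embedded space.
perplexity: float, optional (default: 30). The perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity; consider a value between 5 and 50. The documentation of this release says the choice is not critical because t-SNE is quite insensitive to this parameter. Later scikit-learn documentation qualifies this: different values can give significantly different results, and the perplexity must be less than the number of samples.
early_exaggeration: float, optional (default: 4.0). Controls how tight natural clusters in the original space are in the embedded space and how much space lies between them; larger values leave more space between clusters. The choice of this parameter is not very critical.
learning_rate: float, optional (default: 1000). The learning rate can be a critical parameter and should be between 100 and 1000. If the cost function increases during initial optimization, the early exaggeration factor or the learning rate may be too high. If the cost function gets stuck in a bad local minimum, increasing the learning rate sometimes helps.
n_iter: int, optional (default: 1000). Maximum number of iterations for the optimization; it should be at least 200. (In scikit-learn 1.5 this parameter was renamed max_iter.)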
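n_iter_without_progress: int, optional (default: 30). Maximum number of iterations without progress before the optimization is aborted.
New in version 0.17: parameter n_iter_without_progress to control the stopping criterion.
min_grad_norm: float, optional (default: 1e-7). If the gradient norm falls below this threshold, the optimization is aborted.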
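metric: string or callable, optional. The metric used to calculate distances between instances in a feature array. If metric is a string, it must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter, or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS. If metric is "precomputed", X is assumed to be a distance matrix. Alternatively, if metric is a callable, it is called on each pair of instances (rows) and the resulting value is recorded; the callable should take two arrays from X as input and return a value indicating the distance between them. The default is "euclidean", which is interpreted as squared Euclidean distance.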
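init: string, optional (default: "random"). Initialization of the embedding. Possible options are "random" and "pca". PCA initialization cannot be used with precomputed distances and is usually more globally stable than random initialization. (Since scikit-learn 1.2 the default is "pca".)
random_state: int, RandomState instance or None (default: None).
Seed control for the pseudo-random number generator. If None, the numpy.random singleton is used. Note that different initializations may lead to different local minima of the cost function.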
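method: string (default: 'barnes_hut').
By default, the gradient calculation uses the Barnes-Hut approximation, which runs in O(N log N) time. method='exact' runs the slower but exact algorithm in O(N^2) time. The exact algorithm should be used when nearest-neighbor errors need to be better than 3%. However, the exact method cannot scale to millions of examples. Barnes-Hut relies on a quad-tree or oct-tree, so it only supports n_components below 4. New in version 0.17: approximate optimization method via Barnes-Hut.
angle: float (default: 0.5)

Only used if method='barnes_hut'. This is the trade-off between speed and accuracy for Barnes-Hut t-SNE. 'angle' is the angular size of a distant node as measured from a point (referred to as theta in [3]). If this size is below 'angle', the node is used as a summary node of all points contained within it. The method is not very sensitive to changes in this parameter in the range 0.2 to 0.8. An angle below 0.2 quickly increases computation time, and an angle above 0.8 quickly increases error.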
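Several of these parameters interact. With metric='precomputed', X must be a square distance matrix and init='pca' is not allowed. method='exact' trades speed for accuracy. The following is a minimal sketch on small synthetic data. The three Gaussian blobs are a hypothetical example, and the distances are squared to mirror how the default 'euclidean' metric is interpreted.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
# Hypothetical toy data: three Gaussian blobs of 20 points each in 10 dimensions.
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(20, 10)) for c in centers])

# Square (60 x 60) distance matrix, squared like the default 'euclidean' metric.
D = pairwise_distances(X, metric='euclidean') ** 2

tsne = TSNE(
    n_components=2,
    perplexity=10,         # must stay below the number of samples (60)
    metric='precomputed',  # the input to fit_transform is a distance matrix
    init='random',         # 'pca' cannot be combined with precomputed distances
    method='exact',        # O(N^2) exact gradients, fine for 60 points
    random_state=0,
)
embedding = tsne.fit_transform(D)
print(embedding.shape, tsne.kl_divergence_)
```

For a few hundred points or fewer, the exact method is affordable. For larger inputs the default Barnes-Hut method and its angle parameter become the relevant knobs.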

################################# Morvan's TSNE visualization ############################################

"""

Know more, visit my Python tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou

Dependencies:
tensorflow: 1.1.0
matplotlib
numpy
"""
# Use t-SNE to reduce the dimensionality of the encoded (flatten-layer) output and display it
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt

tf.set_random_seed(1)
np.random.seed(1)

BATCH_SIZE = 50
LR = 0.001 # learning rate

mnist = input_data.read_data_sets('./mnist', one_hot=True)  # images are already normalized to the range [0, 1]
test_x = mnist.test.images[:2000]
test_y = mnist.test.labels[:2000]

# plot one example
print(mnist.train.images.shape) # (55000, 28 * 28)
print(mnist.train.labels.shape) # (55000, 10)
plt.imshow(mnist.train.images[0].reshape((28, 28)), cmap='gray')
plt.title('%i' % np.argmax(mnist.train.labels[0])); plt.show()

tf_x = tf.placeholder(tf.float32, [None, 28*28])  # input x; pixels are already in [0, 1], so no extra / 255.
image = tf.reshape(tf_x, [-1, 28, 28, 1])  # (batch, height, width, channel)
tf_y = tf.placeholder(tf.int32, [None, 10])  # input y (one-hot labels)

# CNN
conv1 = tf.layers.conv2d(  # shape (28, 28, 1)
    inputs=image,
    filters=16,
    kernel_size=5,
    strides=1,
    padding='same',
    activation=tf.nn.relu
)  # -> (28, 28, 16)
pool1 = tf.layers.max_pooling2d(
    conv1,
    pool_size=2,
    strides=2,
)  # -> (14, 14, 16)
conv2 = tf.layers.conv2d(pool1, 32, 5, 1, 'same', activation=tf.nn.relu) # -> (14, 14, 32)
pool2 = tf.layers.max_pooling2d(conv2, 2, 2) # -> (7, 7, 32)
flat = tf.reshape(pool2, [-1, 7*7*32]) # -> (7*7*32, )
output = tf.layers.dense(flat, 10) # output layer

loss = tf.losses.softmax_cross_entropy(onehot_labels=tf_y, logits=output) # compute cost
train_op = tf.train.AdamOptimizer(LR).minimize(loss)

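# tf.metrics.accuracy is a streaming metric: it returns (acc, update_op) and creates 2 local variables,
# so the reported value accumulates over every evaluation since the local variables were initialized
accuracy = tf.metrics.accuracy(
    labels=tf.argmax(tf_y, axis=1), predictions=tf.argmax(output, axis=1),)[1]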

sess = tf.Session()
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer()) # the local var is for accuracy_op
sess.run(init_op) # initialize var in graph

# following function (plot_with_labels) is for visualization, can be ignored if not interested
from matplotlib import cm
try: from sklearn.manifold import TSNE; HAS_SK = True
except ImportError: HAS_SK = False; print('\nPlease install sklearn for layer visualization\n')
def plot_with_labels(lowDWeights, labels):
    plt.cla(); X, Y = lowDWeights[:, 0], lowDWeights[:, 1]
    for x, y, s in zip(X, Y, labels):  # draw each point as its class label on a colored background
        c = cm.rainbow(int(255 * s / 9)); plt.text(x, y, s, backgroundcolor=c, fontsize=9)
    plt.xlim(X.min(), X.max()); plt.ylim(Y.min(), Y.max()); plt.title('Visualize last layer'); plt.show(); plt.pause(0.01)

plt.ion()
for step in range(600):
    b_x, b_y = mnist.train.next_batch(BATCH_SIZE)
    _, loss_ = sess.run([train_op, loss], {tf_x: b_x, tf_y: b_y})
    if step % 50 == 0:
        accuracy_, flat_representation = sess.run([accuracy, flat], {tf_x: test_x, tf_y: test_y})
        print('Step:', step, '| train loss: %.4f' % loss_, '| test accuracy: %.2f' % accuracy_)

        if HAS_SK:
            # Visualization of trained flatten layer (T-SNE)
            tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
            plot_only = 500  # only plot the first 500 points
            # Reduce the intermediate-layer output to two dimensions with t-SNE
            low_dim_embs = tsne.fit_transform(flat_representation[:plot_only, :])
            # After t-SNE the data is two-dimensional; plot it together with the true class labels
            labels = np.argmax(test_y, axis=1)[:plot_only]
            plot_with_labels(low_dim_embs, labels)
plt.ioff()

# print 10 predictions from test data
test_output = sess.run(output, {tf_x: test_x[:10]})
pred_y = np.argmax(test_output, 1)
print(pred_y, 'prediction number')

print(np.argmax(test_y[:10], 1), 'real number')

# The page https://blog.csdn.net/lanchunhui/article/details/64923702?locationNum=11&fps=1 explains the difference between t-SNE and PCA.
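As a rough illustration of that difference, PCA and t-SNE can be run on the same data and compared with scikit-learn's trustworthiness score. This comparison is not taken from the linked article. Trustworthiness measures how well each point's nearest neighbors are preserved in the embedding, with 1.0 meaning perfectly preserved. The digits dataset is again only an assumed stand-in for the CNN features. PCA is a linear projection onto the directions of largest variance, while t-SNE is nonlinear and aims to keep local neighborhoods, so t-SNE would typically be expected to score higher on this local measure. The sketch prints the scores rather than claiming specific values.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

X = load_digits().data[:500]

# Linear projection onto the two directions of largest variance.
pca_embs = PCA(n_components=2).fit_transform(X)
# Nonlinear embedding that tries to keep each point's nearest neighbors close.
tsne_embs = TSNE(n_components=2, perplexity=30, init='pca', random_state=0).fit_transform(X)

scores = {}
for name, embs in [('PCA', pca_embs), ('t-SNE', tsne_embs)]:
    # Fraction-like score in [0, 1]: how well the 5 nearest neighbors are preserved.
    scores[name] = trustworthiness(X, embs, n_neighbors=5)
    print(f'{name}: trustworthiness = {scores[name]:.3f}')
```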
