Unsupervised Learning 1: Linear Methods


2020-06-14 20:16


1 Unsupervised learning

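Reduction (turning the complex into the simple): clustering and dimension reduction. Only the inputs are available.
Generation (creating something from nothing): only the outputs are available.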

2 Clustering

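How many clusters are needed?
K-means:
- Cluster $X = \{x_1, x_2, \dots, x_N\}$ into K classes.
- Randomly initialize the cluster centers $c_i$, $i = 1, 2, \dots, K$.
- For each $x_n$, compute its distance to every cluster center and assign it to the nearest one: $b_{in} = 1$ if $c_i$ is the center closest to $x_n$, and $b_{in} = 0$ otherwise.
- Update the cluster centers: $c_i = \sum_n b_{in} x_n / \sum_n b_{in}$.
- Repeat the assignment and update steps until the centers stop moving.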
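The K-means steps above can be sketched in NumPy as follows. This is a minimal illustration, not a tuned implementation: the function name `kmeans`, the parameters `n_iters` and `seed`, the choice of K random data points as initial centers, and the rule that an empty cluster keeps its old center are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means on the rows of X (shape N x D)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centers with K distinct data points.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Squared distance from every point to every center, shape (N, K).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # b[n, i] = 1 if x_n is assigned to center i, else 0.
        b = np.eye(K)[labels]
        counts = b.sum(axis=0)
        new_centers = centers.copy()
        nonempty = counts > 0
        # c_i = sum_n b[n, i] x_n / sum_n b[n, i]; empty clusters keep their center.
        new_centers[nonempty] = (b.T @ X)[nonempty] / counts[nonempty, None]
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Calling `kmeans(X, 2)` on an array whose rows form two well-separated groups should return two centers near the group means, plus one label per row.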
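Hierarchical Agglomerative Clustering (HAC)
- Step 1: build a tree. Compute the similarity between every pair, merge the two most similar items into one node, and repeat until a single root remains.
- Step 2: pick a threshold and cut the tree there. The cut determines the K clusters.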
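A rough sketch of the two HAC steps follows. The notes do not say how similarity between merged groups is measured, so this example assumes Euclidean distance (smaller means more similar) with average linkage. Instead of building the full tree and cutting it afterwards, it stops merging once the closest pair of clusters is farther apart than the threshold. The function name `hac` is hypothetical.

```python
import numpy as np

def hac(X, threshold):
    """Agglomerative clustering of the rows of X; returns lists of row indices."""
    # Pairwise Euclidean distances between points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Assumption: average-linkage distance between clusters a and b.
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # the threshold is reached: the remaining clusters are the result
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

A lower threshold stops the merging earlier and leaves more clusters. A very large threshold merges everything into one.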

3 Dimension reduction

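Distributed representation: each object is represented by a vector rather than by a single class label.
MNIST: describing a digit does not require a full 28*28-dimensional vector.
Feature selection: keep only a subset of the original input dimensions.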
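Principal component analysis (PCA): $z = Wx$, a linear dimension reduction.
- The variance of the projection z should be as large as possible.
- To project onto d dimensions, choose $w_1, \dots, w_d$ mutually orthogonal with unit norm. They form the rows of W, so W has orthonormal rows.
- $z_1 = w_1 \cdot x$, $\bar{z}_1 = w_1 \cdot \bar{x}$
- $\mathrm{Var}(z_1) = \frac{1}{N}\sum_{z_1}(z_1 - \bar{z}_1)^2 = w_1^T \left[\frac{1}{N}\sum_x (x - \bar{x})(x - \bar{x})^T\right] w_1 = w_1^T \mathrm{Cov}(x)\, w_1 = w_1^T S w_1$
- Find the $w_1$ that maximizes $w_1^T S w_1$ subject to $w_1^T w_1 = 1$.
- Use a Lagrange multiplier: $g(w_1) = w_1^T S w_1 - \alpha (w_1^T w_1 - 1)$. Setting the partial derivatives to zero gives $S w_1 = \alpha w_1$, so $w_1$ is an eigenvector of S. Then $w_1^T S w_1 = \alpha$, which is largest when $\alpha$ is the largest eigenvalue of S, so $w_1$ is the corresponding eigenvector.
- Find the $w_2$ that maximizes $w_2^T S w_2$ subject to $w_2^T w_2 = 1$ and $w_2^T w_1 = 0$.
- … Solving gives $\beta = 0$ (where $\beta$ is the multiplier on the constraint $w_2^T w_1 = 0$), so $w_2$ is the eigenvector for the second-largest eigenvalue.
- …
- $\mathrm{Cov}(z) = W S W^T = [\lambda_1 e_1, \dots, \lambda_K e_K]$, where K is the number of components kept and $e_k$ is the k-th standard basis vector. $\mathrm{Cov}(z)$ is therefore diagonal: the components of z are uncorrelated.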
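The derivation above amounts to an eigendecomposition of the covariance matrix S, which can be sketched in NumPy as follows. The function name `pca` and the one-sample-per-row layout are assumptions for illustration. The covariance uses the 1/N convention.

```python
import numpy as np

def pca(X, d):
    """Return W (d x D), the projections Z (N x d) and the top-d eigenvalues."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    # S = Cov(x) with the 1/N convention.
    S = Xc.T @ Xc / len(X)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:d]
    W = eigvecs[:, order].T   # rows are w_1, ..., w_d
    Z = X @ W.T               # z = W x for every sample
    return W, Z, eigvals[order]
```

With this W, `np.cov(Z.T, bias=True)` should come out numerically diagonal, with the selected eigenvalues on the diagonal, matching $\mathrm{Cov}(z) = W S W^T$.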

4 PCA: another point of view

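$x - \bar{x} \approx c_1 u_1 + \dots + c_K u_K = \hat{x}$
Reconstruction error: $L = \min_{\{u_1, \dots, u_K\}} \sum \left\| (x - \bar{x}) - \left(\sum_{k=1}^{K} c_k u_k\right) \right\|^2$
SVD: $X_{m \times n} \approx U_{m \times k} \Sigma_{k \times k} V_{k \times n}$. The columns of U are the components $u_1, \dots, u_K$ that minimize L.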
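The reconstruction view can be checked numerically with a truncated SVD. The sketch below assumes X stores one sample per column, matching the $m \times n$ layout of the SVD line above, and measures the squared reconstruction error. The function name `low_rank_reconstruction` is hypothetical.

```python
import numpy as np

def low_rank_reconstruction(X, K):
    """Rank-K reconstruction of X (m x n, one sample per column)."""
    x_bar = X.mean(axis=1, keepdims=True)
    Xc = X - x_bar
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U_k = U[:, :K]                 # columns are u_1, ..., u_K
    C = np.diag(s[:K]) @ Vt[:K]    # row k holds the coefficient c_k of every sample
    Xc_hat = U_k @ C
    # Squared reconstruction error summed over all samples.
    error = np.sum((Xc - Xc_hat) ** 2)
    return x_bar + Xc_hat, U_k, error, s
```

The returned `error` should equal the sum of the squared singular values that were dropped, `np.sum(s[K:] ** 2)`, which is why keeping the K largest singular directions minimizes L.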
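LDA (linear discriminant analysis): dimension reduction that takes labelled data into account (supervised).
Weaknesses of PCA: 1. it is unsupervised; 2. it is linear.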
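How many principal components are needed?

Compute the ratio of each eigenvalue, $\lambda_i / \sum_j \lambda_j$, and keep the components whose ratio is not negligible.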
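As an illustration of the eigenvalue ratio, the sketch below computes each eigenvalue's share of the total variance. The function name `eigenvalue_ratios` and the one-sample-per-row layout are assumptions. The cutoff for "negligible" is a judgment call that the notes do not specify.

```python
import numpy as np

def eigenvalue_ratios(X):
    """Share of total variance carried by each principal component, descending."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(X)
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
    eigvals = np.linalg.eigvalsh(S)[::-1]
    return eigvals / eigvals.sum()
```

For data that varies almost entirely along one direction, the first ratio is close to 1 and the rest are close to 0, which suggests a single component is enough.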

5 Non-negative matrix factorization

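NMF (non-negative matrix factorization): all coefficients and components are non-negative.
Minimize the error: $X_{M \times N} \approx A_{M \times K} B_{K \times N}$
- $L = \sum_{(i,j)} (r_i \cdot r_j - n_{ij})^2$, summed only over the observed entries; missing data are ignored.
- Used in recommender systems.
- Adding bias terms: $L = \sum_{(i,j)} (r_i \cdot r_j + b_i + b_j - n_{ij})^2$
- Application: latent semantic analysis (LSA)
- Application: latent Dirichlet allocation (LDA), a topic model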
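The loss with bias terms above can be minimized by plain gradient descent over the observed entries, as in the sketch below. It is an unconstrained matrix factorization, so it does not enforce NMF's non-negativity. The function names, the boolean `observed` mask, the learning rate and the epoch count are all assumptions for illustration.

```python
import numpy as np

def observed_loss(n, observed, A, B, b_row, b_col):
    """Sum of squared errors over the observed entries only."""
    pred = A @ B + b_row[:, None] + b_col[None, :]
    return float(np.sum(np.where(observed, pred - n, 0.0) ** 2))

def factorize(n, observed, K=2, lr=0.01, n_epochs=2000, seed=0):
    """Fit n[i, j] ~ r_i . r_j + b_i + b_j on entries where observed is True."""
    rng = np.random.default_rng(seed)
    M, N = n.shape
    A = 0.1 * rng.standard_normal((M, K))   # row i is the latent vector r_i
    B = 0.1 * rng.standard_normal((K, N))   # column j is the latent vector r_j
    b_row = np.zeros(M)
    b_col = np.zeros(N)
    for _ in range(n_epochs):
        pred = A @ B + b_row[:, None] + b_col[None, :]
        # Missing entries contribute no error and therefore no gradient.
        err = np.where(observed, pred - n, 0.0)
        # Gradients of L / 2 with respect to each parameter group.
        grad_A = err @ B.T
        grad_B = A.T @ err
        A -= lr * grad_A
        B -= lr * grad_B
        b_row -= lr * err.sum(axis=1)
        b_col -= lr * err.sum(axis=0)
    return A, B, b_row, b_col
```

After fitting, `A @ B + b_row[:, None] + b_col[None, :]` also gives a prediction for the unobserved entries. A recommender system uses exactly those predictions.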
