A First Look at the EM Algorithm in Python

Original

youngAntitheist

2018-08-29 15:34

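The EM Algorithm

In statistical computing, the expectation-maximization (EM) algorithm finds maximum likelihood or maximum a posteriori (MAP) estimates of the parameters of a probabilistic model that depends on unobserved latent variables. EM is widely used for data clustering in machine learning and computer vision.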

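The EM algorithm alternates between two steps:

The first is the expectation (E) step, which uses the current estimates of the model parameters to compute the expectation of the latent variables;

The second is the maximization (M) step, which uses the expectations of the latent variables obtained in the E step to compute a maximum likelihood estimate of the model parameters.

The parameter estimates found in the M step are then used in the next E step, and the two steps keep alternating until the estimates converge.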
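The two steps described above are commonly written as follows. The symbols are assumptions introduced here for illustration and do not appear in the original text: X is the observed data, Z the latent variables, and θ^(t) the parameter estimate after iteration t.

```latex
% E step: expected complete-data log-likelihood under the current estimate
Q\left(\theta \mid \theta^{(t)}\right)
  = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\left[ \log p(X, Z \mid \theta) \right]

% M step: choose the parameters that maximize that expectation
\theta^{(t+1)} = \arg\max_{\theta} \; Q\left(\theta \mid \theta^{(t)}\right)
```

Each iteration cannot decrease the likelihood of the observed data, which is why the alternation can be run until the estimates stop changing.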

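Overall, the EM algorithm proceeds as follows:

1. Initialize the distribution parameters.

2. Repeat until convergence:

E step: given the current parameter estimates, compute the expected values of the unknown (latent) variables.

M step: given those expected values, re-estimate the distribution parameters so as to maximize the likelihood of the data.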
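The flow above can be sketched on a textbook case: a mixture of two one-dimensional Gaussians, where the latent variable is the component each point came from. This is only an illustrative sketch, not the code used later in this post; the function name `em_two_gaussians`, its parameters, and the synthetic sample are all assumptions made for the example.

```python
import numpy as np


def em_two_gaussians(x, n_iter=200, tol=1e-8):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    # Step 1: initialize the distribution parameters.
    weights = np.array([0.5, 0.5])
    means = np.array([x.min(), x.max()])
    variances = np.array([x.var(), x.var()])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: posterior probability of each component for each point,
        # computed under the current parameter estimates.
        dens = (np.exp(-(x[:, None] - means) ** 2 / (2.0 * variances))
                / np.sqrt(2.0 * np.pi * variances))
        joint = weights * dens                      # shape (n, 2)
        total = joint.sum(axis=1, keepdims=True)    # mixture density per point
        resp = joint / total
        # M step: weighted maximum likelihood re-estimation of the parameters.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        # Step 2 stopping rule: the log-likelihood has stopped improving.
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances


# Synthetic data: two well-separated clusters centred at 0 and 6.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(6.0, 1.0, 300)])
w, mu, var = em_two_gaussians(sample)
print(w, mu, var)
```

On data this well separated the estimated means should land close to the two cluster centres; with overlapping clusters or a poor initialization EM may settle in a local optimum instead.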

Naive Bayes

Code implementation

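import numpy as np


def em_algorithm(data, validnum, total, ep=1e-5):
    # validnum: number of valid (observed) samples in data
    # total: total number of samples (observed + missing); ep: convergence tolerance
    valid_data = data[0:validnum]
    # avg estimates the mean and theta the variance of the partly missing variable
    avg = np.sum(valid_data) / total
    theta = np.sum(np.square(valid_data)) / total - avg * avg
    while True:
        # E step: each missing sample contributes E[x] = avg and E[x^2] = avg^2 + theta
        s1 = np.sum(valid_data) + avg * (total - validnum)
        s2 = np.sum(np.square(valid_data)) + (avg * avg + theta) * (total - validnum)
        # M step: maximum likelihood estimates from the completed statistics
        new_avg = s1 / total
        new_theta = s2 / total - new_avg * new_avg
        # compare absolute changes; a signed difference would stop on any decrease
        converged = abs(new_avg - avg) <= ep and abs(new_theta - theta) <= ep
        avg, theta = new_avg, new_theta
        if converged:
            break
    return avg, theta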


def generation(dtype1, dtype2, dtype3, latent_idx):
    # build the naive Bayes model: mean and variance of every feature for each class
    avg, var = [], []
    for idx in range(latent_idx):
        # dim_type1, dim_type2 and dim_type3 hold one feature column of each class
        dim_type1, dim_type2, dim_type3 = dtype1[:, idx], dtype2[:, idx], dtype3[:, idx]
        avg.append([np.average(dim_type1), np.average(dim_type2), np.average(dim_type3)])
        var.append([np.var(dim_type1), np.var(dim_type2), np.var(dim_type3)])

    # use EM to estimate the mean and variance of the feature with missing values:
    # only the first 20 of the 40 training samples are treated as observed
    em_avg_type1, em_var_type1 = em_algorithm(dtype1[:40, latent_idx], 20, 40)
    em_avg_type2, em_var_type2 = em_algorithm(dtype2[:40, latent_idx], 20, 40)
    em_avg_type3, em_var_type3 = em_algorithm(dtype3[:40, latent_idx], 20, 40)
    # append the EM estimates of the mean and variance to the lists
    avg.append([em_avg_type1, em_avg_type2, em_avg_type3])
    var.append([em_var_type1, em_var_type2, em_var_type3])
    return avg, var


def calc_gaussian(x, avg, var):
    # Gaussian probability density function
    t = 1.0 / np.sqrt(2 * np.pi * var)
    return t * np.exp(-np.square(x - avg) / (2.0 * var))


if __name__ == '__main__':
    # the indexing below assumes iris.data holds 150 rows, 50 per class,
    # each with four numeric features followed by the class label
    with open('iris.data') as f:
        data_str = f.readlines()
    data_type1 = np.empty((50, 4))
    data_type2 = np.empty((50, 4))
    data_type3 = np.empty((50, 4))
    for idx in range(50):
        data_type1[idx] = [float(s) for s in data_str[idx].strip('\n').split(',')[0:4]]
    for idx in range(50, 100):
        data_type2[idx - 50] = [float(s) for s in data_str[idx].strip('\n').split(',')[0:4]]
    for idx in range(100, 150):
        data_type3[idx - 100] = [float(s) for s in data_str[idx].strip('\n').split(',')[0:4]]
    # train on the first 40 samples of each class; feature 3 is estimated with EM
    a, v = generation(data_type1[:40], data_type2[:40], data_type3[:40], 3)

    # build the test set from the last 10 samples of each class
    data_test = np.concatenate((data_type1[40:], data_type2[40:], data_type3[40:]))
    correct_times = 0
    for data_idx in range(len(data_test)):
        data = data_test[data_idx]
        val_type1, val_type2, val_type3 = 1/3, 1/3, 1/3  # equal class priors
        for idx in range(4):
            # naive Bayes: multiply the per-feature likelihoods
            val_type1 *= calc_gaussian(data[idx], a[idx][0], v[idx][0])
            val_type2 *= calc_gaussian(data[idx], a[idx][1], v[idx][1])
            val_type3 *= calc_gaussian(data[idx], a[idx][2], v[idx][2])
        # the first 10 rows are class 1, the next 10 class 2, the last 10 class 3
        if val_type1 > val_type2 and val_type1 > val_type3 and data_idx < 10:
            correct_times += 1
        elif val_type1 < val_type2 and val_type2 > val_type3 and 20 > data_idx >= 10:
            correct_times += 1
        elif val_type3 > val_type2 and val_type3 > val_type1 and data_idx >= 20:
            correct_times += 1
        print("No.%2d, Iris_setosa: %f, Iris_versicolor: %f, Iris_virginica: %f"
              % (data_idx + 1, val_type1, val_type2, val_type3))
    print("Accuracy: %.1f%%" % (correct_times * 100 / 30))

Run results: (the output screenshot is not included in this copy)
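The missing-value EM updates used in the listing can also be checked in isolation. When values are simply absent, solving the update equations for their fixed point gives the mean and the biased variance of the observed values, so a correct loop should converge to those. The sketch below is a standalone re-implementation for illustration, not the post's own function; the name `em_missing_gaussian`, the starting point, and the six sample values are assumptions.

```python
import numpy as np


def em_missing_gaussian(observed, n_total, tol=1e-10, max_iter=10_000):
    """EM estimate of a Gaussian's mean and variance when
    n_total - len(observed) of the samples are missing (illustrative sketch)."""
    observed = np.asarray(observed, dtype=float)
    n_missing = n_total - observed.size
    sum_x = observed.sum()
    sum_x2 = np.square(observed).sum()
    mean, var = 0.0, 1.0  # arbitrary starting point
    for _ in range(max_iter):
        # E step: expected sufficient statistics; every missing value
        # contributes E[x] = mean and E[x^2] = mean^2 + var.
        s1 = sum_x + n_missing * mean
        s2 = sum_x2 + n_missing * (mean ** 2 + var)
        # M step: maximum likelihood estimates from the completed statistics.
        new_mean = s1 / n_total
        new_var = s2 / n_total - new_mean ** 2
        converged = abs(new_mean - mean) <= tol and abs(new_var - var) <= tol
        mean, var = new_mean, new_var
        if converged:
            break
    return mean, var


# Hypothetical measurements: 6 observed values out of 12 samples in total.
observed = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4])
mean, var = em_missing_gaussian(observed, n_total=12)
print(mean, observed.mean())  # EM mean next to the observed-sample mean
print(var, observed.var())    # EM variance next to the observed (biased) variance
```

The convergence test compares absolute changes; with a signed difference the loop would stop as soon as either estimate decreased.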
