Bayesian Classifier

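For continuous attributes, we can use a probability density function (for discrete attributes, simple counting is enough).
Bayesian statistics gives the following formula:
[Figure: formula]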

1) Continuous attributes

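Example 1: The data below describes children and adults. In each pair, the first number is height and the second is weight. Using this data, decide whether the new points (120, 120) and (165, 110) are adults or children.
[Figure: height and weight data for children and adults]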
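First, we assume that height and weight are unrelated, that is, each one influences the result independently given the class. This lets us apply naive Bayes with a Gaussian distribution for each attribute. Under this conditional independence assumption, the likelihood of a class equals the product of the likelihoods of its individual attributes, as in the following formula:
[Figure: formula]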
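Analysis: our goal is the posterior distribution, so we first compute the prior and the likelihood.
(1) Compute the prior directly by counting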
$p(y=a)=4/(4+12)=0.25;\quad p(y=c)=1-0.25=0.75$
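The counting step can be sketched in a few lines of Python. This is a minimal illustration, assuming the class counts from the example (4 adults, 12 children); the dictionary names `counts` and `priors` are hypothetical and do not appear in the original program.

```python
# Class counts taken from the example: 4 adults ("a"), 12 children ("c").
counts = {"a": 4, "c": 12}
total = sum(counts.values())

# The prior of each class is its share of the training samples.
priors = {label: n / total for label, n in counts.items()}

print(priors)  # {'a': 0.25, 'c': 0.75}
```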
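(2) Use the data set to estimate the two parameters of each Gaussian distribution: the mean and the variance.
They are computed by the following program: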

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 22 20:01:46 2018

@author: wudl
"""
import numpy as np
import xlrd

def mean_function(ob):  
    mean = sum(ob)/len(ob)
    return mean

def variance_function(ob):
    mean = mean_function(ob)
    ob_array = np.array(ob)
    variance = sum((ob_array-mean)**2)/len(ob)
    return variance
    
if __name__=="__main__":
    # Note: xlrd 2.0 and later read only .xls files; an .xlsx file needs xlrd < 2.0 or another reader
    workbook = xlrd.open_workbook('C:/Users/Lenovo/Desktop/bayes.xlsx')
    sheet = workbook.sheet_by_name('Sheet1')
    height_c = sheet.row_values(0)
    weight_c = sheet.row_values(1)
    height_a = sheet.row_values(3)
    height_a = [i for i in height_a if i !='']   # the list comprehension removes empty strings from the list
    weight_a = sheet.row_values(4)
    weight_a = [i for i in weight_a if i !='']
    mean_height_c = mean_function(height_c)
    variance_height_c = variance_function(height_c)
    mean_weight_c = mean_function(weight_c)
    variance_weight_c = variance_function(weight_c)
    mean_height_a = mean_function(height_a)
    variance_height_a = variance_function(height_a)
    mean_weight_a = mean_function(weight_a)
    variance_weight_a = variance_function(weight_a)
    print('mean_height_c ==>> %.2f;   variance_height_c ==>> %.2f' %(mean_height_c,variance_height_c),'\n' )
    print('mean_weight_c ==>> %.2f;   variance_weight_c ==>> %.2f' %(mean_weight_c,variance_weight_c),'\n' )
    print('mean_height_a ==>> %.2f;   variance_height_a ==>> %.2f' %(mean_height_a,variance_height_a),'\n' )
    print('mean_weight_a ==>> %.2f;   variance_weight_a ==>> %.2f' %(mean_weight_a,variance_weight_a),'\n' )

This gives:

mean_height_c ==>> 59.17;   variance_height_c ==>> 424.31 

mean_weight_c ==>> 59.17;   variance_weight_c ==>> 424.31 

mean_height_a ==>> 170.00;   variance_height_a ==>> 50.00 

mean_weight_a ==>> 170.00;   variance_weight_a ==>> 50.00 

Rounding these values gives the following result:
[Figure: rounded means and variances]
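As a cross-check, the same estimates can be obtained with NumPy's built-in functions. The sketch below uses a hypothetical three-value sample (not the spreadsheet data, which is not reproduced in the text). It relies on the fact that `np.var` divides by n by default (`ddof=0`), which matches `variance_function` in the program above.

```python
import numpy as np

# Hypothetical sample for illustration only; not the spreadsheet data.
sample = np.array([160.0, 170.0, 180.0])

mean = np.mean(sample)
# ddof=0 (the default) divides by n, like variance_function above.
population_var = np.var(sample)
# ddof=1 divides by n - 1, the unbiased sample variance.
sample_var = np.var(sample, ddof=1)

print(mean, population_var, sample_var)
```

With only a handful of samples per class, the choice between dividing by n and by n - 1 changes the variance noticeably, so it is worth stating which one is used.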
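(3) Compute the posterior
With the means and variances in hand, we can compute the likelihoods for adults and children, and from them the posterior.
For example, the likelihood of the height for an adult is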
$p(x_h|y=a)=\frac{1}{\sqrt{2\pi\sigma^2_{h,a}}}\exp\left(-\frac{(x_h-\mu_{h,a})^2}{2\sigma^2_{h,a}}\right)$
$p(y=a|x)=\frac{p(a)\,p(x|y=a)}{p(x)}$
where
$p(x|y=a)=p(x_h|y=a)\,p(x_w|y=a)$
$p(x)=p(a)\,p(x|y=a)+p(c)\,p(x|y=c)=0.25\,p(x|y=a)+0.75\,p(x|y=c)$
[Figure: posterior calculation]
Complete program:
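The calculation can be sketched without the spreadsheet by plugging in the means and variances from the printed output above and the priors 0.25 and 0.75. This is an illustrative sketch, not the author's program: the names `gaussian_pdf`, `params`, and `posterior` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# (mean, variance) pairs copied from the printed output above.
params = {
    "a": {"h": (170.00, 50.00), "w": (170.00, 50.00)},
    "c": {"h": (59.17, 424.31), "w": (59.17, 424.31)},
}
priors = {"a": 0.25, "c": 0.75}

def posterior(height, weight):
    """Return p(y|x) for each class using Bayes' rule."""
    joint = {}
    for label, prior in priors.items():
        # Naive Bayes: the class likelihood is the product over attributes.
        likelihood = (gaussian_pdf(height, *params[label]["h"])
                      * gaussian_pdf(weight, *params[label]["w"]))
        joint[label] = prior * likelihood
    evidence = sum(joint.values())  # p(x)
    return {label: value / evidence for label, value in joint.items()}

for point in [(120, 120), (165, 110)]:
    result = posterior(*point)
    print(point, "->", max(result, key=result.get), result)
```

With these parameters both points come out as class `c` (child), which agrees with the results reported for the complete program.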

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 22 20:01:46 2018

@author: wudl
"""
import numpy as np
import xlrd

def mean_function(ob):  
    mean = sum(ob)/len(ob)
    return mean

def variance_function(ob):
    mean = mean_function(ob)
    ob_array = np.array(ob)
    variance = sum((ob_array-mean)**2)/len(ob)
    return variance

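# Despite its name, this returns the class-conditional likelihood p(x|y):
# the product of the Gaussian densities of height (ob1) and weight (ob2).
# The parameter order is (h, w) so that it matches the calls below.
def prior_distribution(h,w,ob1,ob2):
    prior_h = 1/np.sqrt(2*np.pi*variance_function(ob1))*np.exp(-(h-mean_function(ob1))**2/(2*variance_function(ob1)))
    prior_w = 1/np.sqrt(2*np.pi*variance_function(ob2))*np.exp(-(w-mean_function(ob2))**2/(2*variance_function(ob2)))
    return prior_h*prior_w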

#def p_sum():
#    

if __name__=="__main__":
    height,weight = map(int,input('Enter height and weight(separated by space):').split())
    workbook = xlrd.open_workbook('C:/Users/Lenovo/Desktop/bayes.xlsx')
    sheet = workbook.sheet_by_name('Sheet1')
    height_c = sheet.row_values(0)
    weight_c = sheet.row_values(1)
    height_a = sheet.row_values(3)
    height_a = [i for i in height_a if i !='']
    weight_a = sheet.row_values(4)
    weight_a = [i for i in weight_a if i !='']
    mean_height_c = mean_function(height_c)
    variance_height_c = variance_function(height_c)
    mean_weight_c = mean_function(weight_c)
    variance_weight_c = variance_function(weight_c)
    mean_height_a = mean_function(height_a)
    variance_height_a = variance_function(height_a)
    mean_weight_a = mean_function(weight_a)
    variance_weight_a = variance_function(weight_a)
    print('mean_height_c ==>> %.2f;   variance_height_c ==>> %.2f' %(mean_height_c,variance_height_c),'\n' )
    print('mean_weight_c ==>> %.2f;   variance_weight_c ==>> %.2f' %(mean_weight_c,variance_weight_c),'\n' )
    print('mean_height_a ==>> %.2f;   variance_height_a ==>> %.2f' %(mean_height_a,variance_height_a),'\n' )
    print('mean_weight_a ==>> %.2f;   variance_weight_a ==>> %.2f' %(mean_weight_a,variance_weight_a),'\n' )
    
    "for forecasting"
    
    "prior distribution"
    prior_a = len(height_a)/(len(height_a)+len(height_c))
    prior_c = len(height_c)/(len(height_a)+len(height_c))
    "likelihood"
    like_a = prior_distribution(height,weight,height_a,weight_a)
    like_c = prior_distribution(height,weight,height_c,weight_c)
    
    "results"
    p_a = prior_a*like_a/(prior_a*like_a+prior_c*like_c)
    p_c = prior_c*like_c/(prior_a*like_a+prior_c*like_c)
    print(p_a)
    print('p(y=c|x)==>>%.4f' %p_c)
'''
    print('p(y=a|x)==>>'+str(p_a))      # rounding very small values often yields zero, so print the full value as a string instead
    print('p(y=c|x)==>>'+str(p_c))
'''
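The remark about very small values hints at a more general issue: far from both class means, the Gaussian densities can underflow to zero and the normalisation becomes 0/0. One common workaround, which is not part of the original program, is to work with log-probabilities and normalise with the log-sum-exp trick. The sketch below assumes the means and variances from the printed output above; the test point (1000, 1000) and the names `log_gaussian` and `log_posterior` are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log-density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

# (mean, variance) pairs copied from the printed output above.
params = {
    "a": {"h": (170.00, 50.00), "w": (170.00, 50.00)},
    "c": {"h": (59.17, 424.31), "w": (59.17, 424.31)},
}
log_priors = {"a": math.log(0.25), "c": math.log(0.75)}

def log_posterior(height, weight):
    """Return log p(y|x) for each class, computed entirely in log space."""
    log_joint = {
        label: log_priors[label]
        + log_gaussian(height, *params[label]["h"])
        + log_gaussian(weight, *params[label]["w"])
        for label in log_priors
    }
    # Log-sum-exp: subtract the maximum before exponentiating.
    m = max(log_joint.values())
    log_evidence = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return {label: v - log_evidence for label, v in log_joint.items()}

# Hypothetical extreme input: both plain densities would underflow to 0.0 here.
result = log_posterior(1000, 1000)
print(result)
```

Comparing the log-posteriors directly gives the same decision as comparing the posteriors, since the logarithm is monotonic.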

Input (120 120): classified as a child
[Figure: program output]
Input (165 110): classified as a child
[Figure: program output]

2) Discrete attributes (in practice this only requires counting and multiplying; to be updated later)
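Until that update, here is a rough sketch of what "counting and multiplying" could look like. Everything in it is an assumption made for illustration: the toy data set, the binned attribute values, and the names `label_counts`, `feature_counts`, and `posterior` are hypothetical, and no smoothing is applied, so a value never seen with a class gets probability zero.

```python
from collections import Counter

# Hypothetical toy data: (height bin, weight bin, class label).
data = [
    ("tall", "heavy", "a"),
    ("tall", "heavy", "a"),
    ("tall", "light", "a"),
    ("short", "heavy", "a"),
    ("short", "light", "c"),
    ("short", "light", "c"),
    ("short", "heavy", "c"),
    ("tall", "light", "c"),
]

# Count how often each class occurs, and how often each
# (attribute index, value) occurs together with each class.
label_counts = Counter(label for *_, label in data)
feature_counts = Counter()
for *features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, value, label)] += 1

def posterior(features):
    """Return p(y|x) from relative frequencies (no smoothing)."""
    joint = {}
    for label, n in label_counts.items():
        p = n / len(data)  # prior by counting
        for i, value in enumerate(features):
            p *= feature_counts[(i, value, label)] / n  # p(x_i | y) by counting
        joint[label] = p
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}

print(posterior(("tall", "heavy")))
```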
