6. Logistic Regression in Python: A Code Example

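Logistic Regression

    A statistical method for regression analysis when the dependent variable is categorical. It is a probabilistic, nonlinear form of regression.

    Advantages: the algorithm is easy to implement and deploy, it runs efficiently, and it is usually accurate.

    Disadvantage: discrete independent variables must be converted to dummy variables before they can be used.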
   
  
2. Comparing the formulas

Linear regression equation

y = a1x1 + a2x2 + ... + anxn

Sigmoid function

g(x)=1/(1+e^{-x})
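
The two formulas fit together as follows: logistic regression passes the linear combination through the sigmoid, so the result lies between 0 and 1 and can be read as a probability. Here is a minimal sketch with NumPy. The coefficients and inputs are made-up values, and the names `sigmoid`, `coefficients`, and `features` are illustrative rather than taken from the original.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid function g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical coefficients a1..a3 and one sample x1..x3 (illustrative values).
coefficients = np.array([0.5, -1.0, 2.0])
features = np.array([1.0, 2.0, 0.5])

# Linear regression part: y = a1*x1 + a2*x2 + a3*x3
linear_output = float(np.dot(coefficients, features))

# Logistic regression feeds the linear output into the sigmoid.
probability = float(sigmoid(linear_output))

print(linear_output)
print(probability)
```

A linear output of 0 maps to a probability of exactly 0.5. Negative outputs fall below 0.5 and positive outputs above it.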

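3. Dummy variables

    Also called dummy coding or discrete feature encoding. Dummy variables represent categorical variables and the possible effects of non-quantitative factors.

    Some discrete features have values with a meaningful order, for example size (L, XL, XXL).

    Other discrete features have values with no meaningful order, for example color (red, Blue, Green).

Implementation: pandas.Series.map(dict)

    This function handles discrete features whose values have a meaningful order.

    Parameters:

    dict  the dictionary that defines the mapping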
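
The two kinds of discrete feature described above can be sketched on a toy table. This is only an illustration: the `Size` and `Color` columns and the `size_order` dictionary are invented for the example. The ordered feature goes through `Series.map`, and the unordered one through `pandas.get_dummies`.

```python
import pandas as pd

# Toy data; the column names and values are illustrative.
df = pd.DataFrame({
    'Size': ['L', 'XL', 'XXL', 'XL'],
    'Color': ['red', 'Blue', 'Green', 'red'],
})

# Ordered feature: map each level to a number that preserves the order.
size_order = {'L': 1, 'XL': 2, 'XXL': 3}
df['Size Map'] = df['Size'].map(size_order)

# Unordered feature: one indicator column per category.
color_dummies = pd.get_dummies(df['Color'], prefix='Color', prefix_sep=' ')

print(df['Size Map'].tolist())
print(list(color_dummies.columns))
```

Values that are missing from the dictionary come back as NaN after `map`, so it is worth checking the mapped column for nulls.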

4. Code example

import pandas

data=pandas.read_csv('D:\\DATA\\pycase\\number2\\4.4\\Data.csv')

# 1. Data quality analysis (missing values, outliers, consistency): basic description, check for null values

data.describe()

# The model here is logistic regression and the dataset is large enough, so rows with missing values are simply dropped

data=data.dropna()

data.shape
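
`describe()` only hints at missing values through its `count` row. A more direct check, sketched here on a made-up table (the `toy` frame is not the article's data), counts the nulls per column before and after dropping them.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value in each column (illustrative data).
toy = pd.DataFrame({
    'Age': [25, np.nan, 40, 31],
    'Gender': ['Male', 'Female', None, 'Male'],
})

# Number of missing values in each column.
missing_per_column = toy.isnull().sum()

# Drop every row that contains at least one missing value.
cleaned = toy.dropna()

print(missing_per_column)
print(toy.shape, '->', cleaned.shape)
```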


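# 2. Data transformation
# Convert the discrete features to dummy variables
# Keep the column list separate as groundwork for the later prediction step, where it is reused directly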

dummyColumns=[
       'Gender', 'Home Ownership', 
    'Internet Connection', 'Marital Status',
    'Movie Selector', 'Prerec Format', 'TV Signal'
    ]

# Convert the categorical columns to the category dtype

for column in dummyColumns:
    data[column]=data[column].astype('category')
    
dummiesData=pandas.get_dummies(
        data,
        columns=dummyColumns,
        prefix_sep=" ",
        drop_first=True
        )

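# Taking Gender as an example, check the result through the unique values. A column can be accessed in two ways: dot notation (.) and brackets ([])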

dummiesData.columns

data.Gender.unique()

data['Gender'].unique()

dummiesData['Gender Male'].unique()
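
The effect of `drop_first=True` is easiest to see on a small example. The sketch below uses a made-up `Gender` column, not the article's data: with two categories, the first one (`Female`, in sorted order) is dropped and only `Gender Male` remains, which is why the feature list later refers to `Gender Male` alone.

```python
import pandas as pd

# Toy column (illustrative data).
toy = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male']})
toy['Gender'] = toy['Gender'].astype('category')

# One indicator column per category.
full = pd.get_dummies(toy, columns=['Gender'], prefix_sep=' ')

# drop_first=True removes the indicator of the first category,
# which avoids a redundant, perfectly correlated column.
reduced = pd.get_dummies(toy, columns=['Gender'], prefix_sep=' ', drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```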

"""
Education Level values in the data, from highest to lowest:
Post-Doc
Doctorate
Master's Degree
Bachelor's Degree
Associate's Degree
Some College
Trade School
High School
Grade School
"""
# Convert the discrete features whose values have a meaningful order

educationLevelDict = {
    'Post-Doc': 9,
    'Doctorate': 8,
    'Master\'s Degree': 7,
    'Bachelor\'s Degree': 6,
    'Associate\'s Degree': 5,
    'Some College': 4,
    'Trade School': 3,
    'High School': 2,
    'Grade School': 1
}

# Add the numeric variables

dummiesData['Education Level Map']=dummiesData['Education Level'].map(educationLevelDict)

freqMap = {
    'Never': 0,
    'Rarely': 1,
    'Monthly': 2,
    'Weekly': 3,
    'Daily': 4
}
dummiesData['PPV Freq Map'] = dummiesData['PPV Freq'].map(freqMap)
dummiesData['Theater Freq Map'] = dummiesData['Theater Freq'].map(freqMap)
dummiesData['TV Movie Freq Map'] = dummiesData['TV Movie Freq'].map(freqMap)
dummiesData['Prerec Buying Freq Map'] = dummiesData['Prerec Buying Freq'].map(freqMap)
dummiesData['Prerec Renting Freq Map'] = dummiesData['Prerec Renting Freq'].map(freqMap)
dummiesData['Prerec Viewing Freq Map'] = dummiesData['Prerec Viewing Freq'].map(freqMap)

dummiesData.columns

# Select the feature columns

dummiesSelect = [
    'Age', 'Num Bathrooms', 'Num Bedrooms', 'Num Cars', 'Num Children', 'Num TVs', 
    'Education Level Map', 'PPV Freq Map', 'Theater Freq Map', 'TV Movie Freq Map', 
    'Prerec Buying Freq Map', 'Prerec Renting Freq Map', 'Prerec Viewing Freq Map', 
    'Gender Male',
    'Internet Connection DSL', 'Internet Connection Dial-Up', 
    'Internet Connection IDSN', 'Internet Connection No Internet Connection',
    'Internet Connection Other', 
    'Marital Status Married', 'Marital Status Never Married', 
    'Marital Status Other', 'Marital Status Separated', 
    'Movie Selector Me', 'Movie Selector Other', 'Movie Selector Spouse/Partner', 
    'Prerec Format DVD', 'Prerec Format Laserdisk', 'Prerec Format Other', 
    'Prerec Format VHS', 'Prerec Format Video CD', 
    'TV Signal Analog antennae', 'TV Signal Cable', 
    'TV Signal Digital Satellite', 'TV Signal Don\'t watch TV'
]

inputData = dummiesData[dummiesSelect]

# Select the target column (a 1-D Series, which avoids scikit-learn's column-vector warning)

outputData = dummiesData['Home Ownership Rent']


# Import the logistic regression model


from sklearn import linear_model

lrModel = linear_model.LogisticRegression()

lrModel.fit(inputData, outputData)

lrModel.score(inputData, outputData)
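
Note that `score` above reports accuracy on the same rows the model was fitted on, which tends to be optimistic. One common alternative, sketched here on synthetic data from `make_classification` rather than the article's dataset, is to hold out part of the data and score on that. The split ratio and `max_iter` value are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for inputData / outputData).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on the rows used for fitting versus rows the model has not seen.
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(train_accuracy, test_accuracy)
```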


## Preparing for prediction: the new data must go through the same preprocessing before it can be used for prediction

newData = pandas.read_csv(
    'D:\\DATA\\pycase\\number2\\4.4\\newData.csv', 
    encoding='utf8'
)

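# The conversion must use the same categories as the training sample

for column in dummyColumns:
    # astype('category', categories=...) is no longer supported by pandas;
    # pass a CategoricalDtype built from the training categories instead
    newData[column] = newData[column].astype(
        pandas.CategoricalDtype(categories=data[column].cat.categories)
    )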
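
Why the categories have to match can be shown on a toy example. The `train` and `new` frames below are invented for illustration. When the new data contains only some of the levels, `get_dummies` creates columns for those levels only, unless the column carries the training categories.

```python
import pandas as pd

# Toy training and new data (illustrative values).
train = pd.DataFrame({'Prerec Format': ['DVD', 'VHS', 'Other', 'DVD']})
new = pd.DataFrame({'Prerec Format': ['VHS', 'VHS']})

train['Prerec Format'] = train['Prerec Format'].astype('category')

# Without alignment, only the level present in the new data gets a column.
unaligned = pd.get_dummies(new, columns=['Prerec Format'], prefix_sep=' ')

# Reuse the training categories so every training level gets a column.
shared_dtype = pd.CategoricalDtype(
    categories=train['Prerec Format'].cat.categories
)
new['Prerec Format'] = new['Prerec Format'].astype(shared_dtype)
aligned = pd.get_dummies(new, columns=['Prerec Format'], prefix_sep=' ')

print(list(unaligned.columns))
print(list(aligned.columns))
```

Values in the new data that never appeared in training become NaN after this conversion, so a later `dropna()` removes those rows.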

newData = newData.dropna()

# Reuse the mappings defined for the training sample

newData['Education Level Map'] = newData['Education Level'].map(educationLevelDict)

newData['PPV Freq Map'] = newData['PPV Freq'].map(freqMap)
newData['Theater Freq Map'] = newData['Theater Freq'].map(freqMap)
newData['TV Movie Freq Map'] = newData['TV Movie Freq'].map(freqMap)
newData['Prerec Buying Freq Map'] = newData['Prerec Buying Freq'].map(freqMap)
newData['Prerec Renting Freq Map'] = newData['Prerec Renting Freq'].map(freqMap)
newData['Prerec Viewing Freq Map'] = newData['Prerec Viewing Freq'].map(freqMap)

dummiesNewData = pandas.get_dummies(
    newData, 
    columns=dummyColumns,
    prefix=dummyColumns,
    prefix_sep=" ",
    drop_first=True
)

inputNewData = dummiesNewData[dummiesSelect]

# Predict on the prepared new data (not on the training input)
lrModel.predict(inputNewData)
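
`predict` returns only the class labels. When the probabilities matter, scikit-learn's `predict_proba` returns them. For a binary model, the probability of the positive class is the sigmoid of the linear score given by `decision_function`. Below is a small sketch on made-up one-dimensional data (not the article's dataset).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: small values belong to class 0, large values to class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

new_X = np.array([[0.5], [4.5]])

# Class labels and class probabilities for the new rows.
labels = model.predict(new_X)
probabilities = model.predict_proba(new_X)

# Probability of class 1 recomputed by hand from the linear score.
manual = 1.0 / (1.0 + np.exp(-model.decision_function(new_X)))

print(labels)
print(probabilities)
```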


 
