Python assignment 19 (week 14): Anscombe's quartet

Note: for setting up Jupyter Notebook, see
https://blog.csdn.net/red_stone1/article/details/72858962
If the downloaded CSV file causes problems, see:
https://jingyan.baidu.com/album/c843ea0b9a641477931e4a89.html?picindex=2

%matplotlib inline

import random

import numpy as np
import scipy as sp
import scipy.stats  # make sure the sp.stats submodule is loaded
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set_context("talk")

Anscombe’s quartet

Anscombe’s quartet comprises four datasets and is rather famous. Why? You’ll find out in this exercise.

anascombe = pd.read_csv('data/anscombe.csv')
anascombe.head()

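Explanation:

  1. pandas.read_csv
    Reads a CSV (comma-separated) file into a DataFrame; it also supports importing part of a file and iterating over it in chunks.
    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
    Summary of the parameters (very detailed):
    https://www.cnblogs.com/datablog/p/6127000.html

  2. anascombe.head()
    Use head(m) to view the first m rows; if m is omitted, the first five rows are returned by default.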
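
As a small illustration of the two calls above, the sketch below reads CSV text from memory instead of from data/anscombe.csv. The rows are a hypothetical stand-in with the same column names the post uses (dataset, x, y); the real file is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/anscombe.csv: same column names, only a few rows.
csv_text = """dataset,x,y
I,10,8.04
I,8,6.95
I,13,7.58
II,10,9.14
II,8,8.14
II,13,8.74
"""

# read_csv accepts a path or any file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))

first_five = df.head()   # no argument: first 5 rows
first_two = df.head(2)   # explicit row count

print(first_five)
print(df.dtypes)         # x is parsed as int, y as float, dataset as object
```

Printing dtypes right after loading is a cheap way to catch a CSV that was parsed incorrectly, for example numbers read as strings.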

Output: [image not preserved]

Part 1

For each of the four datasets…

Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: y = β0 + β1x + ϵ (hint: use statsmodels and look at the Statsmodels notebook)

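Compute the mean and variance:

print(anascombe.groupby('dataset')['x'].mean())
print(anascombe.groupby('dataset')['y'].mean())
print(anascombe.groupby('dataset')['x'].var())
print(anascombe.groupby('dataset')['y'].var())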

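Explanation:

  1. groupby splits the rows into groups according to the key passed in (here, the dataset column)
    https://blog.csdn.net/leonis_v/article/details/51832916

  2. pandas.DataFrame.mean()
    Returns the mean of the data
    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html

  3. pandas.DataFrame.var()
    Returns the variance of the data (the sample variance, ddof=1, by default)
    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.var.html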
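
The three pieces above can also be combined into a single call. The following is a minimal sketch, not the post's own code: it builds a small DataFrame from the commonly published values of datasets I and II (an assumption, since the post's CSV is not shown) and uses agg to get the mean and variance of both columns at once.

```python
import pandas as pd

# Commonly published values for Anscombe's datasets I and II (assumed to match the CSV).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

df = pd.DataFrame({
    "dataset": ["I"] * 11 + ["II"] * 11,
    "x": x + x,
    "y": y1 + y2,
})

# One row per dataset, one (column, statistic) pair per output column.
# "var" is the sample variance (ddof=1), the pandas default.
summary = df.groupby("dataset")[["x", "y"]].agg(["mean", "var"])
print(summary.round(3))
```

Both datasets come out with the same mean and variance of x and nearly the same mean and variance of y, which is the point of the exercise.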

Output: [images not preserved]

Compute the correlation coefficient between x and y:

# Four groups of 11 rows each; the end index of a slice is exclusive
X1 = anascombe.x[0:11].values
X2 = anascombe.x[11:22].values
X3 = anascombe.x[22:33].values
X4 = anascombe.x[33:44].values
Y1 = anascombe.y[0:11].values
Y2 = anascombe.y[11:22].values
Y3 = anascombe.y[22:33].values
Y4 = anascombe.y[33:44].values
coefficients = [0, 0, 0, 0]
coefficients[0] = sp.stats.pearsonr(X1, Y1)[0]  # first return value is r
coefficients[1] = sp.stats.pearsonr(X2, Y2)[0]
coefficients[2] = sp.stats.pearsonr(X3, Y3)[0]
coefficients[3] = sp.stats.pearsonr(X4, Y4)[0]
for coefficient in coefficients:
    print(coefficient)

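Explanation:

  1. First extract the x and y values of each dataset by hand
  2. pandas.DataFrame.values (Series.values in the code above)
    An attribute, not a method; it returns the underlying data as a NumPy array
    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.values.html

  3. scipy.stats.pearsonr(x, y)
    Computes the Pearson correlation coefficient; the first return value is r and the second is the p-value
    https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html#scipy.stats.pearsonr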
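
Slicing by hand depends on the row order of the file. As an alternative sketch (swapping scipy.stats.pearsonr for pandas' own Series.corr, which computes Pearson's r by default), the correlation can be computed per group without any index arithmetic. The data below are the commonly published values of datasets I and II, assumed to match the post's CSV.

```python
import pandas as pd

# Commonly published values for Anscombe's datasets I and II (assumed to match the CSV).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

df = pd.DataFrame({
    "dataset": ["I"] * 11 + ["II"] * 11,
    "x": x + x,
    "y": y1 + y2,
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so each group's rows are selected by label rather than by position.
correlations = {
    name: group["x"].corr(group["y"])  # Pearson's r by default
    for name, group in df.groupby("dataset")
}

for name, r in correlations.items():
    print(name, round(r, 3))
```

Because the groups are selected by the dataset label, this version cannot silently drop a row the way an off-by-one slice can.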

Output: [image not preserved]

Compute the linear regression line:

#1
X_I = sm.add_constant(X1)     
model_I = sm.OLS(Y1, X_I)  
result_I = model_I.fit()  
params_I = result_I.params  
print("I: y =", params_I[0], "+", params_I[1], "x")  
#2
X_II = sm.add_constant(X2)  
model_II = sm.OLS(Y2, X_II)  
result_II = model_II.fit()  
params_II = result_II.params  
print("II: y =", params_II[0], "+", params_II[1], "x")  
#3
X_III = sm.add_constant(X3)  
model_III = sm.OLS(Y3, X_III)  
result_III = model_III.fit()  
params_III = result_III.params  
print("III: y =", params_III[0], "+", params_III[1], "x")  
#4
X_IV = sm.add_constant(X4)  
model_IV = sm.OLS(Y4, X_IV)  
result_IV = model_IV.fit()  
params_IV = result_IV.params  
print("IV: y =", params_IV[0], "+", params_IV[1], "x")  

Explanation:
Reference: https://blog.csdn.net/cymy001/article/details/78364652
Output: [image not preserved]
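
A quick way to sanity-check the statsmodels results is to recompute the line with NumPy. The sketch below is a cross-check, not a replacement for sm.OLS: it fits dataset I with np.polyfit and again with the closed-form formulas β1 = cov(x, y) / var(x) and β0 = mean(y) - β1 · mean(x). The values are the commonly published ones for dataset I, assumed to match the post's CSV.

```python
import numpy as np

# Commonly published values for Anscombe's dataset I (assumed to match the CSV).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Degree-1 least-squares fit; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)

# Closed-form simple linear regression for comparison.
slope_cf = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept_cf = y.mean() - slope_cf * x.mean()

print(f"polyfit:     y = {intercept:.3f} + {slope:.3f} x")
print(f"closed form: y = {intercept_cf:.3f} + {slope_cf:.3f} x")
```

Both routes should agree with params_I from the OLS fit up to floating-point error, provided all 11 rows of the dataset are used.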

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

sns.set(style='whitegrid')  
gr = sns.FacetGrid(anascombe, col="dataset", hue="dataset", height=3)  # 'size' was renamed to 'height' in seaborn 0.9
gr.map(plt.scatter, 'x', 'y')  
plt.show()

Reference: the official documentation and https://blog.csdn.net/yutao03081/article/details/79064669

Output: [image not preserved]
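
To see why the scatter plots matter, it helps to draw the fitted line on top of each panel. The sketch below uses plain matplotlib rather than sns.FacetGrid, and only datasets I and II (commonly published values, assumed to match the post's CSV), so it is an illustration of the idea and not the assignment's required approach.

```python
import matplotlib.pyplot as plt
import numpy as np

# Commonly published values for Anscombe's datasets I and II (assumed to match the CSV).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
panels = {
    "I": np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
}

fig, axes = plt.subplots(1, len(panels), figsize=(8, 3), sharex=True, sharey=True)
xs = np.linspace(x.min(), x.max(), 50)  # x positions for drawing the fitted line

for ax, (name, y) in zip(axes, panels.items()):
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line for this panel
    ax.scatter(x, y)
    ax.plot(xs, intercept + slope * xs, color="red")
    ax.set_title(f"dataset = {name}")
    ax.set_xlabel("x")

axes[0].set_ylabel("y")
fig.tight_layout()
```

The two red lines are almost identical, yet the points in dataset I scatter around the line while those in dataset II follow a curve. Outside a notebook, call plt.show() to display the figure.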
