6.3 python ipython

Done in an IPython notebook.

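%matplotlib inline

import random

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set_context("talk")

# The code below uses a DataFrame `anascombe` with columns dataset, x, y, which the
# original snippet never loads. Assumption: seaborn's bundled copy of the quartet
# (downloaded on first use); the exercise may load the same data from a CSV instead.
anascombe = sns.load_dataset("anscombe")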


Anscombe's quartet

Anscombe's quartet comprises four datasets and is rather famous. Why? You'll find out in this exercise.


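Explanation: Anscombe's quartet uses four datasets to show why plotting your data matters. The four datasets have practically the same means and variances (and, as Part 1 checks, the same correlation and regression line), yet their scatter plots look completely different, so the linear regression result does not match what the plots show at all.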


Part 1

For each of the four datasets...

  • Compute the mean and variance of both x and y
  • Compute the correlation coefficient between x and y
  • Compute the linear regression line: $y = \beta_0 + \beta_1 x + \epsilon$ (hint: use statsmodels and look at the Statsmodels notebook)
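With a single predictor, the least-squares estimates depend only on the means, variances and correlation, so four datasets that share those statistics must also share the same fitted line. The standard closed-form OLS estimates, with the rounded values for this quartet plugged in (mean of x = 9, variance of x = 11, mean of y ≈ 7.50, variance of y ≈ 4.125, and r ≈ 0.816 as the correct per-dataset correlation):

```latex
\begin{aligned}
\hat\beta_1 &= \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)} = r_{xy}\,\frac{s_y}{s_x}
  \approx 0.816 \times \sqrt{\frac{4.125}{11}} \approx 0.816 \times 0.612 \approx 0.500 \\
\hat\beta_0 &= \bar{y} - \hat\beta_1\,\bar{x} \approx 7.50 - 0.500 \times 9 = 3.00
\end{aligned}
```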

Mean and variance:

Group the rows by dataset with groupby, then call mean and var to get the mean and variance of x and y in each group.
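The four separate groupby calls can also be collapsed into a single `agg` call. A minimal sketch, assuming the quartet is hard-coded (the standard published values, so it runs offline); `quartet` is a name chosen here for illustration, and in the notebook you would use `anascombe` instead:

```python
import pandas as pd

# Anscombe's quartet (standard published values), hard-coded so the sketch runs offline.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x for datasets I-III
X_IV = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
quartet = pd.DataFrame(
    [(name, x, y)
     for name, ys in Y.items()
     for x, y in zip(X_IV if name == "IV" else X_SHARED, ys)],
    columns=["dataset", "x", "y"],
)

# One call: mean and (sample, ddof=1) variance of x and y for every dataset.
summary = quartet.groupby("dataset")[["x", "y"]].agg(["mean", "var"])
print(summary.round(3))
```

The result has a two-level column index, so `summary[("y", "var")]` picks out the variance of y per dataset.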

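Correlation:

I couldn't find a ready-made function, so I used a loop: take the rows belonging to each group and compute corr on them.

(I'm not sure why groupby alone didn't work here; maybe the object it returns isn't the original data.)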
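groupby can in fact do this without a manual loop: select the x and y columns of each group and apply a function that correlates them. A sketch on the same hard-coded data (`quartet` is again an illustrative name standing in for `anascombe`):

```python
import pandas as pd

# Anscombe's quartet (standard published values), hard-coded so the sketch runs offline.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x for datasets I-III
X_IV = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
quartet = pd.DataFrame(
    [(name, x, y)
     for name, ys in Y.items()
     for x, y in zip(X_IV if name == "IV" else X_SHARED, ys)],
    columns=["dataset", "x", "y"],
)

# Each group is a small DataFrame with columns x and y; the lambda returns one
# Pearson correlation per group, so the result is a Series indexed by dataset.
correlations = quartet.groupby("dataset")[["x", "y"]].apply(
    lambda g: g["x"].corr(g["y"])
)
print(correlations.round(3))
```

Grouping by the dataset label, rather than slicing fixed blocks of rows, also means the result does not depend on how many points each dataset has or how the rows are ordered.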

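Linear regression:

I used sm.OLS from statsmodels.api, but I couldn't find a way to print the quantities I wanted one at a time instead of the whole summary.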
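The fitted results object exposes the individual quantities as attributes (`params`, `rsquared`, `bse`, `pvalues`, and so on), so there is no need to read them off `summary()`. A sketch using the formula interface `smf.ols`, which adds the intercept automatically; the hard-coded `quartet` frame and the `fits` table are illustrative choices, not the exercise's own code:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Anscombe's quartet (standard published values), hard-coded so the sketch runs offline.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x for datasets I-III
X_IV = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
quartet = pd.DataFrame(
    [(name, x, y)
     for name, ys in Y.items()
     for x, y in zip(X_IV if name == "IV" else X_SHARED, ys)],
    columns=["dataset", "x", "y"],
)

rows = {}
for name, group in quartet.groupby("dataset"):
    # "y ~ x" means y = b0 + b1*x + e; the formula interface adds the intercept.
    result = smf.ols("y ~ x", data=group).fit()
    rows[name] = {
        "intercept": result.params["Intercept"],  # b0
        "slope": result.params["x"],              # b1
        "r_squared": result.rsquared,
    }

fits = pd.DataFrame.from_dict(rows, orient="index")
print(fits.round(3))
```

With `sm.OLS` instead, the same attributes exist; the intercept then appears in `params` under the name `const` after wrapping x in `sm.add_constant`.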

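print("Mean of x per dataset")
print(anascombe.groupby('dataset')['x'].mean())
print("\nVariance of x per dataset")
print(anascombe.groupby('dataset')['x'].var())
print("\nMean of y per dataset")
print(anascombe.groupby('dataset')['y'].mean())
print("\nVariance of y per dataset")
print(anascombe.groupby('dataset')['y'].var())


print("\nCorrelation")
# Iterate over the groups rather than fixed 10-row slices: each dataset has
# 11 points, so 10-row slices drop points or mix in neighbouring datasets.
for name, group in anascombe.groupby('dataset'):
    correlation = group['x'].corr(group['y'])
    print("correlation of group", name, ':', correlation)
    print()


print("\nLinear regression")
for name, group in anascombe.groupby('dataset'):
    # add_constant appends an intercept column, so the model is y = b0 + b1*x + e;
    # sm.OLS(y, x) on its own fits a line through the origin.
    X = sm.add_constant(group['x'])
    result = sm.OLS(group['y'], X).fit()
    print(result.summary())

The results are as follows.

Note: the correlation and regression output below was produced by an earlier version of the loops that took fixed 10-row slices and fitted OLS without an intercept (hence "No. Observations: 10" and a single x coefficient). Since each dataset has 11 points, those numbers are not the quartet's true values; computed per dataset with an intercept, all four datasets give r ≈ 0.816 and a fitted line of about y = 3.00 + 0.500x.

Mean of x per dataset
dataset
I      9.0
II     9.0
III    9.0
IV     9.0
Name: x, dtype: float64

Variance of x per dataset
dataset
I      11.0
II     11.0
III    11.0
IV     11.0
Name: x, dtype: float64

Mean of y per dataset
dataset
I      7.500909
II     7.500909
III    7.500000
IV     7.500909
Name: y, dtype: float64

Variance of y per dataset
dataset
I      4.127269
II     4.127629
III    4.122620
IV     4.123249
Name: y, dtype: float64

Correlation
corrlation of group: 0 0.797081575906253

corrlation of group: 1 0.8107567988514719

corrlation of group: 2 0.828558301914895

corrlation of group: 3 0.4695259621639301


Linear regression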
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.965
Model:                            OLS   Adj. R-squared:                  0.962
Method:                 Least Squares   F-statistic:                     251.5
Date:                Sat, 09 Jun 2018   Prob (F-statistic):           6.95e-08
Time:                        17:13:54   Log-Likelihood:                -18.061
No. Observations:                  10   AIC:                             38.12
Df Residuals:                       9   BIC:                             38.43
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.7881      0.050     15.859      0.000       0.676       0.901
==============================================================================
Omnibus:                        0.651   Durbin-Watson:                   2.507
Prob(Omnibus):                  0.722   Jarque-Bera (JB):                0.396
Skew:                          -0.424   Prob(JB):                        0.820
Kurtosis:                       2.519   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.961
Model:                            OLS   Adj. R-squared:                  0.957
Method:                 Least Squares   F-statistic:                     221.7
Date:                Sat, 09 Jun 2018   Prob (F-statistic):           1.20e-07
Time:                        17:13:54   Log-Likelihood:                -18.584
No. Observations:                  10   AIC:                             39.17
Df Residuals:                       9   BIC:                             39.47
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.7894      0.053     14.889      0.000       0.669       0.909
==============================================================================
Omnibus:                        3.223   Durbin-Watson:                   2.351
Prob(Omnibus):                  0.200   Jarque-Bera (JB):                1.584
Skew:                          -0.969   Prob(JB):                        0.453
Kurtosis:                       2.795   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.963
Model:                            OLS   Adj. R-squared:                  0.959
Method:                 Least Squares   F-statistic:                     235.0
Date:                Sat, 09 Jun 2018   Prob (F-statistic):           9.34e-08
Time:                        17:13:54   Log-Likelihood:                -18.117
No. Observations:                  10   AIC:                             38.23
Df Residuals:                       9   BIC:                             38.54
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.8175      0.053     15.329      0.000       0.697       0.938
==============================================================================
Omnibus:                        0.753   Durbin-Watson:                   1.401
Prob(Omnibus):                  0.686   Jarque-Bera (JB):                0.590
Skew:                          -0.489   Prob(JB):                        0.745
Kurtosis:                       2.323   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.964
Model:                            OLS   Adj. R-squared:                  0.960
Method:                 Least Squares   F-statistic:                     243.1
Date:                Sat, 09 Jun 2018   Prob (F-statistic):           8.06e-08
Time:                        17:13:54   Log-Likelihood:                -17.121
No. Observations:                  10   AIC:                             36.24
Df Residuals:                       9   BIC:                             36.54
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.8537      0.055     15.591      0.000       0.730       0.978
==============================================================================
Omnibus:                        1.048   Durbin-Watson:                   1.199
Prob(Omnibus):                  0.592   Jarque-Bera (JB):                0.714
Skew:                          -0.287   Prob(JB):                        0.700
Kurtosis:                       1.823   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


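The means and variances are essentially identical across the four datasets. Computed correctly (each dataset on its own 11 points, with an intercept), the correlation (about 0.816) and the regression line (about y = 3.00 + 0.500x) are the same for all four as well. The exercise in Part 2 shows why plotting still matters.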

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

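As the hint says, these two functions are all that is needed. Note that when creating the grid you pass col='dataset', so that each value of the dataset column gets its own panel (column) in the grid.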


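# One column of panels per value in the 'dataset' column
g = sns.FacetGrid(anascombe, col='dataset')
# Draw a scatter plot of y against x in every panel
g_map = g.map(plt.scatter, 'x', 'y')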

The result is as follows:
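If you also want to see the Part 1 regression line on each panel, `sns.lmplot` (swapped in here, not part of the hint) builds the same kind of FacetGrid and overlays a least-squares fit. A sketch on hard-coded data (`quartet` stands in for `anascombe`); the `Agg` backend line is only there so it runs outside a notebook and should be dropped under `%matplotlib inline`:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running as a script; omit in a notebook

import pandas as pd
import seaborn as sns

# Anscombe's quartet (standard published values), hard-coded so the sketch runs offline.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x for datasets I-III
X_IV = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
quartet = pd.DataFrame(
    [(name, x, y)
     for name, ys in Y.items()
     for x, y in zip(X_IV if name == "IV" else X_SHARED, ys)],
    columns=["dataset", "x", "y"],
)

# One panel per dataset (wrapped into a 2x2 grid), each with its scatter plot
# and the fitted least-squares line; ci=None skips the confidence band.
g = sns.lmplot(data=quartet, x="x", y="y", col="dataset",
               col_wrap=2, ci=None, height=3)
```

With the lines drawn, the four panels show essentially the same fitted line even though the point clouds look nothing alike.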


