As promised, let's skip the preamble and get straight to the point!

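1 t-tests

1.1 One-sample t-test

A hypothesis test on the population mean
The one-sample t-test is the most basic hypothesis test. It uses sample data drawn from a population to infer whether the population mean differs significantly from a hypothesized value.
Four steps:
1. Null hypothesis: population mean = μ0; alternative hypothesis: population mean != μ0.
2. Compute the sample mean and standard deviation.
3. Compute the t statistic and the p-value.
4. Make a decision based on the p-value.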

# R implementation:
## One-sample t-test
> t.test(rate, mu = 0.1)  # hypothesized population mean = 0.1

One Sample t-test

data: rate
t = 2.9812, df = 149, p-value = 0.003355
alternative hypothesis: true mean is not equal to 0.1
95 percent confidence interval:
0.1033923 0.1167297
sample estimates:
mean of x
0.110061

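The p-value (0.003355) is well below 0.05, so we reject the null hypothesis: the sample is not consistent with a population whose mean is 0.1. The 95% confidence interval (0.1034 to 0.1167) excludes 0.1, which says the same thing.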
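
The four steps listed for the one-sample test can also be carried out by hand and compared with t.test(). The following is a minimal sketch, assuming a simulated `rate` vector (the article's real data is not available here) and a 0.05 significance level:

```r
# Simulated stand-in for the article's `rate` vector (assumption, not the real data)
set.seed(1)
rate <- rnorm(150, mean = 0.11, sd = 0.04)
mu0  <- 0.1                         # hypothesized population mean

# Step 2: sample mean and standard deviation
n     <- length(rate)
x_bar <- mean(rate)
s     <- sd(rate)

# Step 3: t statistic and two-sided p-value with n - 1 degrees of freedom
t_stat <- (x_bar - mu0) / (s / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# Step 4: decision at the 0.05 level (assumed threshold)
reject_h0 <- p_val < 0.05

# Cross-check against the built-in test
res <- t.test(rate, mu = mu0)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
```

Both all.equal() calls should agree, since t.test() uses the same formula for the two-sided case.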

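1.2 Two-sample t-test

Used to test whether a continuous variable differs significantly between the two groups defined by a binary categorical variable
The two-sample t-test assumes independent observations, equal variances, and normally distributed data, so a homogeneity-of-variance test has to be run first. The steps are:
1. Compute the mean of each of the two groups.
2. Test for homogeneity of variance.
3.1. If the variances are equal: run the two-sample t-test that assumes equal variances.
3.2. If the variances are unequal: run the two-sample t-test that does not assume equal variances (Welch's t-test).
Homogeneity-of-variance test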

> var.test(avg_exp~gender)

F test to compare two variances

data: avg_exp by gender
F = 0.86857, num df = 49, denom df = 19, p-value = 0.6702
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3779117 1.7529818
sample estimates:
ratio of variances
0.868572

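p-value = 0.6702, so we cannot reject the null hypothesis of equal variances: there is no evidence that the variance of avg_exp differs between the two gender groups.

Two-sample t-test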

> t.test(avg_exp~gender,var.equal=T)

Two Sample t-test

data: avg_exp by gender
t = -1.7429, df = 68, p-value = 0.08587
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-435.04352 29.39192
sample estimates:
mean in group 0 mean in group 1
925.7052 1128.5310

p-value = 0.08587, so at the 0.05 level the difference in mean avg_exp between the two gender groups is not statistically significant.
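
The whole two-sample workflow (group means, variance test, then the matching t-test) can be chained together. This is only a sketch: the data frame `df` is simulated, the group sizes 50 and 20 merely mimic the degrees of freedom shown in the F test, and the 0.05 cut-off for switching between the pooled and Welch tests is an assumption.

```r
set.seed(2)
# Hypothetical data frame mimicking the article's columns (assumption)
df <- data.frame(
  gender  = factor(rep(c(0, 1), times = c(50, 20))),
  avg_exp = c(rnorm(50, mean = 930, sd = 400),
              rnorm(20, mean = 1130, sd = 420))
)

# Step 1: mean of each group
group_means <- tapply(df$avg_exp, df$gender, mean)

# Step 2: F test for equal variances
f_res <- var.test(avg_exp ~ gender, data = df)

# Step 3: pooled t-test if the variances look equal, Welch's t-test otherwise
equal_var <- f_res$p.value >= 0.05
t_res <- t.test(avg_exp ~ gender, data = df, var.equal = equal_var)
t_res$method
t_res$p.value
```

Passing `data = df` avoids having to attach() the data frame, which the console calls in this section appear to rely on.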

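2 Analysis of variance (ANOVA)

2.1 One-way ANOVA

Used to test whether a continuous variable differs significantly across the groups defined by a categorical variable with more than two levels
One-way ANOVA assumes independent observations, equal variances across groups, and normally distributed data, so a homogeneity-of-variance test has to be run first. The steps are:
1. Compute the mean of each group.
2. Test for homogeneity of variance.
3.1. If the variances are equal: run the one-way ANOVA that assumes equal variances.
3.2. If the variances are unequal: run the one-way ANOVA that does not assume equal variances (Welch's ANOVA).
Homogeneity-of-variance test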

> bartlett.test(avg_exp~edu_class,data = creditcard_exp)

Bartlett test of homogeneity of variances
data: avg_exp by edu_class
Bartlett’s K-squared = 23.9, df = 3, p-value = 2.62e-05

p-value = 2.62e-05, so we reject the hypothesis of equal variances: the variance of avg_exp differs across the edu_class groups.

> oneway.test(avg_exp~edu_class,var.equal=F)  # unequal variances, so var.equal=F

One-way analysis of means (not assuming equal variances)
data: avg_exp and edu_class
F = 61.086, num df = 3.0000, denom df = 7.5956, p-value = 1.141e-05

p-value = 1.141e-05, so mean avg_exp differs significantly across the edu_class categories.
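
The one-way ANOVA steps can be scripted end to end. A minimal sketch, assuming a simulated data frame `sim` in place of creditcard_exp and a 0.05 threshold for the Bartlett test:

```r
set.seed(3)
# Hypothetical stand-in for creditcard_exp (assumption): four education levels
# with different means and different spreads
sim <- data.frame(
  edu_class = factor(rep(0:3, times = c(10, 20, 25, 15))),
  avg_exp   = c(rnorm(10, 400, 50),   rnorm(20, 700, 150),
                rnorm(25, 1000, 250), rnorm(15, 1400, 400))
)

# Step 1: mean of each group
aggregate(avg_exp ~ edu_class, data = sim, FUN = mean)

# Step 2: Bartlett's test of equal variances across the groups
b_res <- bartlett.test(avg_exp ~ edu_class, data = sim)

# Step 3: classical one-way ANOVA if the variances look equal,
# Welch's version otherwise
equal_var <- b_res$p.value >= 0.05
a_res <- oneway.test(avg_exp ~ edu_class, data = sim, var.equal = equal_var)
a_res
```

A significant result only says that at least one group mean differs; it does not say which groups differ from which.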

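2.2 Multi-way ANOVA

Used to test whether a continuous variable differs significantly across the groups defined by several categorical variables; interaction effects also have to be considered

2.2.1 No interaction effect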

> ana<-lm(avg_exp~edu_class+gender)
> summary(ana)

Call:
lm(formula = avg_exp ~ edu_class + gender)

Residuals:
Min 1Q Median 3Q Max
-574.71 -156.71 -48.62 142.11 1039.29

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 207.37 207.09 1.001 0.320381
edu_class1 439.60 216.03 2.035 0.045947 *
edu_class2 786.06 217.83 3.609 0.000599 ***
edu_class3 1241.19 219.56 5.653 3.79e-07 ***
gender1 -57.82 82.84 -0.698 0.487708

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 292.9 on 65 degrees of freedom
Multiple R-squared: 0.5943, Adjusted R-squared: 0.5693
F-statistic: 23.81 on 4 and 65 DF, p-value: 3.75e-12

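R automatically expands factor variables into dummy variables and uses the first factor level as the reference level (by default the levels are sorted, so this is not necessarily the category that appears first in the data). Here gender1 has p = 0.487708, so gender shows no significant difference, whereas every non-reference level of edu_class differs significantly from its reference level.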
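
How R picks the reference level and builds the dummy columns can be inspected directly. A small sketch, assuming a toy factor whose levels are coded "0" to "3" like the coefficient names in the output (the values themselves are made up):

```r
# Toy factor (assumption: levels coded "0".."3" as in the article's output)
edu_class <- factor(c("2", "0", "3", "1", "0", "2"))

# Levels are sorted; the first one is the reference level
levels(edu_class)

# Dummy columns that lm() would create for the non-reference levels
colnames(model.matrix(~ edu_class))

# Choose a different reference level explicitly
edu_ref3 <- relevel(edu_class, ref = "3")
levels(edu_ref3)
```

Note that the first value in the data is "2", yet "0" ends up as the reference because it is the first level.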

2.2.2 With interaction effect

> ana<-lm(avg_exp~edu_class+gender+edu_class*gender)
> summary(ana)

Call:
lm(formula = avg_exp ~ edu_class + gender + edu_class * gender)

Residuals:
Min 1Q Median 3Q Max
-448.26 -138.18 -45.72 107.14 1165.74

Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 207.4 200.1 1.037 0.30391
edu_class1 417.8 209.4 1.996 0.05031 .
edu_class2 732.3 213.0 3.438 0.00104 **
edu_class3 1346.6 216.1 6.232 4.27e-08 ***
gender1 -289.7 121.1 -2.391 0.01980 *
edu_class1:gender1 482.4 241.9 1.994 0.05046 .
edu_class2:gender1 386.5 173.3 2.231 0.02926 *
edu_class3:gender1 NA NA NA NA

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 282.9 on 63 degrees of freedom
Multiple R-squared: 0.6331, Adjusted R-squared: 0.5981
F-statistic: 18.12 on 6 and 63 DF, p-value: 4.352e-12

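After adding the interaction term, some of the significance results change: gender1 becomes significant (p = 0.01980), edu_class1 falls just short of the 0.05 level (p = 0.05031), and the edu_class2:gender1 interaction is significant (p = 0.02926). The edu_class3:gender1 coefficient is NA because it could not be estimated (R reports a singularity), which usually means that this combination of edu_class and gender has no observations.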
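
Two checks are often useful with interaction models: counting observations per cell, and formally comparing the models with and without the interaction. The sketch below uses simulated data (an assumption) in which the edu_class 3 / gender 1 cell is deliberately left empty to reproduce an NA coefficient.

```r
set.seed(5)
# Hypothetical data (assumption); the edu_class 3 / gender 1 cell is left
# empty on purpose so that its interaction coefficient cannot be estimated
sim <- expand.grid(rep = 1:8, edu_class = factor(0:3), gender = factor(0:1))
sim <- subset(sim, !(edu_class == "3" & gender == "1"))
sim$avg_exp <- 200 + 350 * as.integer(as.character(sim$edu_class)) +
  rnorm(nrow(sim), sd = 100)

# Cell counts: a zero cell means that interaction term is not estimable
cell_counts <- table(sim$edu_class, sim$gender)
cell_counts

fit_add <- lm(avg_exp ~ edu_class + gender, data = sim)  # main effects only
fit_int <- lm(avg_exp ~ edu_class * gender, data = sim)  # plus interaction

# NA because the corresponding cell is empty
coef(fit_int)["edu_class3:gender1"]

# F test: does the interaction improve on the additive model?
anova(fit_add, fit_int)
```

In a formula, `edu_class * gender` already expands to both main effects plus their interaction, so it is equivalent to the longer form used in the console call.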

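3 Correlation analysis (testing the relationship between two continuous variables)

Correlation analysis is more direct: a scatterplot matrix is the usual way to inspect the pairwise relationships among continuous variables. The choice of correlation coefficient also matters.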

> library(car)  # scatterplotMatrix() comes from the car package
> scatterplotMatrix(~avg_exp+Age+Income+dist_home_val+dist_avg_income|gender
+                   ,data=creditcard_exp,main="Scatterplot matrix of the loan default data")
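
To put numbers next to the plot, base R's cor() and cor.test() can be used. This is a sketch on simulated data (the vectors below are assumptions that only borrow the article's variable names), contrasting the Pearson and Spearman coefficients:

```r
set.seed(6)
# Hypothetical income and spending data (assumption)
Income  <- rlnorm(70, meanlog = 2, sdlog = 0.5)
avg_exp <- 100 * Income + rnorm(70, sd = 150)

# Pearson: measures linear association, sensitive to outliers and skew
r_pearson <- cor(Income, avg_exp, method = "pearson")

# Spearman: rank-based, measures monotonic association
r_spearman <- cor(Income, avg_exp, method = "spearman")

# Significance test for the Pearson correlation
ct <- cor.test(Income, avg_exp, method = "pearson")
ct$estimate
ct$p.value
```

Pearson is the usual choice for roughly linear, roughly normal data; Spearman is a common fallback when the relationship is monotonic but not linear or when there are outliers.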

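4 Chi-squared test (testing the relationship between two binary categorical variables)

The relationship between two binary categorical variables is tested by building a cross-tabulation (contingency table) and running a chi-squared test on it. The chi-squared test only tells whether the two categorical variables are associated; it does not measure how strong the association is.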

> chisq.test(x=bankruptcy_ind,y=bad_ind)

Pearson’s Chi-squared test

data: bankruptcy_ind and bad_ind
X-squared = 34.012, df = 2, p-value = 4.115e-08

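p-value = 4.115e-08, so the two variables are significantly associated. Note that df = 2 corresponds to a 2 x 3 table, so one of the two variables apparently has three levels in this data.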
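
Since the chi-squared test itself says nothing about strength, an effect size such as Cramér's V can be computed from the test statistic. This is an addition to the article's workflow, not part of it, and the sketch runs on simulated indicators (an assumption) rather than the real bankruptcy_ind and bad_ind:

```r
set.seed(7)
# Hypothetical binary indicators (assumption), generated with some dependence
bankruptcy_ind <- rbinom(500, 1, 0.2)
bad_ind        <- rbinom(500, 1, ifelse(bankruptcy_ind == 1, 0.5, 0.15))

# Contingency table of the two variables
tab <- table(bankruptcy_ind, bad_ind)
tab

# Pearson chi-squared test without continuity correction
res <- chisq.test(tab, correct = FALSE)
res

# Cramer's V: 0 means no association, 1 means perfect association
n <- sum(tab)
cramers_v <- sqrt(unname(res$statistic) / (n * (min(dim(tab)) - 1)))
cramers_v
```

For a 2 x 2 table chisq.test() applies Yates' continuity correction by default; `correct = FALSE` is used here so that the statistic is the plain Pearson chi-squared that Cramér's V is defined on.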

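5 Summary

So which hypothesis test should be chosen for which kind of data in order to reach the right decision? 落花生 summarized it in the following table:

[Table: which hypothesis test to use for which variable types]

That wraps up how to run statistical inference quickly in R and which test to choose for different variable types. I hope it helps you, and it will make my own review easier next time. Thanks!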

