Notes: GWAS workflow 4-3: LM model + factor covariates

1. Preparing the covariate file

Column 1: FID
Column 2: IID
Column 3 onward: covariates (note: these must be numbers, not characters!)
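Because the covariates must be numeric, it can help to scan a covariate file before using it. The function below is a minimal awk sketch that prints every non-numeric field from column 3 onward. The name `check_covar` and the accepted number pattern are assumptions of this sketch, not part of PLINK. On the cov.txt shown below, it would flag the F/M sex codes in column 3.

```shell
# Sketch: report covariate fields (column 3 onward) that are not plain numbers.
# The regex (optional sign, integer or decimal, optional exponent) is an
# assumption about what "numeric" means here, not PLINK's own parser.
check_covar() {
  awk '{
    for (i = 3; i <= NF; i++)
      if ($i !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/)
        printf "line %d, column %d: non-numeric value \"%s\"\n", NR, i, $i
  }' "$1"
}
# Usage: check_covar cov.txt
```

No output means every covariate field looks numeric.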

Here the covariate file is:

[dengfei@ny 03_linear_cov]$ head cov.txt 
1061 1061 F 3
1062 1062 M 3
1063 1063 F 3
1064 1064 F 3
1065 1065 F 3
1066 1066 F 3
1067 1067 F 3
1068 1068 M 3
1069 1069 M 3
1070 1070 M 3

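Column 3 is sex and column 4 is generation. In this example, generation is treated as a factor, and the GWAS is run with it as a factor covariate.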

2. Extracting the factor covariate

awk '{print $1,$2,$4}' cov.txt >cov1.txt

The data look like this:

1061 1061 3
1062 1062 3
1063 1063 3
1064 1064 3
1065 1065 3
1066 1066 3
1067 1067 3
1068 1068 3
1069 1069 3
1070 1070 3
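Before dummy coding, it is worth checking how many levels the factor has, since one fewer dummy variable than that will be created. A small sketch (the function name `count_levels` is hypothetical) that counts individuals per generation in column 3 of cov1.txt:

```shell
# Count individuals per level of the covariate in column 3 (generation).
# sort -n groups identical levels so that uniq -c can count them.
count_levels() {
  awk '{print $3}' "$1" | sort -n | uniq -c
}
# Usage: count_levels cov1.txt
```

Each output line is one level (count first, level second), so the number of lines is K.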

3. Converting to dummy variables with PLINK's dummy coding

plink --file b --covar cov1.txt --write-covar --dummy-coding

This produces:

plink.cov

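Note:
Dummy coding drops one level. Generation originally has three levels (3, 4 and 5), but only two dummy variables are created. The PLINK documentation explains it as follows: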

That is, for a variable with K categories, K-1 new dummy variables are created. This new file can be used with --linear and --logistic, and a coefficient for each level would now be estimated for the first covariate (otherwise PLINK would have incorrectly treated the first covariate as an ordinal/ratio measure).
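To make the K-1 rule concrete, the awk sketch below builds treatment (dummy) coding by hand for column 3 of a file like cov1.txt. It is not PLINK's implementation: the function name `dummy_code`, the `GEN_<level>` column names, and the choice of the smallest level as the all-zero reference are assumptions of this sketch, and plink.cov may use a different header or reference level.

```shell
# Sketch of K-1 dummy coding for the categorical covariate in column 3.
# Assumptions: levels are numeric, the smallest level is the reference
# (0 in every dummy), and the GEN_<level> header names are made up here.
dummy_code() {
  awk '
    NR == FNR { lv[$3] = 1; next }          # first pass: collect the levels
    FNR == 1 {
      n = 0
      for (k in lv) levels[++n] = k + 0     # numeric copy of each level
      # insertion sort so the levels are in ascending order
      for (i = 2; i <= n; i++) {
        v = levels[i]; j = i - 1
        while (j > 0 && levels[j] > v) { levels[j + 1] = levels[j]; j-- }
        levels[j + 1] = v
      }
      printf "FID IID"
      for (i = 2; i <= n; i++) printf " GEN_%s", levels[i]   # skip reference
      print ""
    }
    {                                       # second pass: one row per person
      printf "%s %s", $1, $2
      for (i = 2; i <= n; i++) printf " %d", (($3 + 0 == levels[i]) ? 1 : 0)
      print ""
    }
  ' "$1" "$1"
}
# Usage: dummy_code cov1.txt > cov1_dummy.txt
```

With generations 3, 4 and 5 this yields two columns, GEN_4 and GEN_5, and generation-3 individuals are 0 in both, which matches the two covariates PLINK reports loading.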

4. Running the factor-covariate GWAS with an LM model

Code:

plink --file b --pheno phe.txt --allow-no-sex --linear --covar plink.cov --out re --hide-covar

Log:

PLINK v1.90b5.3 64-bit (21 Feb 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to re.log.
Options in effect:
  --allow-no-sex
  --covar plink.cov
  --file b
  --linear
  --out re
  --pheno phe.txt

515199 MB RAM detected; reserving 257599 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (10000 variants, 1500 people).
--file: re-temporary.bed + re-temporary.bim + re-temporary.fam written.
10000 variants loaded from .bim file.
1500 people (0 males, 0 females, 1500 ambiguous) loaded from .fam.
Ambiguous sex IDs written to re.nosex .
1500 phenotype values present after --pheno.
Using 1 thread (no multithreaded calculations invoked).
--covar: 2 covariates loaded.
Before main variant filters, 1500 founders and 0 nonfounders present.
Calculating allele frequencies... done.
10000 variants and 1500 people pass filters and QC.
Phenotype data is quantitative.
Writing linear model association results to re.assoc.linear ... done.

Result file:
re.assoc.linear

Result preview: [Figure: first lines of re.assoc.linear]
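A quick way to look at the strongest associations is to sort re.assoc.linear by P. This sketch assumes PLINK 1.9's column order (CHR SNP BP A1 TEST NMISS BETA STAT P), and `top_hits` is a hypothetical helper name. It keeps only the additive SNP rows (TEST = ADD), which also drops covariate rows if --hide-covar was not in effect, skips rows with P = NA, and uses `sort -g` (available in GNU and BSD sort, not required by POSIX) because P values are often written in scientific notation.

```shell
# Print the header plus the N smallest P values (default 10) from a
# PLINK --linear result, keeping only additive-model (ADD) rows.
top_hits() {
  awk 'NR == 1 || ($5 == "ADD" && $9 != "NA")' "$1" |
    { IFS= read -r header; echo "$header"; sort -b -g -k9,9 | head -n "${2:-10}"; }
}
# Usage: top_hits re.assoc.linear 10
```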

5. Comparing the results in R: lm + factor

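library(data.table)
geno = fread("c.raw")   # assumed plink --recode A output: FID IID PAT MAT SEX PHENOTYPE, then one column per SNP (e.g. M7_1)
geno[1:10,1:10]
phe = fread("phe.txt")  # no header: V1 = FID, V2 = IID, V3 = phenotype
cov = fread("cov.txt")  # no header: V1 = FID, V2 = IID, V3 = sex, V4 = generation
# assumes all three files list the same individuals in the same order
dd = data.frame(phe$V3,cov$V4,geno[,7:20])
head(dd)
str(dd)
# generation as a factor: lm() expands it into K-1 dummy variables
mod_M7 = lm(phe.V3 ~ factor(cov.V4) + M7_1,data=dd)
summary(mod_M7)
# generation as a numeric covariate, for comparison
mod_M7_num = lm(phe.V3 ~ cov.V4 + M7_1,data=dd)
summary(mod_M7_num)
mod_M9 = lm(phe.V3 ~ factor(cov.V4) + M9_1,data=dd);summary(mod_M9)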

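[Figure: summary(mod_M7), M7 with generation as a factor covariate]

[Figure: the same model with generation as a numeric covariate]

The results differ. As a numeric covariate, generation gets a single slope, which forces the effects of generations 3, 4 and 5 to be equally spaced; as a factor, each non-reference generation gets its own effect, so the SNP is tested under a different adjustment.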

6. Comparing the results in R: lm + plink.cov

The results are exactly the same as above, where generation was treated as a factor.

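7. Fixed effects are regression

So how should we understand the saying that fixed effects are just regression?

When R fits a regression with a factor, it converts the factor into numeric dummy variables, one for each level other than the reference level, so the result is the same as with dummy variables you create by hand.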
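The equivalence, and the difference from a numeric covariate, can be written out explicitly. Notation introduced here: y_i is the phenotype, x_i the SNP genotype code (e.g. M7_1), g_i the generation, and e_i the residual; generation 3 is taken as the reference, which is R's default for factor() (treatment contrasts, first level as reference).

```latex
% generation as a numeric covariate: one slope for all generations
y_i = \mu + \gamma g_i + b x_i + e_i, \qquad g_i \in \{3, 4, 5\}

% generation as a factor: K - 1 = 2 dummy variables
y_i = \mu' + \alpha_4 D_{4i} + \alpha_5 D_{5i} + b x_i + e_i, \qquad
D_{ki} = \begin{cases} 1 & g_i = k \\ 0 & \text{otherwise} \end{cases}

% the numeric model is the special case
\mu' = \mu + 3\gamma, \qquad \alpha_4 = \gamma, \qquad \alpha_5 = 2\gamma
```

Because the intercept together with the two dummy columns spans the same space whichever generation is the reference, the SNP estimate b and its test do not depend on that choice, which is why lm() with factor() and lm() with the plink.cov dummy variables give the same SNP result.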
