snakemake study notes 4

How snakemake connects different rules

I asked a question on Stack Overflow and got an answer, which also deepened my understanding of snakemake a notch.

Lessons learned

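  • A snakemake rule declares its files under input and output. Where those file lists overlap is what determines the dependencies between rules: for example, if rule1's output files b.bed, b.bim, b.fam are used as rule2's input, then rule1 and rule2 are linked.
  • rule all defines the final output files. For example, if the final output of rule2 is c.raw, then c.raw is what you list as the input of rule all.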

Test files

Here there are two plink files, a.map and a.ped, with the following contents:

(base) [dengfei@localhost plink-test]$ cat a.map 
1 snp1 0 1
1 snp2 0 2
1 snp3 0 3
(base) [dengfei@localhost plink-test]$ cat a.ped 
1 1 0 0 1  0  1 1  2 2  1 1
1 2 0 0 2  0  2 2  0 0  2 1
1 3 1 2 1  2  0 0  1 2  2 1
2 1 0 0 1  0  1 1  2 2  0 0
2 2 0 0 2  2  2 2  2 2  0 0
2 3 1 2 1  2  1 1  2 2  1 1

1. Convert the plink files to the binary bfile format

The usual plink way:

 plink --file a --out b

Result:

(base) [dengfei@localhost plink-test]$ plink --file a --out b
PLINK v1.90b6.5 64-bit (13 Sep 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to b.log.
Options in effect:
  --file a
  --out b

63985 MB RAM detected; reserving 31992 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 6 people).
--file: b.bed + b.bim + b.fam written.

2. Convert the bfile to raw format

plink --bfile b --out c --recodeA

Result:

(base) [dengfei@localhost plink-test]$ plink --bfile b --out c --recodeA
PLINK v1.90b6.5 64-bit (13 Sep 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Note: --recodeA flag deprecated.  Use 'recode A ...'.
Logging to c.log.
Options in effect:
  --bfile b
  --out c
  --recode A

63985 MB RAM detected; reserving 31992 MB for main workspace.
3 variants loaded from .bim file.
6 people (4 males, 2 females) loaded from .fam.
3 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 4 founders and 2 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.777778.
3 variants and 6 people pass filters and QC.
Among remaining phenotypes, 3 are cases and 0 are controls.  (3 phenotypes are
missing.)
--recode A to c.raw ... done.
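
The counts in the log above can be checked by hand against a.ped. The following is a minimal sketch, not plink's own logic: it assumes the standard .ped column order (family ID, individual ID, father, mother, sex, phenotype, then two alleles per variant), treats 0 as missing, and counts a person as a founder when both parent IDs are 0. The function name summarize_ped is made up for this illustration.

```python
# Contents of a.ped as shown earlier, embedded so the sketch runs on its own.
PED_TEXT = """\
1 1 0 0 1  0  1 1  2 2  1 1
1 2 0 0 2  0  2 2  0 0  2 1
1 3 1 2 1  2  0 0  1 2  2 1
2 1 0 0 1  0  1 1  2 2  0 0
2 2 0 0 2  2  2 2  2 2  0 0
2 3 1 2 1  2  1 1  2 2  1 1
"""


def summarize_ped(text):
    """Recompute the headline numbers of the plink log (hypothetical helper)."""
    stats = {"people": 0, "males": 0, "females": 0, "founders": 0,
             "cases": 0, "controls": 0, "missing_pheno": 0}
    called = total = 0
    for line in text.strip().splitlines():
        fields = line.split()
        _fid, _iid, father, mother, sex, phenotype = fields[:6]
        alleles = fields[6:]
        stats["people"] += 1
        if sex == "1":
            stats["males"] += 1
        elif sex == "2":
            stats["females"] += 1
        # Assumption: a founder has both parent IDs set to 0.
        if father == "0" and mother == "0":
            stats["founders"] += 1
        # Phenotype coding: 2 = case, 1 = control, 0 = missing.
        if phenotype == "2":
            stats["cases"] += 1
        elif phenotype == "1":
            stats["controls"] += 1
        else:
            stats["missing_pheno"] += 1
        # Alleles come in pairs; a "0 0" pair is a missing genotype.
        for a1, a2 in zip(alleles[0::2], alleles[1::2]):
            total += 1
            if a1 != "0" and a2 != "0":
                called += 1
    stats["genotyping_rate"] = round(called / total, 6)
    return stats


summary = summarize_ped(PED_TEXT)
print(summary)
```

With 4 of the 18 genotypes missing, 14 / 18 gives the 0.777778 genotyping rate reported by plink, and the sex, founder and phenotype counts line up with the log as well.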

3. Connect the steps with snakemake

Save the file as: plink.smk

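# Target rule: lists the final files the workflow should produce.
rule all:
    input:
        "c.log","c.raw"

# Step 1: convert the text fileset a.map/a.ped into the binary fileset b.bed/b.bim/b.fam.
rule bfile:
    input:
        "a.map","a.ped"
    output:
        "b.bed","b.bim","b.fam"
    params:
        a1 = "a",
        a2 = "b"
    shell:
        "plink --file {params.a1} --out {params.a2}"

# Step 2: recode the binary fileset into c.raw.
# "--recode A" is the spelling plink suggests in place of the deprecated --recodeA.
rule cfile:
    input:
        "b.bed","b.bim","b.fam"
    output:
        "c.log", "c.raw"
    params:
        aa1 = "b",
        aa2 = "c"
    shell:
        "plink --bfile {params.aa1} --out {params.aa2} --recode A"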

Explanation of the rules:

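  • 1, rule all defines the final output files. Here rule cfile outputs c.log and c.raw, so the input of rule all is also written as c.log and c.raw.
  • 2, rule bfile: the input is a.map and a.ped, and the output is b.bed, b.bim, b.fam. All three files have to be listed, because they are the input of the next rule, and that is what establishes the dependency.
  • 3, rule cfile takes as its input the output of the previous rule, bfile, which establishes the dependency between the two.
  • 4, the output of rule cfile matches the input of rule all, so all three rules are now linked by dependencies.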
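
The four points above can be sketched as a toy model in Python. This is only an illustration of the matching idea, not how snakemake is actually implemented (the real tool also handles wildcards, file timestamps and much more); the RULES dictionary and the helper names producer_of and execution_order are invented for this sketch.

```python
# The three rules from plink.smk, reduced to their input and output file lists.
RULES = {
    "all": {"input": ["c.log", "c.raw"], "output": []},
    "bfile": {"input": ["a.map", "a.ped"],
              "output": ["b.bed", "b.bim", "b.fam"]},
    "cfile": {"input": ["b.bed", "b.bim", "b.fam"],
              "output": ["c.log", "c.raw"]},
}


def producer_of(filename):
    """Return the name of the rule whose output lists filename, or None."""
    for name, rule in RULES.items():
        if filename in rule["output"]:
            return name
    return None  # a source file such as a.map: no rule creates it


def execution_order(target="all", order=None):
    """Walk backwards from the target rule and list rules in run order."""
    if order is None:
        order = []
    for filename in RULES[target]["input"]:
        upstream = producer_of(filename)
        if upstream is not None and upstream not in order:
            execution_order(upstream, order)
    if target not in order:
        order.append(target)
    return order


order = execution_order()
print(order)  # ['bfile', 'cfile', 'all']
```

Starting from rule all, each input file is traced back to the rule that lists it as an output, which is why leaving b.bim or b.fam out of rule bfile's output would break the chain.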

4. View the workflow graph

Run the workflow (newer snakemake releases also expect a core count, for example --cores 1):

snakemake -s plink.smk 

View the workflow graph:

snakemake --dag -s plink.smk |dot -Tpdf >a.pdf
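
The command pipes a graph description in Graphviz DOT format into dot, which renders it as a PDF. As a rough idea of what such a description looks like, here is a hand-written sketch for the three rules in plink.smk. It is not snakemake's exact output, which carries additional labels and styling attributes; the EDGES list and the to_dot helper are assumptions made for this example.

```python
# Dependency edges of plink.smk, written by hand: (upstream rule, downstream rule).
EDGES = [("bfile", "cfile"), ("cfile", "all")]


def to_dot(edges, graph_name="workflow"):
    """Build a minimal Graphviz DOT description of a directed graph."""
    lines = [f"digraph {graph_name} {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(EDGES)
print(dot_text)
```

Saving the printed text to a file and passing it to dot -Tpdf should give a small chain of three nodes, from bfile through cfile to all.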
