## Existing frameworks

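The authors of "A review of bioinformatics pipeline frameworks" give a useful classification of the existing tools.

The authors' view:

Implicit syntax, that is, Make-style rules, is better suited to integrating different executable tools.
Configuration-based pipelines are more stable and are better suited to distributing jobs on a cluster.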

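Their final recommendations are:

If the lab is neither doing purely biological experiments (which would call for a workbench with a UI) nor in need of high-performance, class-based pipeline design, the choice is less clear, and the main criterion is the ratio of effort invested to benefit gained.
If the lab does repetitive research, data and software need to be version-controlled, and configuration-based pipelines are recommended.
If the lab does exploratory proofs-of-concept, a DSL-based pipeline is what it needs.
If the lab has no access to high-performance computing (HPC) and can only use cloud servers, server-based frameworks are the choice.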

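Existing pipeline tools can be looked up in awesome-pipeline.

At present, among the frameworks listed in its "pipeline frameworks & library" section, nextflow is the biology-related framework with the most stars. Unfortunately, nextflow needs to create a FIFO at run time, which cannot be done on an NTFS file system, so I chose snakemake, a DSL-based workflow framework written in Python.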

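Environment setup

To work through this part of the tutorial, prepare a Linux environment. On Windows, deploy a virtual machine as described in "biostarhandbook (1): analysis environment and data reproducibility", and install miniconda3.

The following steps download the required data, install the required software, and activate the working environment.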

wget https://bitbucket.org/snakemake/snakemake-tutorial/get/v3.11.0.tar.bz2
# The archive unpacks into the directory snakemake-snakemake-tutorial-623791d7ec6d
tar -xf v3.11.0.tar.bz2
cd snakemake-snakemake-tutorial-623791d7ec6d
conda env create --name snakemake-tutorial --file environment.yaml
source activate snakemake-tutorial
# Leave the current environment
source deactivate

Files in the working directory:

├── data
│   ├── genome.fa
│   ├── genome.fa.amb
│   ├── genome.fa.ann
│   ├── genome.fa.bwt
│   ├── genome.fa.fai
│   ├── genome.fa.pac
│   ├── genome.fa.sa
│   └── samples
│       ├── A.fastq
│       ├── B.fastq
│       └── C.fastq
├── environment.yaml
└── README.md

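Basics: an example workflow

If you have ever compiled software, you have probably seen and used make, though you may never have thought much about what it actually does. Make is the most widely used software build tool. It dates from the 1970s and is used mainly for C projects, where it handles the dependencies that arise during compilation: when some files are updated, Make regenerates the files that depend on them.

Snakemake serves the same purpose as Make, but it is implemented in Python, adds many Python features, and is as easy to read as Python. Below we use Snakemake to write a variant-calling workflow.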

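Step 1: mapping reads

Snakemake is simple: you write rules that each carry out a task. Our first rule maps reads to the reference genome. On the command line this would be bwa mem data/genome.fa data/samples/A.fastq | samtools view -Sb - > mapped_reads/A.bam. Written as a Snakemake rule, it looks like this.

# Use whichever text editor you prefer
vim Snakefile
# Enter the following content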
rule bwa_map:
    input:
        "data/genome.fa",
        "data/samples/A.fastq"
    output:
        "mapped_reads/A.bam"
    shell:
        """
        bwa mem {input} | samtools view -Sb - > {output}
        """

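Explanation: these lines define a rule with two inputs and one output. The command under shell is run with the {input} and {output} placeholders replaced by the file names. Do a dry run with snakemake -np mapped_reads/A.bam to check for errors, then drop -n for a real run (if no target is given, the first rule is executed by default).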
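To see what the {input} and {output} placeholders amount to, here is a minimal plain-Python sketch. It is not Snakemake's actual implementation: the Files class and the render helper are hypothetical names, used only to show that a list of input files ends up space-separated in the command.

```python
class Files(list):
    """Hypothetical stand-in for a rule's list of input or output files."""

    def __str__(self):
        # Several files are rendered separated by single spaces.
        return " ".join(self)


def render(command, input_files, output_files):
    """Fill the {input} and {output} placeholders of a shell command."""
    return command.format(input=Files(input_files), output=Files(output_files))


cmd = render(
    "bwa mem {input} | samtools view -Sb - > {output}",
    ["data/genome.fa", "data/samples/A.fastq"],
    ["mapped_reads/A.bam"],
)
print(cmd)
```

The rendered string is the same command that was typed by hand on the command line at the start of this step.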

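Step 2: generalizing the mapping rule

Processing a single file like this does not show what snakemake is good for; typing the command by hand would be easier. One advantage of snakemake is that it can process a whole class of files through file-name wildcards. Replace A with {sample}, and every file matching data/samples/*.fastq can be turned into the corresponding mapped_reads/*.bam.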

rule bwa_map:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "mapped_reads/{sample}.bam"
    shell:
        """
        bwa mem {input} | samtools view -Sb - > {output}
        """

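Now run snakemake -np mapped_reads/{A,B,C}.bam. Because mapped_reads/A.bam already exists, snakemake is smart enough to map only B.fastq and C.fastq without mapping A.fastq again, and you do not have to write a pile of conditionals to handle this by hand.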
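How a pattern such as mapped_reads/{sample}.bam turns a requested file into a concrete input can be sketched in plain Python. This is only an illustration under my own assumptions, not Snakemake's code: match_wildcards and input_for are hypothetical helpers, each wildcard is assumed to match any non-empty string, and a wildcard name is assumed to appear only once per pattern.

```python
import re


def match_wildcards(pattern, path):
    """Return the wildcard values if path fits pattern, else None."""
    # Splitting on {name} leaves literal text at even positions
    # and wildcard names at odd positions.
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for position, piece in enumerate(pieces):
        if position % 2:
            regex += "(?P<%s>.+)" % piece
        else:
            regex += re.escape(piece)
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None


def input_for(target):
    """Derive the FASTQ file that a requested BAM target depends on."""
    wildcards = match_wildcards("mapped_reads/{sample}.bam", target)
    return "data/samples/{sample}.fastq".format(**wildcards)


print(match_wildcards("mapped_reads/{sample}.bam", "mapped_reads/B.bam"))
print(input_for("mapped_reads/C.bam"))
```

The value captured from the requested output is substituted into the input pattern, which is why one rule can serve A, B and C.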

Of course, if you change the timestamp of A.fastq with touch data/samples/A.fastq, snakemake considers A.fastq modified, and repeating the command maps A.fastq again.
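The timestamp check can be pictured as the following Make-style test. It is a simplified sketch that assumes modification times are the only criterion; needs_rebuild is a hypothetical helper, and os.utime plays the role of touch.

```python
import os
import tempfile


def needs_rebuild(output, inputs):
    """Return True if output is missing or older than any of its inputs."""
    if not os.path.exists(output):
        return True
    output_time = os.path.getmtime(output)
    return any(os.path.getmtime(path) > output_time for path in inputs)


with tempfile.TemporaryDirectory() as workdir:
    fastq = os.path.join(workdir, "A.fastq")
    bam = os.path.join(workdir, "A.bam")

    open(fastq, "w").close()
    missing = needs_rebuild(bam, [fastq])   # the output does not exist yet

    open(bam, "w").close()
    os.utime(fastq, (1000, 1000))           # input written first
    os.utime(bam, (2000, 2000))             # output written later
    fresh = needs_rebuild(bam, [fastq])

    os.utime(fastq, (3000, 3000))           # like touch: the input is now newer
    stale = needs_rebuild(bam, [fastq])

print(missing, fresh, stale)
```

Only the first and the last check ask for a rebuild, which mirrors why A.fastq is skipped until it is touched.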

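Step 3: sorting the alignments

The mapped reads have to be sorted before they can be used in downstream analysis. How should that rule be written?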

rule samtools_sort:
    input:
        "mapped_reads/{sample}.bam"
    output:
        "sorted_reads/{sample}.bam"
    shell:
        "samtools sort -T sorted_reads/{wildcards.sample}"
        " -O bam {input} > {output}"

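The output of the previous rule is used as the input, and the result is written to another folder. The rule is much like the previous one, except that wildcards.sample is used to get the actual value of the sample wildcard, which serves as the prefix for the temporary files given with -T.

Run snakemake -np sorted_reads/B.bam and you will see that snakemake maps first and then sorts. This is because snakemake resolves dependencies automatically and executes jobs in dependency order.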
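The way a requested file is traced back through the rules can be sketched as a small recursive planner. This is a toy model under my own assumptions, not Snakemake's algorithm: RULES, match and plan are hypothetical names, only a single {sample} wildcard is supported, and the set existing stands in for the files on disk.

```python
# Hypothetical rule table: (rule name, output pattern, input patterns).
RULES = [
    ("bwa_map", "mapped_reads/{sample}.bam",
     ["data/genome.fa", "data/samples/{sample}.fastq"]),
    ("samtools_sort", "sorted_reads/{sample}.bam",
     ["mapped_reads/{sample}.bam"]),
]


def match(pattern, path):
    """Return the {sample} value if path fits pattern, else None."""
    prefix, suffix = pattern.split("{sample}")
    middle = path[len(prefix):len(path) - len(suffix)]
    if path.startswith(prefix) and path.endswith(suffix) and middle:
        return middle
    return None


def plan(target, existing, jobs=None):
    """Return the jobs needed to create target, dependencies first."""
    jobs = [] if jobs is None else jobs
    if target in existing:
        return jobs  # the file is already there, nothing to do
    for name, out_pattern, in_patterns in RULES:
        sample = match(out_pattern, target)
        if sample is None:
            continue
        # Plan every input of this rule before the rule itself.
        for pattern in in_patterns:
            plan(pattern.format(sample=sample), existing, jobs)
        if (name, sample) not in jobs:
            jobs.append((name, sample))
        return jobs
    raise FileNotFoundError("no rule produces " + target)


existing = {"data/genome.fa", "data/samples/B.fastq"}
order = plan("sorted_reads/B.bam", existing)
print(order)
```

Asking for the sorted file schedules the mapping job first because the sort rule's input is the mapping rule's output.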

Step 4: indexing and visualizing the jobs

Here we add another rule that builds an index for the sorted BAM files.

rule samtools_index:
    input:
        "sorted_reads/{sample}.bam"
    output:
        "sorted_reads/{sample}.bam.bai"
    shell:
        "samtools index {input}"

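Three rules have been written so far. How do they depend on each other, and in what order do they run? snakemake provides the --dag option, whose output can be passed to the dot command for visualization.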

snakemake --dag sorted_reads/{A,B}.bam.bai | dot -Tsvg > dag.svg
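The text that dot receives is a graph description in the Graphviz DOT language. As a rough sketch of what such a description looks like, the hypothetical helper to_dot below renders the dependencies of the two samples; the node names and layout are my own simplification, and the text snakemake actually prints contains more detail.

```python
def to_dot(edges):
    """Render (upstream, downstream) job pairs as Graphviz DOT text."""
    lines = ["digraph workflow {"]
    for upstream, downstream in edges:
        lines.append('    "%s" -> "%s";' % (upstream, downstream))
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("bwa_map A", "samtools_sort A"),
    ("samtools_sort A", "samtools_index A"),
    ("bwa_map B", "samtools_sort B"),
    ("samtools_sort B", "samtools_index B"),
]
dot_text = to_dot(edges)
print(dot_text)
```

Saving this text to a file and running dot -Tsvg on it produces a picture with one arrow per dependency.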

Step 5: calling genomic variants

Variant calling needs all of the BAM files produced so far. You might be tempted to write it like this:

rule bcftools_call:
    input:
        fa="data/genome.fa",
        bamA="sorted_reads/A.bam",
        bamB="sorted_reads/B.bam",
        baiA="sorted_reads/A.bam.bai",
        baiB="sorted_reads/B.bam.bai"
    output:
        "calls/all.vcf"
    shell:
        "samtools mpileup -g -f {input.fa} {input.bamA} {input.bamB} | "
        "bcftools call -mv - > {output}"

This works, but every additional sample needs another input line, which is tedious. This is where Snakemake's Python roots pay off: a list comprehension does the job.

["sorted_reads/{}.bam".format(sample) for sample in ["A","B"]]

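Going further, define SAMPLES=["A","B"] outside the rule, and the input inside the rule becomes bam=["sorted_reads/{}.bam".format(sample) for sample in SAMPLES]. List comprehensions are common but a little verbose, so snakemake provides expand as a shortcut; the above can be rewritten as expand("sorted_reads/{sample}.bam", sample=SAMPLES).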
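The equivalence between expand and a list comprehension can be checked with a small stand-in. expand_sketch below is a hypothetical re-implementation written for illustration, not Snakemake's own function; it assumes that every combination of the given wildcard values is wanted.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


SAMPLES = ["A", "B"]
by_comprehension = ["sorted_reads/{}.bam".format(sample) for sample in SAMPLES]
by_expand = expand_sketch("sorted_reads/{sample}.bam", sample=SAMPLES)
print(by_expand)
```

Both variables hold the same two file names; the named placeholder simply makes the pattern easier to read.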

The final rule is then

SAMPLES=["A","B"]
rule bcftools_call:
    input:
        fa="data/genome.fa",
        bam=expand("sorted_reads/{sample}.bam", sample=SAMPLES),
        bai=expand("sorted_reads/{sample}.bam.bai", sample=SAMPLES)
    output:
        "calls/all.vcf"
    shell:
        "samtools mpileup -g -f {input.fa} {input.bam} | "
        "bcftools call -mv - > {output}"

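Exercise: use snakemake to generate the DAG of the current workflow.

Step 6: writing a report

So far every rule has run a shell script. Another strength of snakemake is that a rule can contain Python code directly: just replace shell with run, and the code no longer needs to be quoted.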

rule report:
    input:
        "calls/all.vcf"
    output:
        "report.html"
    run:
        from snakemake.utils import report
        with open(input[0]) as vcf:
            n_calls = sum(1 for l in vcf if not l.startswith("#"))

        report("""
        An example variant calling workflow
        ===================================

        Reads were mapped to the Yeast
        reference genome and variants were called jointly with
        SAMtools/BCFtools.

        This resulted in {n_calls} variants (see Table T1_).
        """, output[0], T1=input[0])

This also uses a snakemake function, report, which renders reStructuredText markup into an HTML page.

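Step 7: adding a target rule

Until now snakemake was always run as snakemake followed by a target file name. Besides file names, snakemake also accepts rule names as targets. By convention we define a rule named all that requests the final result files.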

rule all:
    input:
        "report.html"

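Summary of the basics:

The key points covered so far are:

Snakemake runs commands through rules; a rule usually consists of input, output and shell.
Snakemake works out the dependencies between the inputs and outputs of different rules automatically, and uses timestamps to decide whether a file needs to be regenerated.
Snakemake matches file names with wildcards such as {sample}.fastq, and {wildcards.sample} gives the actual value matched by sample.
Snakemake uses expand() to generate multiple file names; it is essentially a Python list comprehension.
Python code can be written directly outside the rules of a Snakefile, and also inside a rule's run block.
The first rule of a Snakefile is usually rule all, because snakemake executes the first rule by default.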

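Advanced: refining the workflow

The basics gave us the skeleton of the workflow. The next step is to refine it, for example by writing a config file, declaring the resources each rule consumes, and recording logs.

Step 1: specifying the number of threads

Some tools, such as bwa, run much faster with multiple processes or threads. snakemake uses the threads directive to declare how many threads a rule uses, and we can add it to the earlier bwa_map rule.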

rule bwa_map:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "mapped_reads/{sample}.bam"
    threads: 8
    shell:
        "bwa mem -t {threads} {input} | samtools view -Sb - > {output}"

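Once threads is declared, the Snakemake scheduler can decide at run time whether to run several jobs in parallel. This mainly depends on the --cores option. For example: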

snakemake --cores 10

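Because only 10 cores are available in total, only one bwa_map job, which needs 8 cores, can run at a time. Once a bwa_map job has finished, however, snakemake can run another 8-core bwa_map job together with a samtools_sort job, which declares no thread count, to keep the cores busy. For tools that run with multiple threads or processes, declaring the thread count in the rule rather than hard-coding it in the shell command therefore lets resources be used more efficiently.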
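The effect of --cores 10 can be imitated with a simple greedy selection. This is a deliberately naive sketch, not Snakemake's scheduler, which is more sophisticated; pick_jobs is a hypothetical helper, and a rule without a threads directive is assumed to need one core.

```python
def pick_jobs(ready, cores):
    """Greedily choose ready jobs whose threads fit into the free cores."""
    chosen, free = [], cores
    for name, threads in ready:
        threads = min(threads, cores)  # a job never gets more than --cores
        if threads <= free:
            chosen.append(name)
            free -= threads
    return chosen


# At the start, two mapping jobs are waiting.
first_batch = pick_jobs([("bwa_map A", 8), ("bwa_map B", 8)], cores=10)

# After the first mapping job has finished, a sort job becomes ready as well.
second_batch = pick_jobs([("bwa_map B", 8), ("samtools_sort A", 1)], cores=10)

print(first_batch)
print(second_batch)
```

With 10 cores the two mapping jobs cannot overlap, but a mapping job and a single-core sort job can, which is the situation described above.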

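Step 2: config files

SAMPLES was written into the Snakefile, which means the Snakefile has to be edited for every new project. A better approach is a config file. Config files can be written in JSON or YAML, and configfile: "config.yaml" reads the file into a dictionary named config.

The content of config.yaml is: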

samples:
    A: data/samples/A.fastq
    B: data/samples/B.fastq

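YAML expresses hierarchy through indentation. Indentation must use spaces; the number of spaces does not matter as long as items at the same level are left-aligned. When Python reads the YAML above, it is stored as a dictionary of the form {'samples': {'A': 'data/samples/A.fastq', 'B': 'data/samples/B.fastq'}}

The Snakefile can then be rewritten as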

configfile: "config.yaml"
...
rule bcftools_call:
    input:
        fa="data/genome.fa",
        bam=expand("sorted_reads/{sample}.bam", sample=config["samples"]),
        bai=expand("sorted_reads/{sample}.bam.bai", sample=config["samples"])
    output:
        "calls/all.vcf"
    shell:
        "samtools mpileup -g -f {input.fa} {input.bam} | "
        "bcftools call -mv - > {output}"

Although config["samples"] is a dictionary, only its keys are used when it is expanded.
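Both points, that the config file becomes a dictionary and that only the keys are used, can be checked in plain Python. The sketch uses the JSON form of the same configuration so that it needs nothing beyond the standard library; the variable names are my own.

```python
import json

# JSON text equivalent to the config.yaml above
# (config files may be written in JSON or YAML).
config_text = """
{"samples": {"A": "data/samples/A.fastq",
             "B": "data/samples/B.fastq"}}
"""
config = json.loads(config_text)

# Iterating over a dictionary yields its keys, so only "A" and "B" are used.
bams = ["sorted_reads/{}.bam".format(sample) for sample in config["samples"]]

print(config)
print(bams)
```

The FASTQ paths stored as values play no role here; they only matter for the mapping rule.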

For a tutorial on the YAML format, see Ruan Yifeng's blog: http://www.ruanyifeng.com/blog/2016/07/yaml.html

Step 3: input functions

Now that the file paths are stored in the config file, the input of bwa_map can be rewritten to take the path from the dictionary. My first attempt was this:

rule bwa_map:
    input:
        "data/genome.fa",
        config['samples']["{sample}"]
    output:
        "mapped_reads/{sample}.bam"
    threads: 8
    shell:
        "bwa mem -t {threads} {input} | samtools view -Sb - > {output}"

After all, "{sample}" should in theory yield the sample name. But snakemake -np reports an error:

KeyError in line 11 of /home6/zgxu/snakemake-snakemake-tutorial-623791d7ec6d/Snakefile:
'{sample}'

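Perhaps the {sample} form can only be used for matching, and the value has to be obtained through wildcards.sample, as in step 3 of the basics. So I changed it to config["samples"][wildcards.sample], which failed as well: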

name 'wildcards' is not defined

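To understand the cause of these errors and find a fix, we need to know a little about how Snakemake executes a workflow. Execution has three phases:

In the initialization phase, the workflow is parsed and all rules are instantiated.
In the DAG phase, the directed acyclic graph is built and dependencies are determined; all wildcards are replaced with actual file names.
In the scheduling phase, the jobs of the DAG are executed in order.

In other words, the concrete file names behind the wildcards are not known during initialization; the wildcards variable only exists from the second phase on. Both errors above occurred because the first phase failed. What is needed is an input function, which defers determining the file name. It can be a Python anonymous function or an ordinary function.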

rule bwa_map:
    input:
        "data/genome.fa",
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        "mapped_reads/{sample}.bam"
    threads: 8
    shell:
        "bwa mem -t {threads} {input} | samtools view -Sb - > {output}"

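Step 4: rule parameters

Sometimes a shell command consists of more than the files in input and output and also needs some static parameters. Putting them in input fails because snakemake looks for them as files, so the dedicated params directive is used for such settings.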

rule bwa_map:
    input:
        "data/genome.fa",
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        "mapped_reads/{sample}.bam"
    threads: 8
    params:
        rg=r"@RG\tID:{sample}\tSM:{sample}"
    shell:
        "bwa mem -R '{params.rg}' -t {threads} {input} | samtools view -Sb - > {output}"

Parameters defined under params can be used in the shell command or in the code of a run block.
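One detail worth checking with the read-group parameter is how Python treats \t inside a string literal. The sketch below compares a normal and a raw string. To my knowledge bwa expects the two-character sequence \t in the -R argument and converts it itself; treat that as an assumption and check it against your bwa version.

```python
plain = "@RG\tID:A\tSM:A"    # Python turns \t into a real tab character
raw = r"@RG\tID:A\tSM:A"     # raw string: backslash and "t" stay two characters

print(repr(plain))
print(repr(raw))
print("\t" in plain, "\t" in raw)
```

If literal tabs cause trouble on the command line, writing the parameter as a raw string keeps the backslash sequences intact until they reach bwa.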

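Step 5: log files

When a workflow is large, it is advisable to save the log output of every step instead of printing it to the screen, so that the source of an error can be found. snakemake provides the log directive for this. Its advantage is that when a job fails, files defined under log are not deleted by snakemake, whereas files under output are. Let us modify bwa_map again.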

rule bwa_map:
    input:
        "data/genome.fa",
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        "mapped_reads/{sample}.bam"
    params:
        rg=r"@RG\tID:{sample}\tSM:{sample}"
    log:
        "logs/bwa_mem/{sample}.log"
    threads: 8
    shell:
        "(bwa mem -R '{params.rg}' -t {threads} {input} | "
        "samtools view -Sb - > {output}) 2> {log}"

Here standard error is redirected to the log file.

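Step 6: temporary and protected files

High-throughput sequencing data is usually large, so intermediate files that are no longer needed take up a lot of disk space. Writing a separate shell command to clean up after the run is tedious and hard to manage. Snakemake uses temp() to mark files as temporary; they are deleted automatically once they are no longer needed.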

rule bwa_map:
    input:
        "data/genome.fa",
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        temp("mapped_reads/{sample}.bam")
    params:
        rg=r"@RG\tID:{sample}\tSM:{sample}"
    log:
        "logs/bwa_mem/{sample}.log"
    threads: 8
    shell:
        "(bwa mem -R '{params.rg}' -t {threads} {input} | "
        "samtools view -Sb - > {output}) 2> {log}"

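With this change, the BAM file under mapped_reads is deleted as soon as samtools_sort has finished. Mapping and sorting are both time-consuming, so accidentally deleting their results would waste a lot of compute time; the best safeguard is to wrap the output in protected().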

rule samtools_sort:
    input:
        "mapped_reads/{sample}.bam"
    output:
        protected("sorted_reads/{sample}.bam")
    shell:
        "samtools sort -T sorted_reads/{wildcards.sample} "
        "-O bam {input} > {output}"

snakemake then write-protects the output file on the file system, so its final permissions are -r--r--r--, and deleting it prompts rm: remove write-protected regular file ‘A.bam’?.
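What write protection means at the file-system level can be tried out on a throwaway file. This is only a sketch of the resulting permissions on a POSIX system, not the code snakemake runs; the file name is a placeholder.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    bam = os.path.join(workdir, "A.bam")
    open(bam, "w").close()

    # Drop every write bit, leaving read permission for user, group and others.
    os.chmod(bam, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    mode = stat.filemode(os.stat(bam).st_mode)
    print(mode)

    # Give write permission back so the temporary directory can be cleaned up.
    os.chmod(bam, stat.S_IRUSR | stat.S_IWUSR)
```

The printed mode string is the same kind of permission summary that ls -l shows for a protected output file.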

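Summary of the advanced part

Use threads: to declare the number of threads each rule needs, which lets snakemake allocate jobs globally and parallelize them well.
Use configfile: to read a config file, separating configuration from the workflow.
snakemake only knows the concrete values of wildcards in the DAG phase, so any input that has to be computed from wildcards must be deferred to that phase with an input function.
Log files defined under log are not deleted when a job fails.
Parameters defined under params can be used directly in shell and run.
Files wrapped in temp() are deleted after the run, while files wrapped in protected() are write-protected to prevent accidental deletion.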

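Expert: automated deployment of the workflow

The workflow above relies on the required software being installed in the current environment. To deploy your workflow quickly in a new environment, you need a more advanced feature of snakemake: a dedicated runtime environment for each rule.

Global environment

I recommend that whenever you start a new snakemake project, you first create a global environment with conda create -n project_name python=version and install commonly used tools such as bwa, samtools and seqkit into it. Then export the environment to a YAML file with the following command:

conda env export -n project_name -f environment.yaml

On a new machine, you can then rebuild the runtime environment with this command: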

conda env create -f environment.yaml

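Local environments

A global environment alone may not be enough: different rules may, for instance, need Python 2 versus Python 3, so you also have to create an environment per rule.

snakemake has an option --use-conda that reads the conda directive of each rule and installs the specified tool versions from the YAML file it points to. Taking the mapping rule from step 1 of the basics as an example: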

rule bwa_map:
    input:
        "data/genome.fa",
        "data/samples/A.fastq"
    output:
        "mapped_reads/A.bam"
    conda:
        "envs/map.yaml"
    shell:
        """
        mkdir -p mapped_reads
        bwa mem {input} | samtools view -Sb - > {output}
        """

Then create an envs folder in the directory where snakemake is run and add map.yaml with the following content:

name: map
channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  - defaults
dependencies:
  - bwa=0.7.17
  - samtools=1.9
show_channel_urls: true

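Note: the name line in the YAML file is not required, but adding it is recommended.

When you run snakemake --use-conda, a dedicated conda environment for this rule is created under .snakemake/conda. Within the project, that environment keeps being used for the rule until the YAML file changes.

If you want the environments created before actually running the project, use the --create-envs-only option.

By default, each project looks for and installs environments only under its own .snakemake/conda, so running the project from another directory makes snakemake create the conda environments again. If you are worried about the disk space or the installation time, use --conda-prefix to point to a dedicated folder.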
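One way to picture why an environment is reused until its YAML file changes is to name the environment directory after a hash of the file content. The scheme below is purely hypothetical and written for illustration; env_dir, the 8-character digest and the /data/conda_envs folder are my assumptions, not Snakemake's actual naming rules.

```python
import hashlib


def env_dir(yaml_text, prefix=".snakemake/conda"):
    """Hypothetical naming scheme: one directory per distinct YAML content."""
    digest = hashlib.sha256(yaml_text.encode("utf-8")).hexdigest()[:8]
    return prefix + "/" + digest


old_yaml = "dependencies:\n  - bwa=0.7.17\n  - samtools=1.9\n"
new_yaml = "dependencies:\n  - bwa=0.7.17\n  - samtools=1.10\n"

same = env_dir(old_yaml) == env_dir(old_yaml)           # unchanged file: reuse
changed = env_dir(old_yaml) != env_dir(new_yaml)        # edited file: new environment
shared = env_dir(old_yaml, prefix="/data/conda_envs")   # stand-in for a --conda-prefix folder

print(same, changed, shared)
```

Under this picture, a shared prefix lets several working directories find the same environment instead of each building its own copy.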

Code summary

The final code is as follows:

configfile: "config.yaml"


rule all:
    input:
        "report.html"


rule bwa_map:
    input:
        "data/genome.fa",
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        temp("mapped_reads/{sample}.bam")
    params:
        rg=r"@RG\tID:{sample}\tSM:{sample}"
    log:
        "logs/bwa_mem/{sample}.log"
    threads: 8
    shell:
        "(bwa mem -R '{params.rg}' -t {threads} {input} | "
        "samtools view -Sb - > {output}) 2> {log}"


rule samtools_sort:
    input:
        "mapped_reads/{sample}.bam"
    output:
        protected("sorted_reads/{sample}.bam")
    shell:
        "samtools sort -T sorted_reads/{wildcards.sample} "
        "-O bam {input} > {output}"


rule samtools_index:
    input:
        "sorted_reads/{sample}.bam"
    output:
        "sorted_reads/{sample}.bam.bai"
    shell:
        "samtools index {input}"


rule bcftools_call:
    input:
        fa="data/genome.fa",
        bam=expand("sorted_reads/{sample}.bam", sample=config["samples"]),
        bai=expand("sorted_reads/{sample}.bam.bai", sample=config["samples"])
    output:
        "calls/all.vcf"
    shell:
        "samtools mpileup -g -f {input.fa} {input.bam} | "
        "bcftools call -mv - > {output}"


rule report:
    input:
        "calls/all.vcf"
    output:
        "report.html"
    run:
        from snakemake.utils import report
        with open(input[0]) as vcf:
            n_calls = sum(1 for l in vcf if not l.startswith("#"))

        report("""
        An example variant calling workflow
        ===================================

        Reads were mapped to the Yeast
        reference genome and variants were called jointly with
        SAMtools/BCFtools.

        This resulted in {n_calls} variants (see Table T1_).
        """, output[0], T1=input[0])

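Running snakemake

Once the Snakefile is written, it is run with snakemake. snakemake has a great many options; some commonly used invocations are listed here.

Check for potential errors before running:

snakemake -n
snakemake -np
snakemake -nr
# --dryrun/-n: do not execute anything
# --printshellcmds/-p: print the shell commands that would be executed
# --reason/-r: print the reason each rule is executed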

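Run directly:

snakemake
snakemake -s Snakefile -j 4
# -s/--snakefile: the Snakefile to use; defaults to Snakefile in the current directory
# --cores/--jobs/-j N: number of cores to use in parallel; if N is omitted, all available cores are used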

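Force a re-run:

snakemake -f
# --force/-f: force execution of the selected target, or of the first rule, regardless of whether it is already up to date
snakemake -F
# --forceall/-F: also forces execution, and additionally re-runs all rules that the target depends on
snakemake -R some_rule
# --forcerun/-R TARGET: re-run the given rules or regenerate the given files; use this after modifying a rule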

Visualization:

snakemake --dag  | dot -Tsvg > dag.svg
snakemake --dag  | dot -Tpdf > dag.pdf
# --dag: output the dependency graph as a directed graph
snakemake --gui 0.0.0.0:2468
# --gui: view the run status in a web page

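Cluster execution:

snakemake --cluster "qsub -V -cwd -q queue_name" -j 10
# --cluster/-c CMD: command used to submit jobs to the cluster
## qsub -V -cwd -q: export the current environment variables (-V), run in the current directory (-cwd), and submit to the given queue (-q); without -q any available queue is used
# --local-cores N: in cluster mode, use at most N cores of the host machine in parallel
# --cluster-config/-u FILE: cluster configuration file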
