Analyzing 16S data with the nf-core ampliseq (QIIME 2) pipeline

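I recently came across a post from 生信技能樹 introducing nf-core, a community collection of pipelines built on the Nextflow workflow manager, and noticed that it includes an official QIIME 2 pipeline. I decided to learn it and explore the pitfalls along the way. That post already covers nf-core itself thoroughly, so here I focus on setting it up and using it.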

I. Environment setup

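Environment setup comes first; nothing runs without it. As I understand it, nf-core lets you prepare the environment in three ways: a local installation, conda, or Docker. Conda is generally the friendliest for beginners. The catch is that some packages are missing from the Tsinghua mirror, so downloads can be extremely slow and prone to failure, and preparing the environment may take a long time. If your dataset is small, I suggest using a small cloud server physically located in Hong Kong or a similar region, which saves a great deal of time on software installation.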

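# Download Miniconda; within mainland China the Tsinghua mirror is recommended:
# https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Install by following the prompts
bash Miniconda3-latest-Linux-x86_64.sh
# If you chose not to initialize conda during installation, activate it manually
source miniconda3/bin/activate
# Download the environment file required by the pipeline
wget https://github.com/nf-core/ampliseq/raw/master/environment.yml
# Create the environment required by the pipeline
conda env create -n ampliseq --file environment.yml
# Activate the environment
conda activate ampliseq
# Install Nextflow
conda install -c bioconda nextflow -y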
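
Before launching anything, it can help to confirm that the activated environment actually exposes the tools the pipeline calls. The following is a minimal sketch, not part of the pipeline: `check_tools` is a hypothetical helper, and the tool names in the example call are my assumption about what the ampliseq environment provides.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call inside the activated environment (tool list is an assumption):
# check_tools nextflow qiime fastqc multiqc cutadapt
```

If anything is reported as missing, the conda environment was probably not fully created or is not the active one.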

II. Configuration and running

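Configuration mainly follows the parameter documentation in the pipeline's GitHub repository. The key settings are the 16S amplification primers, the maximum number of CPU cores and amount of RAM on the machine, and the truncation lengths used for read quality control; run FastQC first to decide on those lengths.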
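
To choose values for the CPU and memory limits, one option is to read them from the machine itself. This is a rough sketch for Linux only: `mem_flag_from_kb` is a hypothetical helper, and the rule of rounding to whole GB and leaving 1 GB of headroom for the operating system is my own assumption, not something the pipeline prescribes.

```shell
# Convert total memory in kB (the unit used by /proc/meminfo) into a
# --max_memory value: round to whole GB, then keep 1 GB of headroom.
mem_flag_from_kb() {
    total_gb=$(( ($1 + 524288) / 1048576 ))
    usable_gb=$(( total_gb - 1 ))
    if [ "$usable_gb" -lt 1 ]; then
        usable_gb=1
    fi
    echo "${usable_gb}.GB"
}

# On Linux, print suggested flags for the current machine.
if [ -r /proc/meminfo ]; then
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "--max_cpus $cpus --max_memory '$(mem_flag_from_kb "${total_kb:-0}")'"
fi
```

For a 4 GB machine this rule yields `3.GB`.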

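# Configuration
cd test
# Put the data in the working directory (omitted here)
# Prepare the sample sheet sample-metadata.txt and download a pre-trained taxonomy classifier
# The classifier version must match the QIIME 2 version; here it is 2019.10
wget https://data.qiime2.org/2019.10/common/gg-13-8-99-nb-classifier.qza
# Then run the pipeline; I used a virtual machine with 2 cores and 4 GB of RAM
# The environment built above is already active, so -profile conda is omitted; otherwise Nextflow would build another identical environment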
nextflow run nf-core/ampliseq --reads "Dong-16S" \
    --FW_primer TACGGRAGGCAGCAG \
    --RV_primer AGGGTATCTAATCCT \
    --metadata "sample-metadata.txt" \
    --untilQ2import \
    --extension "/*R{1,2}.fastq" \
    --trunclenf 280 \
    --trunclenr 250 \
    --max_memory '3.GB' \
    --max_cpus 2 \
    --onlyDenoising
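
Since the command relies on the `--extension "/*R{1,2}.fastq"` pattern to find paired reads, a quick pre-flight check of the read directory can catch naming problems before Nextflow starts. A minimal sketch, assuming uncompressed files ending in `R1.fastq` and `R2.fastq`; `count_read_pairs` is a hypothetical helper, not a pipeline command.

```shell
# Count R1 files in a directory that have a matching R2 file, and report
# any R1 file whose mate is missing on stderr.
count_read_pairs() {
    dir=$1
    pairs=0
    for r1 in "$dir"/*R1.fastq; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        r2="${r1%R1.fastq}R2.fastq"
        if [ -e "$r2" ]; then
            pairs=$(( pairs + 1 ))
        else
            echo "no mate for: $r1" >&2
        fi
    done
    echo "$pairs"
}

# Example call (directory name taken from the command above):
# count_read_pairs Dong-16S
```

A count of 0, or any "no mate" message, suggests the file names do not match the pattern passed to `--extension`.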

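Running it produced the output shown in the log below.

My impression is that an experienced pipeline author, drawing on extensive data-processing experience, can exploit the hardware to its full potential and finish the most work in the shortest time. This is extremely important in production environments. Research settings usually do not face this problem; what research needs most is figures, plus conclusions and a story that make the point. The pipeline schedules its tasks sensibly, letting steps run in an interleaved fashion with essentially no rate-limiting step, which is worth learning from and using.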

Launching `nf-core/ampliseq` [reverent_goldstine] - revision: cd23988d88 [master]
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/ampliseq v1.1.2
----------------------------------------------------
Pipeline Name     : nf-core/ampliseq
Pipeline Release  : master
Run Name          : reverent_goldstine
Reads             : Dong-16S
Data Type         : Paired-End
Max Resources     : 3.GB memory, 2 cpus, 10d time per job
Output dir        : ./results
Launch dir        : /root/test_project
Working dir       : /root/test_project/work
Script dir        : /root/.nextflow/assets/nf-core/ampliseq
User              : root
Config Profile    : standard
------------------------------------------------------
executor >  local (1)
[-        ] process > get_software_versions        [  0%] 0 of 1
[34/c556b9] process > fastqc                       [  0%] 0 of 6
[-        ] process > trimming                     [  0%] 0 of 6
[-        ] process > multiqc                      -
[-        ] process > qiime_import                 -
[-        ] process > qiime_demux_visualize        -
......
[d8/db2448] process > get_software_versions        [100%] 1 of 1 ✔
[2e/88b642] process > fastqc                       [100%] 6 of 6 ✔
[cf/4605c1] process > trimming                     [100%] 6 of 6 ✔
[57/54f4cc] process > multiqc                      [100%] 1 of 1 ✔
[9e/b298db] process > qiime_import                 [100%] 1 of 1 ✔
[8b/60b53e] process > qiime_demux_visualize        [100%] 1 of 1 ✔
[08/804cd9] process > dada_trunc_parameter         [100%] 1 of 1 ✔
[5c/532077] process > dada_single                  [100%] 1 of 1 ✔
[25/c6cdaa] process > classifier                   [100%] 1 of 1 ✔
[2b/ef2011] process > filter_taxa                  [100%] 1 of 1 ✔
[5f/cf7385] process > export_filtered_dada_output  [100%] 1 of 1 ✔
[29/caccdc] process > report_filter_stats          [100%] 1 of 1 ✔
[63/2018d3] process > RelativeAbundanceASV         [100%] 1 of 1 ✔
[35/521394] process > RelativeAbundanceReducedTaxa [100%] 1 of 1 ✔
[e3/06d354] process > barplot                      [100%] 1 of 1 ✔
[cb/b1321a] process > tree                         [100%] 1 of 1 ✔
[62/9aab63] process > alpha_rarefaction            [100%] 1 of 1 ✔
[0b/caf4ae] process > combinetable                 [100%] 1 of 1 ✔
[6d/00777a] process > diversity_core               [100%] 1 of 1 ✔
[13/37d1de] process > metadata_category_all        [100%] 1 of 1 ✔
[81/ac4e94] process > metadata_category_pairwise   [100%] 1 of 1 ✔
[0e/77008d] process > alpha_diversity              [100%] 4 of 4, failed: 4 ✔
[-        ] process > beta_diversity               -
[52/e9e491] process > beta_diversity_ordination    [100%] 4 of 4 ✔
[02/1173eb] process > prepare_ancom                [100%] 1 of 1 ✔
[93/fec20e] process > ancom_tax                    [100%] 5 of 5 ✔
[ca/f279b2] process > ancom_asv                    [100%] 1 of 1 ✔
[ae/e29fd4] process > output_documentation         [100%] 1 of 1 ✔
Warning, pipeline completed, but with errored process(es)
Number of ignored errored process(es) : 4
Number of successfully ran process(es) : 43
[nf-core/ampliseq] Pipeline completed successfully
[a9/5c27da] NOTE: Process `alpha_diversity (evenness_vector)` terminated with an error exit status (1) -- Error is ignored
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Completed at: 08-Apr-2020 06:44:42
Duration    : 9m 24s
CPU hours   : 0.2 (2.4% failed)
Succeeded   : 43
Ignored     : 4
Failed      : 4
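# The run took about 9 minutes, which is extremely efficient; doing this by hand would probably take me a few hours. Because my data have some quality problems, some errors were reported during processing.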
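
To see which processes were reported as failed but ignored, the NOTE lines can be pulled out of a saved copy of the console output. This is a small sketch under one assumption: that the output was captured to a file, for example by appending `2>&1 | tee run.log` to the run command (`run.log` is a hypothetical file name). `list_ignored_errors` is likewise a hypothetical helper.

```shell
# Print the names of processes that Nextflow reported as
# "terminated with an error ... Error is ignored", one per line.
list_ignored_errors() {
    sed -n 's/.*NOTE: Process `\([^`]*\)` terminated with an error.*/\1/p' "$1"
}

# Example call:
# list_ignored_errors run.log
```

The bracketed prefix on each NOTE line, such as `[a9/5c27da]`, is the start of the task's path under the work directory, where Nextflow keeps files such as `.command.sh` and `.command.err` that show what was run and why it failed.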

III. Appreciating the results

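Let's look at the results. They are so nicely presented that I chose the word "appreciating". The output feels much like a data analysis report from a commercial provider; with a web front end added, I think anyone could do microbiome analysis as cloud bioinformatics. After all, 16S analysis does not need a powerful computer; your own laptop will do. Focusing on the specific parameters instead of thinking about every single command: that is the future. Judging from the run, the authors also used some R scripts to draw many of the figures and to handle some file operations.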

# Install tree and view the directory tree
sudo apt install tree
tree
# Output:
├── Documentation
├── MultiQC
├── abundance_table
├── alpha-diversity
├── alpha-rarefaction
├── ancom
├── barplot
├── beta-diversity
├── demux
├── fastQC
├── phylogenetic_tree
├── pipeline_info
├── rel_abundance_tables
├── representative_sequences
├── taxonomy
└── trimmed

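1. A help document is provided to explain what each of the files above contains.

2. Next is the run summary, an overview of the pipeline execution: CPU and memory usage, run time, and details of each task, including the script commands.

3. As for the results, the pipeline unpacks QIIME 2's qzv files so they can be opened directly in a browser without the view.qiime2.cn website. It also renames the files to make them easier to browse. The content is the same as QIIME 2's own output, so I won't show it here.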
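
To find the unpacked visualizations quickly, the results directory can be searched for HTML entry points. A minimal sketch, assuming each unpacked visualization exposes an `index.html` file (my assumption about the output layout); `list_reports` is a hypothetical helper.

```shell
# List every index.html under a results directory, sorted, so the
# unpacked visualizations can be opened directly in a browser.
list_reports() {
    find "$1" -type f -name index.html | sort
}

# Example call (output directory name taken from the run log):
# list_reports results
```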
