ichorCNA

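ichorCNA is a tool released in 2017 by the Broad Institute of MIT and Harvard and collaborating institutions. It is mainly used to estimate the tumor fraction (TFx) and copy number variation (CNV) in cfDNA samples sequenced at ultra-low depth (~0.1×), and it can also be applied to tumor tissue.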

Case study
Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors

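Journal: Nature Communications
Impact factor: 11.878
Published: November 2017
Using 1,439 blood samples from 520 patients with metastatic breast or prostate cancer, the authors applied ichorCNA and found that in 33% to 49% of patients with metastatic cancer, tumor-derived DNA made up more than 10% of the cfDNA.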

Reference: https://github.com/broadinstitute/ichorCNA/wiki

Installation

Option 1 (Recommended)

Using R devtools

install.packages("devtools")
library(devtools)
install_github("broadinstitute/ichorCNA")

Option 2

Manual installation

  1. Checkout the latest release of ichorCNA from GitHub

     git clone git@github.com:broadinstitute/ichorCNA.git
    
  2. Install R dependencies (in R-3.6.0 or later)

     ## install from CRAN
     install.packages("plyr")
     ## install packages from Bioconductor (BiocManager replaces the deprecated biocLite.R)
     if (!requireNamespace("BiocManager", quietly = TRUE))
         install.packages("BiocManager")
     BiocManager::install("HMMcopy")
     BiocManager::install("GenomeInfoDb")
     BiocManager::install("GenomicRanges")
    
  3. Install the ichorCNA R package

     ## from the command line and in the directory where ichorCNA github was cloned.
     R CMD INSTALL ichorCNA   
    

HMMcopy (for readCounter)

Install the HMMcopy suite from http://shahlab.ca/projects/hmmcopy_utils/ or https://github.com/shahcompbio/hmmcopy_utils . Please follow instructions on the HMMcopy website.

Over a 4G mobile hotspot, the installation completed successfully using the first method (Option 1):

─  building ‘ichorCNA_0.3.2.tar.gz’
   
* installing *source* package ‘ichorCNA’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (ichorCNA)

Usage

There are 2 main steps in the analysis workflow:

  1. Generating read count coverage information using readCounter from the HMMcopy suite.
  2. Copy number analysis and prediction of tumor fraction using ichorCNA R package.

The analysis workflow has also been implemented as a Snakemake workflow.

Generate Read Count File

To create a WIG file from an ultra-low-pass whole-genome sequencing (ULP-WGS) BAM, use HMMcopy's readCounter. This example creates a WIG file with 1Mb bins across all chromosomes and only includes reads with a mapping quality greater than 20.

Note: The BAM file must be indexed and the index file must have the extension .bam.bai. The index file should be located in the same directory as the BAM file.
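Because a missing or misnamed index is an easy way for readCounter to fail, it can help to check for it before launching a batch. The sketch below is a hypothetical pre-flight helper (`find_bam_index` is our own name, not part of HMMcopy or ichorCNA); it only encodes the rule stated above, that `sample.bam` needs `sample.bam.bai` in the same directory, which is the name `samtools index sample.bam` produces by default.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the <name>.bam.bai index that should sit next to a BAM file.

    Raises ValueError for non-.bam inputs and FileNotFoundError when the
    index is missing, so problems surface before readCounter is launched.
    """
    bam = Path(bam_path)
    if bam.suffix != ".bam":
        raise ValueError(f"not a .bam file: {bam}")
    # tumor.bam -> tumor.bam.bai, in the same directory as the BAM
    index = bam.with_name(bam.name + ".bai")
    if not index.is_file():
        raise FileNotFoundError(f"missing BAM index: {index}")
    return index
```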

/path/to/HMMcopy/bin/readCounter --window 1000000 --quality 20 \
--chromosome "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y" \
/path/to/tumor.bam > /path/to/tumor.wig
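To sanity-check the resulting file, it can be read back into per-bin counts. This sketch assumes readCounter writes the standard fixedStep WIG layout (a `fixedStep chrom=... start=... step=... span=...` header per chromosome, followed by one value per bin); that layout is our assumption based on the WIG format, so compare it against your own output. `parse_fixedstep_wig` is a hypothetical helper name.

```python
def parse_fixedstep_wig(text):
    """Parse fixedStep WIG text into {chrom: [(bin_start, value), ...]}."""
    bins = {}
    chrom = start = step = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("track", "#")):
            continue  # skip blank lines, track lines and comments
        if line.startswith("fixedStep"):
            # e.g. "fixedStep chrom=1 start=1 step=1000000 span=1000000"
            fields = dict(kv.split("=", 1) for kv in line.split()[1:])
            chrom = fields["chrom"]
            start = int(fields["start"])
            step = int(fields["step"])
            bins.setdefault(chrom, [])
            continue
        if chrom is None:
            raise ValueError("data line found before any fixedStep header")
        bins[chrom].append((start, float(line)))
        start += step  # each value covers the next bin of width `step`
    return bins
```

For example, `{c: sum(v for _, v in b) for c, b in parse_fixedstep_wig(text).items()}` gives per-chromosome read totals for a quick look at the data.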

Run ichorCNA

The easiest way to manually run ichorCNA is to use runIchorCNA.R provided in the ichorCNA/scripts/ directory. Here is an example of how to launch the R script from the command line:

Rscript /path/to/ichorCNA/scripts/runIchorCNA.R --id tumor_sample \
  --WIG /path/to/tumor.wig --ploidy "c(2,3)" --normal "c(0.5,0.6,0.7,0.8,0.9)" --maxCN 5 \
  --gcWig /path/to/ichorCNA/inst/extdata/gc_hg19_1000kb.wig \
  --mapWig /path/to/ichorCNA/inst/extdata/map_hg19_1000kb.wig \
  --centromere /path/to/ichorCNA/inst/extdata/GRCh37.p13_centromere_UCSC-gapTable.txt \
  --normalPanel /path/to/ichorCNA/inst/extdata/HD_ULP_PoN_1Mb_median_normAutosome_mapScoreFiltered_median.rds \
  --includeHOMD False --chrs "c(1:22, \"X\")" --chrTrain "c(1:22)" \
  --estimateNormal True --estimatePloidy True --estimateScPrevalence True \
  --scStates "c(1,3)" --txnE 0.9999 --txnStrength 10000 --outDir ./
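When launching many samples from a script, the nested shell quoting in the command above (for example `"c(1:22, \"X\")"`) is easy to get wrong. A minimal sketch, assuming you drive `Rscript` from Python: building an argument list and passing it to `subprocess.run` avoids shell escaping entirely. `r_vector` and `ichorcna_argv` are hypothetical helper names; only the option names come from the example command.

```python
import re


def r_vector(items):
    """Render values as an R c(...) literal.

    Numbers and integer ranges such as "1:22" are written verbatim; any other
    string is double-quoted, so ["1:22", "X"] becomes c(1:22,"X").
    """
    parts = []
    for item in items:
        text = str(item)
        if isinstance(item, (int, float)) or re.fullmatch(r"\d+:\d+", text):
            parts.append(text)
        else:
            parts.append(f'"{text}"')
    return "c(" + ",".join(parts) + ")"


def ichorcna_argv(script, sample_id, wig, out_dir, **options):
    """Build an argument list for runIchorCNA.R (no shell quoting needed).

    Booleans become "True"/"False" as in the example command, lists and
    tuples become R vectors, everything else is passed through str().
    """
    argv = ["Rscript", script, "--id", sample_id, "--WIG", wig]
    for name, value in options.items():
        if isinstance(value, bool):
            value = str(value)
        elif isinstance(value, (list, tuple)):
            value = r_vector(value)
        argv += [f"--{name}", str(value)]
    return argv + ["--outDir", out_dir]
```

The list can then be run with `subprocess.run(argv, check=True)`; since each element reaches R as a single argument, no backslash escaping is required.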

Invoking the --help flag will print out the list of options to use. Here, we will briefly describe some of the key arguments to consider.

Rscript runIchorCNA.R --help
Usage: runIchorCNA.R [options]

Options:
--WIG=WIG
    Path to tumor WIG file. Required.

--NORMWIG=NORMWIG
    Path to normal WIG file. Default: [NULL]

--gcWig=GCWIG
    Path to GC-content WIG file; Required

--mapWig=MAPWIG
    Path to mappability score WIG file. Default: [NULL]

--normalPanel=NORMALPANEL
    Median corrected depth from panel of normals. Default: [NULL]

--exons.bed=EXONS.BED
    Path to bed file containing exon regions. Default: [NULL]

--id=ID
    Patient ID. Default: [test]

--centromere=CENTROMERE
    File containing Centromere locations; if not provided then will use hg19 version from ichorCNA package. Default: [NULL]

--rmCentromereFlankLength=RMCENTROMEREFLANKLENGTH
    Length of region flanking centromere to remove. Default: [1e+05]

--normal=NORMAL
    Initial normal contamination; can be more than one value if additional normal initializations are desired. Default: [0.5]

--scStates=SCSTATES
    Subclonal states to consider. Default: [NULL]

--coverage=COVERAGE
    PICARD sequencing coverage. Default: [NULL]

--lambda=LAMBDA
    Initial Student's t precision; must contain 4 values (e.g. c(1500,1500,1500,1500)); if not provided then will automatically use based on variance of data. Default: [NULL]

--lambdaScaleHyperParam=LAMBDASCALEHYPERPARAM
    Hyperparameter (scale) for Gamma prior on Student's-t precision. Default: [3]

--ploidy=PLOIDY
    Initial tumour ploidy; can be more than one value if additional ploidy initializations are desired. Default: [2]

--maxCN=MAXCN
    Total clonal CN states. Default: [7]

--estimateNormal=ESTIMATENORMAL
    Estimate normal. Default: [TRUE]

--estimateScPrevalence=ESTIMATESCPREVALENCE
    Estimate subclonal prevalence. Default: [TRUE]

--estimatePloidy=ESTIMATEPLOIDY
    Estimate tumour ploidy. Default: [TRUE]

--maxFracCNASubclone=MAXFRACCNASUBCLONE
    Exclude solutions with fraction of subclonal events greater than this value. Default: [0.7]

--maxFracGenomeSubclone=MAXFRACGENOMESUBCLONE
    Exclude solutions with subclonal genome fraction greater than this value. Default: [0.5]

--minSegmentBins=MINSEGMENTBINS
    Minimum number of bins for largest segment threshold required to estimate tumor fraction; if below this threshold, then will be assigned zero tumor fraction.

--altFracThreshold=ALTFRACTHRESHOLD
    Minimum proportion of bins altered required to estimate tumor fraction; if below this threshold, then will be assigned zero tumor fraction. Default: [0.05]

--chrNormalize=CHRNORMALIZE
    Specify chromosomes to normalize GC/mappability biases. Default: [c(1:22)]

--chrTrain=CHRTRAIN
    Specify chromosomes to estimate params. Default: [c(1:22)]

--chrs=CHRS
    Specify chromosomes to analyze. Default: [c(1:22,"X")]

--normalizeMaleX=NORMALIZEMALEX
    If male, then normalize chrX by median. Default: [TRUE]

--fracReadsInChrYForMale=FRACREADSINCHRYFORMALE
    Threshold for fraction of reads in chrY to assign as male. Default: [0.001]

--includeHOMD=INCLUDEHOMD
    If FALSE, then exclude HOMD state. Useful when using large bins (e.g. 1Mb). Default: [FALSE]

--txnE=TXNE
    Self-transition probability. Increase to decrease number of segments. Default: [0.9999999]

--txnStrength=TXNSTRENGTH
    Transition pseudo-counts. Exponent should be the same as the number of decimal places of --txnE. Default: [1e+07]

--plotFileType=PLOTFILETYPE
    File format for output plots. Default: [pdf]

--plotYLim=PLOTYLIM
    ylim to use for chromosome plots. Default: [c(-2,2)]

--outDir=OUTDIR
    Output Directory. Default: [./]

--libdir=LIBDIR
    Script library path. Usually exclude this argument unless custom modifications have been made to the ichorCNA R package code and the user would like to source those R files. Default: [NULL]
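The `--txnStrength` help text says its exponent should equal the number of decimal places of `--txnE`; the example command pairs 0.9999 with 10000, and the defaults pair 0.9999999 with 1e+07. The small sketch below derives one from the other. The helper name is hypothetical, and `txnE` is taken as a string so that the decimal places you wrote are preserved exactly.

```python
from decimal import Decimal


def txn_strength_for(txn_e):
    """Return 10**k, where k is the number of decimal places written in txnE."""
    d = Decimal(txn_e)
    if not (0 < d < 1):
        raise ValueError("txnE must be a probability strictly between 0 and 1")
    places = -d.as_tuple().exponent  # "0.9999" -> exponent -4 -> 4 places
    return 10 ** places
```

Note that trailing zeros count as written decimal places ("0.99990" gives 100000), so write txnE without them.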

Create Panel of Normals

ichorCNA can be run without any reference samples and a panel of normals is not necessary for analysis with ichorCNA. However, if you choose, you can use a normal reference or a panel of normals.

Panel of Normals Purpose

We provide a panel of normals (PoN) with ichorCNA but generating your own using samples that were processed and sequenced similarly to your cancer patient cfDNA samples may reduce noise and improve accuracy. These data help to further normalize the cancer patient cfDNA to correct for systematic biases arising from library construction, sequencing platform, and cfDNA-specific artifacts. We also provide an R script to generate a PoN with your own cfDNA lowpass samples.
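The text does not describe exactly how ichorCNA applies the PoN internally. As a generic illustration only (an assumption, not ichorCNA's code), bin-wise normalization against a panel can be pictured as subtracting each bin's median log2 ratio across the normals from the sample's log2 ratio, so that biases shared by all similarly processed samples cancel out. `pon_normalize` is a hypothetical name.

```python
from statistics import median


def pon_normalize(sample_log2, normals_log2):
    """Subtract the per-bin median log2 ratio of a panel of normals.

    sample_log2: one log2 ratio per bin for the sample.
    normals_log2: one such list per normal sample, in the same bin order.
    Bins with a missing value (None) anywhere are returned as None.
    """
    if any(len(n) != len(sample_log2) for n in normals_log2):
        raise ValueError("all samples must have the same number of bins")
    out = []
    for i, value in enumerate(sample_log2):
        column = [n[i] for n in normals_log2]
        if value is None or any(v is None for v in column):
            out.append(None)
        else:
            out.append(value - median(column))
    return out
```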

Generating your own PoN

Create WIG Files

Create a WIG file for each sample in your PoN just as you would for any cfDNA sample you would analyze with ichorCNA.

/path/to/HMMcopy/bin/readCounter --window 1000000 --quality 20 \
    --chromosome "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y" \
    /path/to/normal_sample.bam > /path/to/normal_sample.wig

Generate PoN

Use the createPanelOfNormals.R script provided in the scripts directory to generate your PoN. As input, this script takes a file that has the path to each WIG file you'd like to use in your panel (one per line, no header).

Rscript /path/to/ichorCNA/scripts/createPanelOfNormals.R \
    --filelist /path/to/wig_files.txt \
    --gcWig /path/to/gc.wig --mapWig /path/to/map.wig \
    --centromere /path/to/centromeres_file.txt \
    --outfile base_outfile_name
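The `--filelist` input is simply one WIG path per line with no header. A minimal sketch for producing it from a directory of normal-sample WIGs (the helper name and the directory layout are assumptions):

```python
from pathlib import Path


def write_wig_filelist(wig_dir, out_file):
    """Write one absolute .wig path per line (no header), sorted for reproducibility."""
    wigs = sorted(p.resolve() for p in Path(wig_dir).glob("*.wig"))
    if not wigs:
        raise FileNotFoundError(f"no .wig files found in {wig_dir}")
    Path(out_file).write_text("".join(f"{p}\n" for p in wigs))
    return wigs
```

Absolute paths are used so the list still works if createPanelOfNormals.R is run from a different working directory.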

When you run ichorCNA, you can pass the .rds output file in using the --normalPanel option to normalize your sample using the PoN.

Parameter tuning and settings

Low tumor content samples (early stage disease)

For samples that are expected to have lower than 5% tumor fraction, it may be helpful to modify the default settings to improve parameter estimation. It is recommended to sequence to higher coverages (> 1-5x) for these types of samples. For samples with less than ~0.5% expected tumor fraction, we recommend standard depths of whole genome sequencing (e.g. > 20x).

  • Initialize tumor fraction parameter
    --normal "c(0.95, 0.99, 0.995, 0.999)"
    Initialize the normal (non-tumor) fraction, i.e. 1 minus the tumor fraction, to values matching the expected tumor fractions, such as 5%, 1%, 0.5%, and 0.1%. ichorCNA will still estimate the tumor fraction, but these starting values can help the EM step find a better global optimum.

  • Set initial ploidy to diploid
    --ploidy "c(2)"
    It will be difficult to predict the ploidy value for low tumor fraction cases.

  • Reduce number of copy number states
    --maxCN 3
    Reducing the state space reduces model complexity. If you know from a prior sample (e.g. a tumor biopsy) that there are large, high-level copy number events, you can set this to 4.

  • Do not account for subclonal copy number events
    --estimateScPrevalence FALSE --scStates "c()"
    Subclonal events are difficult to detect at low tumor fraction, so they can be turned off.

  • Train and analyze autosomes only
    --chrs "c(1:22)" --chrTrain "c(1:22)"
    Exclude chrX in the analysis and training to reduce complexity.
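Since the `--normal` initial values are 1 minus the expected tumor fraction, the low-tumor-content settings above can be generated from the tumor fractions you expect. This sketch only collects the option values listed in the bullets; `low_tfx_options` is a hypothetical helper, not part of ichorCNA.

```python
def low_tfx_options(expected_tfx=(0.05, 0.01, 0.005, 0.001), max_cn=3):
    """Return runIchorCNA.R option values suggested for low tumor fraction samples."""
    # normal fraction = 1 - tumor fraction; rounding removes float noise
    normals = [round(1 - t, 6) for t in expected_tfx]
    return {
        "--normal": "c(" + ", ".join(f"{n:g}" for n in normals) + ")",
        "--ploidy": "c(2)",
        "--maxCN": str(max_cn),
        "--estimateScPrevalence": "FALSE",
        "--scStates": "c()",
        "--chrs": "c(1:22)",
        "--chrTrain": "c(1:22)",
    }
```

Flattening the dictionary with `[x for pair in low_tfx_options().items() for x in pair]` yields arguments that can be appended to an `Rscript runIchorCNA.R` call; pass `max_cn=4` when a prior biopsy shows large high-level events.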

For the detailed workflow and caveats, see:

https://github.com/broadinstitute/ichorCNA/tree/master/scripts/snakemake
https://github.com/broadinstitute/ichorCNA/wiki/SnakeMake-pipeline-for-ichorCNA

Interpreting ichorCNA results

Primary applications of ichorCNA

Cell-free DNA samples may contain low amounts of tumor-derived DNA that can be difficult to detect. ichorCNA has been optimized to detect and quantify tumor DNA in plasma in order to answer three primary questions:

  1. Is tumor-derived DNA present in the cfDNA sample?

  2. How do we decide if the cfDNA sample has sufficient fraction of tumor for follow-up whole exome or deeper genome sequencing?

  3. How do we calibrate the depth of follow-up whole exome or genome sequencing?

Additionally, ichorCNA detects large-scale copy number alterations (CNAs), which can be used to characterize the genomic landscape in large cohorts.

  1. What are the recurrent CNAs in a cohort?

  2. How does the genomic CNA profile of a patient change over time between longitudinal plasma samples?

Tumor Fraction Estimates

Tumor fraction (TFx) is defined as the global (genome-wide) proportion of the cell-free DNA sample that is tumor-derived; (1-TFx) is the proportion of non-tumor-derived DNA. TFx is analogous to terms commonly used in bulk tumor analyses, such as tumor content and tumor purity. The tumor fraction estimates are reported in <sampleID>.params.txt, along with the tumor ploidy, subclonal fraction, and metrics for all solutions considered.
This value is the most important parameter used to address the first 3 questions above.

Determine the presence of tumor in cfDNA

Benchmarking of ichorCNA using cfDNA from metastatic breast and prostate cancer patients and from healthy donors shows a lower limit of sensitivity of 0.03 TFx (3%) for detecting the presence of tumor. **That is, an ichorCNA estimate of > 0.03 TFx reliably indicates the presence of tumor in a cfDNA sample sequenced to ~0.1x whole-genome coverage.** An estimate of < 0.03 TFx indicates either tumor-derived DNA at a low level (below 3%, where the estimate is less accurate) or its absence. In the benchmark, at a 0.03 TFx cut-off, ichorCNA had 91% specificity (correctly classified 20/22 healthy donors) and 95% sensitivity (classified 1125/1288 cancer patient mixtures).
When data quality differs (e.g. in sequencing coverage or overall data variance), or when the cancer type is distinctly different from breast and prostate cancer, we recommend manually inspecting cases near the 0.03 TFx cutoff and/or tuning parameters.
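The detection rule above can be written as a simple classification with a review flag. The 0.03 cutoff comes from the text; the `review_margin` band that defines "near the cutoff" is a hypothetical choice for illustration, not an ichorCNA parameter, and `tumor_detected` is our own name.

```python
def tumor_detected(tfx, cutoff=0.03, review_margin=0.01):
    """Classify an ichorCNA TFx estimate against the 0.03 detection cutoff.

    Returns (detected, needs_review). review_margin is a hypothetical band
    around the cutoff inside which the call is flagged for manual inspection.
    """
    if not 0.0 <= tfx <= 1.0:
        raise ValueError("tumor fraction must be between 0 and 1")
    detected = tfx > cutoff
    needs_review = abs(tfx - cutoff) <= review_margin
    return detected, needs_review
```

A wider margin is reasonable when coverage, data variance, or cancer type differ from the benchmark conditions.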

Decision for whole exome sequencing and calibrating depth of sequencing

When selecting cfDNA samples for standard depths of whole exome sequencing (e.g. mean target coverage ~150x), we recommend samples with ichorCNA estimates of > 0.1 TFx (10%). From the benchmarking, ichorCNA correctly estimated samples with > 0.1 TFx for 91% (606/669 patient-healthy donor mixtures) and correctly estimated samples with < 0.1 TFx for 96% (613/641 mixtures).
ichorCNA has been tuned to be conservative and may sometimes underestimate TFx. This was a deliberate design decision, but as a result some samples with an estimated TFx of 0.05-0.09 may still be suitable for standard-depth whole exome sequencing. We recommend that users manually inspect a few of these cases when considering them for further sequencing.
For samples below 0.1 TFx, the depth of additional sequencing can be calibrated to increase the power to detect mutations in whole exomes (see the power curve figure: images/wes_power_curve.pdf).
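The selection guidance in this section reduces to a few thresholds: > 0.1 TFx for standard-depth WES, and 0.05-0.1 as the range where ichorCNA's conservative estimates merit manual inspection. A sketch of that triage, with the hypothetical function name and return labels chosen for illustration:

```python
def wes_triage(tfx):
    """Suggest a next step for whole exome sequencing from an ichorCNA TFx estimate."""
    if tfx > 0.1:
        return "standard-depth WES"
    if tfx >= 0.05:
        # conservative estimates in this range may still be WES-suitable
        return "manual inspection before WES"
    return "below standard-depth WES threshold"
```

Samples in the lower two categories are the ones for which sequencing depth would need to be calibrated against the power curve.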

Guidelines for Manual Inspection of Results

Low tumor fraction samples

Samples that have an estimated tumor fraction ~0.03-0.1 may require manual curation to confirm this estimate. Our benchmarking shows that the tumor fraction estimate is most accurate when there is at least one amplification and deletion event spanning more than 100Mb each. This helps anchor the copy number levels correctly at low tumor fractions. Having prior knowledge of the common copy number alterations seen in your specific cancer type is one of the best ways to evaluate the validity of events and subsequently tumor fraction estimates for low purity samples. If a high proportion of CNAs are being called as subclonal, it may be necessary to choose an alternate solution or try rerunning ichorCNA without estimating subclonal status (--estimateScPrevalence FALSE --scStates "c()") since it is difficult to make the clonal/subclonal distinction with low purity samples using coverage alone.
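The anchoring criterion above (at least one amplification and one deletion, each spanning more than 100Mb) can be checked mechanically on a segment table. This is a simplified sketch: the `Segment` fields are hypothetical rather than ichorCNA's output columns, each event is assumed to be a single segment, and "amplified"/"deleted" simply mean above/below a neutral copy number of 2.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    copy_number: int


def has_anchor_events(segments, neutral_cn=2, min_length=100_000_000):
    """True if at least one gain and one loss each span more than min_length bp."""
    def long_enough(seg):
        return seg.end - seg.start > min_length

    gain = any(s.copy_number > neutral_cn and long_enough(s) for s in segments)
    loss = any(s.copy_number < neutral_cn and long_enough(s) for s in segments)
    return gain and loss
```

A `False` result does not invalidate an estimate; it only marks a low-TFx sample as one where the copy number levels are less firmly anchored.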

Choosing between solutions

Sometimes ichorCNA will choose a suboptimal solution. Some easy ways to spot a potentially incorrect solution are if:

  • A large proportion of CNAs are being called subclonal.
  • The majority of data points are brown or red, suggesting a whole genome amplification event.
  • There are two distinct copy number levels being called neutral.

These scenarios don't always indicate a suboptimal solution but are commonly seen in them. It is important to check the next best solutions, as ranked by log likelihood, to see if any of them appear to better explain the data. If the solution chosen is suboptimal, the next best solution or two is often the better choice.
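A sketch of this review step, assuming the candidate solutions have already been loaded into dictionaries: the keys `loglik`, `frac_cna_subclonal` and `frac_genome_amplified`, and both thresholds, are hypothetical and should be mapped to whatever your ichorCNA output actually reports. It ranks solutions by log likelihood and flags the first two warning signs listed above; the third (two distinct levels called neutral) still needs visual inspection of the plots.

```python
def rank_solutions(solutions, max_subclonal_frac=0.5, max_amplified_frac=0.5):
    """Sort solutions by log likelihood (best first) and attach warning flags."""
    ranked = sorted(solutions, key=lambda s: s["loglik"], reverse=True)
    report = []
    for sol in ranked:
        flags = []
        if sol["frac_cna_subclonal"] > max_subclonal_frac:
            flags.append("many subclonal CNAs")
        if sol["frac_genome_amplified"] > max_amplified_frac:
            flags.append("possible whole-genome amplification")
        report.append((sol, flags))
    return report
```

For example, `next((s for s, f in rank_solutions(sols)[:3] if not f), None)` picks the best-ranked unflagged solution among the top three, mirroring the advice to check the next best solution or two.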
