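I have already published quite a few GATK4 tutorials on 生信技能樹. Following my principle of using the latest software versions whenever possible, I also set out to convert my earlier GATK pipeline for calling variants from RNA-seq data to GATK4: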
$GATK --java-options "-Xmx25G -Djava.io.tmpdir=./" AddOrReplaceReadGroups \
    -I $id -O ${sample}_right.bam -SO coordinate -ID ${sample} -LB rna \
    -PL illumina -PU hiseq -SM ${sample}
$GATK --java-options "-Xmx25G -Djava.io.tmpdir=./" MarkDuplicates \
    -I ${sample}_right.bam -O ${sample}_marked.bam -M ${sample}.metrics --REMOVE_DUPLICATES TRUE
$GATK --java-options "-Xmx25G -Djava.io.tmpdir=./" FixMateInformation \
    -I ${sample}_marked.bam -O ${sample}_marked_fixed.bam -SO coordinate
# -rf/-RMQF/-RMQT/-U on the last line are carried over unchanged from the GATK3 command
$GATK --java-options "-Xmx25G -Djava.io.tmpdir=./" SplitNCigarReads \
    -R $GENOME -I ${sample}_marked_fixed.bam -O ${sample}_marked_fixed_split.bam \
    -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS #--fix_misencoded_quality_scores
## --fix_misencoded_quality_scores only if phred 64
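But when I got to SplitNCigarReads, I realized I had learned this command so long ago that I had forgotten what each parameter meant, so I searched for how to convert it.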
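As a reminder of what the step is for: RNA-seq aligners such as STAR record a read that spans an intron with an N (skipped region) operator in its CIGAR string, and SplitNCigarReads splits such a read into separate records, one per segment between the Ns. In GATK3, -U ALLOW_N_CIGAR_READS is the unsafe flag that lets the engine accept these reads instead of rejecting them. The toy bash function below (split_on_n is a hypothetical name, not part of GATK) only shows where those cuts fall; the real tool also clips bases and adjusts alignment positions, which this sketch does not attempt.

```shell
#!/usr/bin/env bash
# split_on_n CIGAR
# Toy illustration: print the pieces of a CIGAR string that lie between
# N (skipped region / intron) operators, one piece per line.
# It does NOT clip bases or recompute positions the way SplitNCigarReads does.
split_on_n() {
  local cigar=$1 piece="" re='^([0-9]+)([MIDNSHP=X])'
  while [[ $cigar =~ $re ]]; do
    if [[ ${BASH_REMATCH[2]} == N ]]; then
      # an intron closes the current piece
      if [[ -n $piece ]]; then printf '%s\n' "$piece"; fi
      piece=""
    else
      piece+=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
    fi
    # drop the length+operator pair just consumed
    cigar=${cigar:${#BASH_REMATCH[0]}}
  done
  if [[ -n $piece ]]; then printf '%s\n' "$piece"; fi
}

split_on_n 10S40M500N30M200N20M
# 10S40M
# 30M
# 20M
```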
Sure enough, someone had already asked exactly this question, "GATK4: How to reassign STAR mapping quality from 255 to 60 with SplitNCigarReads", and the GATK4 development team replied to it: "EDIT: Geraldine responded here."
But the answer was a no: the development team told us to go back to GATK3 to run this pipeline.
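What the thread asks for is what -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 does in the GATK3 command: it rewrites MAPQ 255, the value STAR gives uniquely mapped reads, to 60 as the reads are loaded. The SAM specification reserves 255 to mean "mapping quality not available", which is presumably why GATK would otherwise treat those reads as unusable. As a rough sketch only (not a GATK4 feature, and not checked against the GATK3 filter's behavior), the same rewrite can be done on SAM text with awk; the function name reassign_mapq is hypothetical:

```shell
#!/usr/bin/env bash
# reassign_mapq FROM TO
# Read SAM text on stdin and change MAPQ (column 5) from FROM to TO on
# alignment lines; header lines (starting with @) pass through unchanged.
reassign_mapq() {
  awk -v from="$1" -v to="$2" 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }
    $5 == from { $5 = to }
    { print }'
}
```

A possible usage, with placeholder file names: samtools view -h in.bam | reassign_mapq 255 60 | samtools view -b -o in.mapq60.bam -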
"One risk that I see is that using the STAR --outSAMmapqUnique 60 option maybe fixes the issue with GATK, but that other downstream tools maybe still depend on the (still default) STAR mapping quality value of 255 (e.g. cufflinks)."
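Since that risk depends on which convention a given BAM actually carries, a quick check before choosing either fix is to tabulate the MAPQ column. The sketch below uses a hypothetical helper name, mapq_counts, and reads SAM text on stdin, printing each MAPQ value with its count:

```shell
#!/usr/bin/env bash
# mapq_counts
# Read SAM text on stdin (header lines are skipped) and print
# "MAPQ<TAB>count" for every MAPQ value seen, sorted by MAPQ.
mapq_counts() {
  awk -F '\t' '!/^@/ { n[$5]++ } END { for (q in n) print q "\t" n[q] }' |
    sort -t $'\t' -k1,1n
}
```

For example, samtools view in.bam | mapq_counts (placeholder file name): with STAR's defaults the unique reads appear under 255, and after --outSAMmapqUnique 60 they appear under 60.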
"The mapping quality MAPQ (column 5) is 255 for uniquely mapping reads, and int(-10*log10(1-1/Nmap)) for multi-mapping reads. This scheme is same as the one used by TopHat and is compatible with Cufflinks. The default MAPQ=255 for the unique mappers may be changed with --outSAMmapqUnique parameter (integer 0 to 255) to ensure compatibility with downstream tools such as GATK."
https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
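To make the multi-mapper rule concrete, the sketch below evaluates int(-10*log10(1-1/Nmap)) for a few values of Nmap, returning 255 for unique reads as the manual states. star_mapq is a hypothetical helper name; awk is used only because it provides log():

```shell
#!/usr/bin/env bash
# star_mapq NMAP
# Default STAR MAPQ for a read aligned to NMAP loci, per the quoted rule:
# 255 if unique, otherwise int(-10*log10(1 - 1/NMAP)).
star_mapq() {
  awk -v n="$1" 'BEGIN {
    if (n == 1) { print 255; exit }
    # awk only has the natural log, so log10(x) = log(x) / log(10)
    print int(-10 * log(1 - 1 / n) / log(10))
  }'
}

for n in 1 2 3 4 5 10; do
  printf 'Nmap=%s -> MAPQ=%s\n' "$n" "$(star_mapq "$n")"
done
# Nmap=1 -> MAPQ=255
# Nmap=2 -> MAPQ=3
# Nmap=3 -> MAPQ=1
# Nmap=4 -> MAPQ=1
# Nmap=5 -> MAPQ=0
# Nmap=10 -> MAPQ=0
```

So with STAR's defaults only four MAPQ values occur: 255, 3, 1 and 0.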
Fine, back to GATK3 it is. The GATK3 code is:
module load java/1.8.0_91
GATK=/home/jianmingzeng/biosoft/GATK/GenomeAnalysisTK.jar
PICARD=/home/jianmingzeng/biosoft/picardtools/2.9.2/picard.jar
GENOME=/home/jianmingzeng/biosoft/GATK/resources/bundle/hg38/Homo_sapiens_assembly38.fasta
INDEX=/home/jianmingzeng/biosoft/GATK/resources/bundle/hg38/bwa_index/gatk_hg38
DBSNP=/home/jianmingzeng/biosoft/GATK/resources/bundle/hg38/dbsnp_146.hg38.vcf.gz
kgSNP=/home/jianmingzeng/biosoft/GATK/resources/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz
kgINDEL=/home/jianmingzeng/biosoft/GATK/resources/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
TMPDIR=/home/jianmingzeng/tmp/software
## samtools and bwa are in the environment
## samtools Version: 1.3.1 (using htslib 1.3.1)
## bwa Version: 0.7.15-r1140

# $1 is a text file listing one input BAM path per line
while read -r id
do
    echo "$id"
    file=$(basename "$id")
    sample=${file%%_*}    # sample ID: file name up to the first underscore
    echo "$sample"
    # add read groups and sort by coordinate
    java -Djava.io.tmpdir="$TMPDIR" -Xmx25g -jar "$PICARD" AddOrReplaceReadGroups \
        I="$id" O="${sample}.bam" SO=coordinate RGID="${sample}" RGLB=rna \
        RGPL=illumina RGPU=hiseq RGSM="${sample}"
    # mark duplicates and remove them from the output
    java -Djava.io.tmpdir="$TMPDIR" -Xmx25g -jar "$PICARD" MarkDuplicates \
        INPUT="${sample}.bam" OUTPUT="${sample}_marked.bam" METRICS_FILE="${sample}.metrics" REMOVE_DUPLICATES=TRUE
    # fix mate information, sort by coordinate, and index for GATK3
    java -Djava.io.tmpdir="$TMPDIR" -Xmx25g -jar "$PICARD" FixMateInformation \
        INPUT="${sample}_marked.bam" OUTPUT="${sample}_marked_fixed.bam" SO=coordinate
    samtools index "${sample}_marked_fixed.bam"
    # split reads at N CIGAR operators and reassign STAR's MAPQ 255 to 60
    java -Djava.io.tmpdir="$TMPDIR" -Xmx25g -jar "$GATK" -T SplitNCigarReads \
        -R "$GENOME" -I "${sample}_marked_fixed.bam" -o "${sample}_marked_fixed_split.bam" \
        -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS #--fix_misencoded_quality_scores
    ## --fix_misencoded_quality_scores only if phred 64
    # call variants
    java -Djava.io.tmpdir="$TMPDIR" -Xmx25g -jar "$GATK" -T HaplotypeCaller \
        -R "$GENOME" -I "${sample}_marked_fixed_split.bam" --dbsnp "$DBSNP" \
        -stand_emit_conf 10 -o "${sample}_raw.vcf"
    # delete the intermediate BAM files
    rm "${sample}.bam" "${sample}_marked.bam" "${sample}_marked_fixed.bam" "${sample}_marked_fixed_split.bam"
done < "$1"
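One detail of the loop worth checking before running it: the sample ID (also used for RGID and RGSM) is the input file's basename cut at the first underscore. The hypothetical helper below reproduces just that step so the file names in a list can be checked in advance; the example paths are made up:

```shell
#!/usr/bin/env bash
# sample_name PATH
# Reproduce the loop's naming rule: take the basename of PATH and keep
# everything before the first underscore.
sample_name() {
  local file
  file=$(basename "$1")
  printf '%s\n' "${file%%_*}"
}

sample_name /data/SRR1234_Aligned.sortedByCoord.out.bam   # SRR1234
sample_name /data/tumor.bam                               # tumor.bam
```

A file without an underscore keeps its whole name, extension included, and two files that share the part before the first underscore (say patientA_rep1.bam and patientA_rep2.bam) get the same sample ID, so the second would overwrite the first one's output files.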
Nothing but practical code: whoever learns it will benefit.