I came across a mature R package, vcfR, so here is a quick recommendation:
- https://knausb.github.io/vcfR_documentation/
- https://cran.r-project.org/web/packages/vcfR/vignettes/intro_to_vcfR.html
The most important use, of course, is reading in a VCF file, like this:

    library(vcfR)
    vcf_file <- '/Users/jmzeng/germline/merge.dbsnp.vcf'
    vcf <- read.vcfR(vcf_file, verbose = FALSE)
In a dozen seconds or so it reads a VCF file of more than 300 MB without any trouble, and the result is an S4 object:

    > vcf
    ***** Object of Class vcfR *****
    39 samples
    24 CHROMs
    212,875 variants
    Object size: 356.6 Mb
    0 percent missing data
    *****        *****         *****
    >
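If you do not have a large VCF at hand, the same read step can be tried on the small `vcfR_test` object that ships with vcfR. The sketch below is only an illustration: it writes that object to a temporary file (the file name is made up for this example) and reads it back with `read.vcfR`.

```r
library(vcfR)

# vcfR_test is a tiny example object included with vcfR
data(vcfR_test)

# Write it to a temporary gzipped VCF; the file name is hypothetical
tmp_vcf <- file.path(tempdir(), "vcfR_test_demo.vcf.gz")
write.vcf(vcfR_test, file = tmp_vcf)

# Read it back exactly as you would read your own file
vcf_small <- read.vcfR(tmp_vcf, verbose = FALSE)

# Printing the object gives the same kind of summary as shown above
vcf_small
```

Swapping `tmp_vcf` for the path to your own file is all that changes for real data.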
Anyone who knows R will know how to look up the help pages to learn the package. The two most important sentences there are:
Objects of class vcfR can be manipulated with vcfR-method and extract.gt. Contents of the vcfR object can be visualized with the plot method.
The three basic slots are:

    meta    character vector for the meta information
    fix     matrix for the fixed information
    gt      matrix for the genotype information
Here meta stores the VCF header lines, fix stores the fixed columns of the VCF, and gt stores the per-sample genotype information.
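To make the three slots concrete, here is a minimal sketch that inspects each of them with plain `@` access. It assumes the bundled `vcfR_test` example object rather than the 300 MB file above, so it runs without any external data.

```r
library(vcfR)
data(vcfR_test)

# meta: the header lines (those starting with "##") as a character vector
head(vcfR_test@meta, n = 3)

# fix: one row per variant, holding the fixed VCF columns
fix_cols <- colnames(vcfR_test@fix)
fix_cols

# gt: one row per variant; the first column is FORMAT, the rest are samples
gt_cols <- colnames(vcfR_test@gt)
gt_cols
```

Note that the sample count reported when printing the object is the number of gt columns minus the FORMAT column.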
The most basic functions are:

    show(object)
    colnames(vcf@fix)
    vcf@fix[1:4,1:4]
    colnames(vcf@gt)
    vcf@gt[1:4,1:4]
    head(x, n = 6, maxchar = 80)
    plot(x, y, ...)   ## mainly shows the distribution of the variants' quality values
    x[i, j, samples = NULL, ..., drop]
    dim(x)
    nrow(x)
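The help text quoted earlier also mentions `extract.gt`, which is usually more convenient than indexing `vcf@gt` by hand. The following is a small sketch on the bundled `vcfR_test` object; the choice of the `GT` and `DP` fields is just for illustration and assumes those fields exist in the FORMAT column, as they do in this example data.

```r
library(vcfR)
data(vcfR_test)

# Genotype calls as a character matrix: variants in rows, samples in columns
gt <- extract.gt(vcfR_test, element = "GT")

# Per-sample read depth, converted to numbers
dp <- extract.gt(vcfR_test, element = "DP", as.numeric = TRUE)

# Row subsetting with `[` keeps the result a vcfR object
first_two <- vcfR_test[1:2, ]

gt
dp
first_two
```

Because `[` returns a vcfR object, the subset can be passed straight back into `extract.gt`, `plot`, or `write.vcf`.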
Once you understand the object that the file is read into, downstream analysis becomes much simpler.
Bundled test data (from the pinfsc50 data package)

    pkg <- "pinfsc50"
    vcf_file <- system.file("extdata", "pinf_sc50.vcf.gz", package = pkg)
    dna_file <- system.file("extdata", "pinf_sc50.fasta", package = pkg)
    gff_file <- system.file("extdata", "pinf_sc50.gff", package = pkg)

    library(vcfR)
    vcf <- read.vcfR(vcf_file, verbose = FALSE)
    dna <- ape::read.dna(dna_file, format = "fasta")
    gff <- read.table(gff_file, sep = "\t", quote = "")

    chrom <- create.chromR(name = 'Supercontig', vcf = vcf, seq = dna, ann = gff)
    plot(chrom)
    chromoqc(chrom, dp.alpha = 20)
    chromoqc(chrom, xlim = c(5e+05, 6e+05))
This produces some pretty decent plots, and I think they look good!
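The vcfR documentation usually filters and summarizes the chromR object before the QC plot, using `masker` and `proc.chromR`. Here is a sketch of that extra step. It assumes the smaller `vcfR_example` dataset bundled with vcfR (which loads objects named `vcf`, `dna` and `gff`), and the thresholds are illustrative values, not recommendations.

```r
library(vcfR)

# Loads three objects into the workspace: vcf, dna, gff
data(vcfR_example)

chrom <- create.chromR(name = "Supercontig", vcf = vcf, seq = dna,
                       ann = gff, verbose = FALSE)

# Mask variants outside these quality, depth and mapping-quality ranges
# (thresholds chosen only for illustration)
chrom <- masker(chrom, min_QUAL = 1, min_DP = 300, max_DP = 700,
                min_MQ = 59.9, max_MQ = 60.1)

# Compute per-window summaries used by the QC plot
chrom <- proc.chromR(chrom, verbose = FALSE)

chromoqc(chrom, dp.alpha = 20)
```

Masking only flags variants rather than deleting them, so the thresholds can be adjusted and the plot redrawn without reading the data again.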