Haplotype-aware genotyping from noisy long reads

Abstract

Motivation

Current genotyping approaches for single nucleotide variations (SNVs) rely on short, relatively accurate reads from second-generation sequencing devices. Third-generation sequencing platforms, which generate much longer reads, are becoming more widespread. These platforms come with the significant drawback of higher sequencing error rates, which makes them ill-suited to current genotyping algorithms. However, the longer reads make more of the genome unambiguously mappable and typically provide linkage information between neighboring variants.

Results

In this paper we introduce a novel approach for haplotype-aware genotyping from noisy long reads. We do this by considering bipartitions of the sequencing reads, corresponding to the two haplotypes. We formalize the computational problem in terms of a Hidden Markov Model and compute posterior genotype probabilities using the forward-backward algorithm. Genotype predictions can then be made by picking the most likely genotype at each site. Our experiments indicate that longer reads potentially allow significantly more of the genome to be genotyped accurately. Further, we are able to use both Oxford Nanopore and Pacific Biosciences sequencing data to independently validate millions of variants previously identified by short-read technologies in the reference NA12878 sample, including hundreds of thousands of variants that were not previously included in the high-confidence reference set.
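To make the forward-backward step more concrete, here is a minimal Python sketch of posterior decoding on a toy Hidden Markov Model, followed by picking the most likely state at each site. It is not the authors' model: in the paper the hidden states are built from read bipartitions, whereas this sketch assumes, purely for illustration, three hidden states labelled directly with genotypes and three made-up observation symbols. All names (`forward_backward`, `call_genotypes`, `GENOTYPES`, `INIT`, `TRANS`, `EMIT`) and all probabilities are hypothetical.

```python
def forward_backward(init, trans, emit, observations):
    """Return, for each position, the posterior distribution over hidden states.

    init[s]     : probability of starting in state s
    trans[s][t] : probability of moving from state s to state t
    emit[s][o]  : probability of observing symbol o while in state s
    """
    n_states = len(init)
    n_obs = len(observations)

    # Forward pass; each row is rescaled to sum to 1 to avoid underflow.
    forward = []
    for i, obs in enumerate(observations):
        if i == 0:
            row = [init[s] * emit[s][obs] for s in range(n_states)]
        else:
            prev = forward[-1]
            row = [
                emit[s][obs] * sum(prev[r] * trans[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        total = sum(row)
        forward.append([x / total for x in row])

    # Backward pass, rescaled the same way; the last row is all ones.
    backward = [[1.0] * n_states for _ in range(n_obs)]
    for i in range(n_obs - 2, -1, -1):
        nxt = observations[i + 1]
        row = [
            sum(trans[s][t] * emit[t][nxt] * backward[i + 1][t]
                for t in range(n_states))
            for s in range(n_states)
        ]
        total = sum(row)
        backward[i] = [x / total for x in row]

    # Posterior at each position is proportional to forward * backward.
    posteriors = []
    for f_row, b_row in zip(forward, backward):
        row = [f * b for f, b in zip(f_row, b_row)]
        total = sum(row)
        posteriors.append([x / total for x in row])
    return posteriors


def call_genotypes(posteriors, labels):
    """Pick the most likely state at each site; return (label, posterior) pairs."""
    calls = []
    for row in posteriors:
        best = max(range(len(row)), key=row.__getitem__)
        calls.append((labels[best], row[best]))
    return calls


# Toy parameters (assumptions for illustration only, not taken from the paper).
GENOTYPES = ["0/0", "0/1", "1/1"]
INIT = [1 / 3, 1 / 3, 1 / 3]
TRANS = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
# Observation symbols: 0 = mostly reference, 1 = mixed, 2 = mostly alternative.
EMIT = [[0.80, 0.15, 0.05],
        [0.10, 0.80, 0.10],
        [0.05, 0.15, 0.80]]

if __name__ == "__main__":
    posteriors = forward_backward(INIT, TRANS, EMIT, [0, 1, 1, 2])
    for site, (genotype, prob) in enumerate(call_genotypes(posteriors, GENOTYPES)):
        print(site, genotype, round(prob, 3))
```

Rescaling each forward and backward row keeps the numbers in a safe floating-point range. Because the scale factor is constant within a position, it cancels when the product is normalized into a posterior.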

Reference

https://www.biorxiv.org/content/10.1101/293944v2.abstract
