A simple example: extracting aligned sequences with Python after BLAST

Here we use Biopython.

The BLAST output must be saved in XML format, i.e. run blastn with -outfmt 5.
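If -outfmt 5 is forgotten, blastn writes its default pairwise text report instead, and NCBIXML cannot read it. A quick sanity check, sketched below with Python's standard-library xml.etree.ElementTree (not Biopython), is to confirm that the file parses as XML and that its root element is BlastOutput, the root tag of BLAST's classic XML output. The function name looks_like_blast_xml is hypothetical, and it only checks the root tag, not the full BLAST schema.

```python
import xml.etree.ElementTree as ET


def looks_like_blast_xml(text):
    """Return True if `text` parses as XML with a <BlastOutput> root element.

    Hypothetical helper: it checks only the root tag, not the whole BLAST schema.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not XML at all, e.g. the default pairwise report or tabular output
        return False
    return root.tag == "BlastOutput"
```

For example, after the blastn step, `looks_like_blast_xml(open("query.xml").read())` should be True; if it is False, re-run blastn with -outfmt 5.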

First, run BLAST.

Build the BLAST database (index)

makeblastdb -in Zjn_sc00188.1.fa -dbtype nucl -parse_seqids -out zjn

Run the alignment

blastn -db zjn -query query.fasta -outfmt 5 -out query.xml
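The two commands above can also be driven from Python with the standard-library subprocess module, which keeps the whole workflow in one script. This is a sketch that assumes BLAST+ (makeblastdb and blastn) is on your PATH; the function names blast_commands and run_blast are hypothetical, and the flags are exactly the ones used above.

```python
import subprocess


def blast_commands(fasta, db_name, query, out_xml):
    """Return the makeblastdb and blastn argument lists used in this post."""
    makedb = ["makeblastdb", "-in", fasta, "-dbtype", "nucl",
              "-parse_seqids", "-out", db_name]
    search = ["blastn", "-db", db_name, "-query", query,
              "-outfmt", "5",  # 5 = XML, which NCBIXML expects
              "-out", out_xml]
    return makedb, search


def run_blast(fasta, db_name, query, out_xml):
    """Run both commands in order; check=True raises CalledProcessError on failure."""
    for cmd in blast_commands(fasta, db_name, query, out_xml):
        subprocess.run(cmd, check=True)
```

Usage: `run_blast("Zjn_sc00188.1.fa", "zjn", "query.fasta", "query.xml")`. Passing argument lists instead of a single shell string avoids quoting problems with unusual file names.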

Next comes the Python part.

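Import the module we need, then loop over the results

from Bio.Blast import NCBIXML

# XML file written by blastn -outfmt 5
with open("../../blast/query.xml") as handle:
    for rec in NCBIXML.parse(handle):        # one record per query sequence
        for align in rec.alignments:         # one alignment per database hit
            for hsp in align.hsps:           # one HSP per local alignment
                print(dir(hsp))              # list the attributes an HSP carries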
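dir() also returns Python's double-underscore internals, which makes the printout long. A small helper like the one below (hypothetical, not part of Biopython) keeps only the public attribute names. It is demonstrated on a types.SimpleNamespace stand-in with invented values rather than a real HSP object, so it runs without a BLAST file.

```python
from types import SimpleNamespace


def public_attributes(obj):
    """Return dir(obj) without names that start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in object with a few HSP-like attributes (values invented for illustration)
fake_hsp = SimpleNamespace(sbjct="ATG-ATGC", expect=1.2e-10, sbjct_start=101)
print(public_attributes(fake_hsp))  # ['expect', 'sbjct', 'sbjct_start']
```

In the real loop, replace `print(dir(hsp))` with `print(public_attributes(hsp))`.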

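The attributes stored in each hsp include:

hsp.sbjct: the aligned subject (database) sequence; it may contain dashes marking gaps, which can be removed with replace()
hsp.expect: the E-value
hsp.sbjct_start and hsp.sbjct_end: start and end positions of the alignment on the subject sequence
hsp.query: the aligned query sequence
hsp.query_start and hsp.query_end: start and end positions of the alignment on the query sequence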
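To see where these attributes come from, here is a standard-library sketch that reads the corresponding elements straight from the XML with xml.etree.ElementTree. As far as I can tell from Biopython's parser, sbjct comes from Hsp_hseq, query from Hsp_qseq, expect from Hsp_evalue, sbjct_start/sbjct_end from Hsp_hit-from/Hsp_hit-to, and query_start/query_end from Hsp_query-from/Hsp_query-to, while align.title joins Hit_id and Hit_def. This is an alternative to Biopython, not how Biopython works internally; the function iter_hsps is hypothetical and the trimmed XML below is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented, heavily trimmed -outfmt 5 output: one query, one hit, one HSP
TOY_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>Zjn_sc00188.1</Hit_id>
          <Hit_def>No definition line</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1.2e-10</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>8</Hsp_query-to>
              <Hsp_hit-from>101</Hsp_hit-from>
              <Hsp_hit-to>107</Hsp_hit-to>
              <Hsp_qseq>ATGCATGC</Hsp_qseq>
              <Hsp_hseq>ATG-ATGC</Hsp_hseq>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def iter_hsps(root):
    """Yield one dict per HSP, keyed like the Biopython attribute names."""
    for hit in root.iter("Hit"):
        title = f"{hit.findtext('Hit_id')} {hit.findtext('Hit_def')}"
        for hsp in hit.iter("Hsp"):
            yield {
                "title": title,
                "sbjct": hsp.findtext("Hsp_hseq"),
                "query": hsp.findtext("Hsp_qseq"),
                "expect": float(hsp.findtext("Hsp_evalue")),
                "sbjct_start": int(hsp.findtext("Hsp_hit-from")),
                "sbjct_end": int(hsp.findtext("Hsp_hit-to")),
                "query_start": int(hsp.findtext("Hsp_query-from")),
                "query_end": int(hsp.findtext("Hsp_query-to")),
            }


for h in iter_hsps(ET.fromstring(TOY_XML)):
    print(h["title"], h["expect"], h["sbjct_start"], h["sbjct_end"])
    print(h["sbjct"].replace("-", ""))  # gap-free subject sequence
```

The toy hit spans positions 101 to 107 on the subject, which matches the seven bases left once the single "-" is removed.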

Code

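from Bio.Blast import NCBIXML

with open("../../blast/query.xml") as handle:
    for rec in NCBIXML.parse(handle):
        for align in rec.alignments:
            for hsp in align.hsps:
                # print(dir(hsp))          # uncomment to list every attribute
                print(align.title)         # hit ID and description
                print(hsp.sbjct)           # aligned subject sequence, "-" marks gaps
                print(hsp.sbjct_start)     # alignment start on the subject
                print(hsp.sbjct_end)       # alignment end on the subject
                print(hsp.align_length)    # alignment length, gap columns included
                print(hsp.gaps)            # number of gaps
                print(hsp.expect)          # E-value
                # print(hsp.frame)         # reading frame
                # print(hsp.bits)          # bit score
                print(hsp.query)           # aligned query sequence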
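Printing each attribute on its own line gets hard to read once there are many hits. One option, sketched below, is to emit one tab-separated line per HSP with the same fields, which can be redirected to a file and opened in R or a spreadsheet. The helper name hsp_summary is hypothetical, and the SimpleNamespace object only stands in for a Biopython HSP.

```python
from types import SimpleNamespace


def hsp_summary(title, hsp):
    """Join the fields printed in the loop above into one tab-separated line."""
    fields = [title, hsp.align_length, hsp.gaps, hsp.expect,
              hsp.sbjct_start, hsp.sbjct_end]
    return "\t".join(str(value) for value in fields)


# Stand-in HSP with invented values; in the real loop pass align.title and hsp
fake_hsp = SimpleNamespace(align_length=8, gaps=1, expect=1.2e-10,
                           sbjct_start=101, sbjct_end=107)
print(hsp_summary("Zjn_sc00188.1 No definition line", fake_hsp))
```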

Finally, write the results to a FASTA file

Code

from Bio.Blast import NCBIXML

with open("../../blast/query.xml") as handle, open("output.fasta", "w") as fw:
    for rec in NCBIXML.parse(handle):
        for align in rec.alignments:
            for hsp in align.hsps:
                seq_id = align.title
                seq = hsp.sbjct.replace("-", "")  # drop gap characters
                # Header layout: >title alignment_length evalue start end
                fw.write(f">{seq_id} {hsp.align_length} {hsp.expect} "
                         f"{hsp.sbjct_start} {hsp.sbjct_end}\n{seq}\n")
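The script above uses the whole align.title as the FASTA ID and writes each sequence on a single line. Many downstream tools treat only the first whitespace-separated word of a header as the ID, so the hypothetical helper below keeps just that word and wraps the sequence at a fixed width; the remaining header fields stay in the same order as in the script. Note that the length field is hsp.align_length, which counts gap columns, so it can be larger than the length of the gap-free sequence written underneath.

```python
def fasta_record(title, sbjct, align_length, expect, sbjct_start, sbjct_end, width=60):
    """Return one FASTA record: >ID length evalue start end, then the wrapped sequence.

    Hypothetical helper; the ID is the first whitespace-separated word of `title`.
    """
    seq_id = title.split()[0]
    seq = sbjct.replace("-", "")  # remove gap characters
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    header = f">{seq_id} {align_length} {expect} {sbjct_start} {sbjct_end}"
    return "\n".join([header, *lines]) + "\n"


print(fasta_record("Zjn_sc00188.1 No definition line", "ATG-ATGC", 8, 1.2e-10, 101, 107), end="")
```

Inside the loop this would become `fw.write(fasta_record(align.title, hsp.sbjct, hsp.align_length, hsp.expect, hsp.sbjct_start, hsp.sbjct_end))`.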

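You are welcome to follow my WeChat official account

小明的數據分析筆記本 (Xiaoming's Data Analysis Notebook)

The account mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on transcriptomics, genomics and population genetics papers on horticultural plants; 3. introductory bioinformatics learning materials and my own study notes.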
