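A simple example: extracting the aligned sequences with Python after BLAST

Here we use Biopython.

The BLAST output must be saved in XML format, which means setting -outfmt to 5.

First, BLAST

Build the database index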

makeblastdb -in Zjn_sc00188.1.fa -dbtype nucl -parse_seqids -out zjn

Run the alignment

blastn -db zjn -query query.fasta -outfmt 5 -out query.xml
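
If you would rather drive both steps from Python, the two commands above can be wrapped with the standard subprocess module. This is only a minimal sketch: it assumes makeblastdb and blastn are on your PATH, and the function names (makeblastdb_cmd, blastn_cmd, run_blast) are made up for illustration.

```python
import shutil
import subprocess


def makeblastdb_cmd(fasta, db_name):
    """Return the makeblastdb argument list used above (nucleotide database)."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl",
            "-parse_seqids", "-out", db_name]


def blastn_cmd(db_name, query, xml_out):
    """Return the blastn argument list; -outfmt 5 requests XML output."""
    return ["blastn", "-db", db_name, "-query", query,
            "-outfmt", "5", "-out", xml_out]


def run_blast(fasta, db_name, query, xml_out):
    """Build the database, then run the search; raises if either step fails."""
    for cmd in (makeblastdb_cmd(fasta, db_name),
                blastn_cmd(db_name, query, xml_out)):
        # Assumption: the BLAST+ executables are installed and on PATH.
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_blast(...) to execute them.
    print(" ".join(makeblastdb_cmd("Zjn_sc00188.1.fa", "zjn")))
    print(" ".join(blastn_cmd("zjn", "query.fasta", "query.xml")))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.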

Next, the Python part

Import the module we need

from Bio.Blast import NCBIXML

# Open the BLAST XML report and list the attributes available on each HSP.
with open("../../blast/query.xml", "r") as handle:
    for rec in NCBIXML.parse(handle):
        for align in rec.alignments:
            for hsp in align.hsps:
                print(dir(hsp))

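The hsp object stores, among other things:

hsp.sbjct is the aligned subject (hit) sequence; it may contain dashes (gaps), which can be removed with the replace function
expect is the e-value

sbjct_start and sbjct_end are the start and end positions of the alignment on the subject sequence
query is the aligned query sequence

query_start and query_end are its start and end positions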
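
To make the gap handling and the coordinates concrete, here is a small stand-alone sketch that does not need Biopython. The values are made up and stand in for hsp.sbjct, hsp.sbjct_start and hsp.sbjct_end; the helper names are hypothetical. It also assumes that a start larger than the end means a minus-strand hit, which you should confirm against your own XML output.

```python
def ungapped(aligned_seq):
    """Remove alignment gap characters ('-') from an aligned sequence string."""
    return aligned_seq.replace("-", "")


def subject_span(sbjct_start, sbjct_end):
    """Return (low, high, strand) for a pair of 1-based subject coordinates.

    Assumption: a start greater than the end indicates a minus-strand hit.
    """
    if sbjct_start <= sbjct_end:
        return sbjct_start, sbjct_end, "+"
    return sbjct_end, sbjct_start, "-"


# Made-up values standing in for hsp.sbjct, hsp.sbjct_start and hsp.sbjct_end.
aligned = "ATG--CGTAC-GT"
low, high, strand = subject_span(101, 110)
seq = ungapped(aligned)

# After removing gaps, the sequence length matches the subject span.
print(seq, len(seq), high - low + 1, strand)
```

The check that the ungapped length equals high - low + 1 is a cheap sanity test before writing sequences to a file.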

Code

# Print the most useful fields of every HSP in the report.
with open("../../blast/query.xml", "r") as handle:
    for rec in NCBIXML.parse(handle):
        for align in rec.alignments:
            for hsp in align.hsps:
                print(dir(hsp))
                print(align.title)
                print(hsp.sbjct)
                print(hsp.sbjct_start)
                print(hsp.sbjct_end)
                print(hsp.align_length)
                print(hsp.gaps)
                #print(hsp.frame)
                #print(hsp.bits)
                print(hsp.query)

Write the final result to a FASTA file

Final result

Code

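from Bio.Blast import NCBIXML

# Context managers close both files even if parsing fails part-way.
with open("../../blast/query.xml", "r") as handle, open("output.fasta", "w") as fw:
    for rec in NCBIXML.parse(handle):
        for align in rec.alignments:
            for hsp in align.hsps:
                seq_id = align.title
                # Strip gap characters so only subject bases are written.
                seq = str(hsp.sbjct).replace("-", "")
                start = str(hsp.sbjct_start)
                end = str(hsp.sbjct_end)
                # align_length is the alignment length, gaps included.
                length = str(hsp.align_length)
                evalue = str(hsp.expect)

                # Header: title, alignment length, e-value, subject start and end.
                fw.write(">%s %s %s %s %s\n%s\n" % (seq_id, length, evalue, start, end, seq))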
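
If the aligned sequences are long, you may prefer wrapped FASTA lines instead of one long line per record. The sketch below keeps the same header fields as the code above and only changes how the sequence is laid out. The function name fasta_record and the width parameter are made up for illustration, and the hit values are invented.

```python
import io


def fasta_record(seq_id, length, evalue, start, end, seq, width=60):
    """Format one FASTA record: header line plus sequence wrapped to `width`."""
    header = ">%s %s %s %s %s" % (seq_id, length, evalue, start, end)
    # Slice the sequence into fixed-width chunks, one chunk per line.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"


# Invented hit: (title, alignment length, e-value, start, end, aligned sequence).
hits = [("hit1 made-up title", 13, "1e-05", 101, 110, "ATG--CGTAC-GT")]

out = io.StringIO()  # stands in for the open output file
for seq_id, length, evalue, start, end, aligned in hits:
    seq = aligned.replace("-", "")  # drop gap characters before writing
    out.write(fasta_record(seq_id, length, evalue, start, end, seq, width=5))

print(out.getvalue())
```

In the real script you would pass align.title and the hsp fields instead of the invented tuple, and write to the opened output file rather than to a StringIO buffer.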

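You are welcome to follow my WeChat official account

小明的数据分析笔记本

The 小明的数据分析笔记本 account mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on transcriptomics, genomics, and population genetics papers related to horticultural plants; 3. introductory bioinformatics learning materials and my own study notes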
