Building Phylogenetic Trees from FASTA Sequences in R

“How do you build a phylogenetic tree?”

“With MEGA, of course!”

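But that is really tedious. You have to align the sequences first, then build the tree, then polish the figure in all sorts of ways. If you are used to R, why not do it all in R?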

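A quick Google search showed that R already has packages that cover the whole analysis, from reading sequences to building trees and making them look good. The main packages are ape, phangorn, seqinr, and ggtree.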

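The test data in this article come from a tutorial. Reply “進化樹” (phylogenetic tree) to the PLANTOMIX WeChat official account to get the download link.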


About phylogenetic trees

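Common methods include the distance-matrix methods UPGMA, ME (Minimum Evolution), and NJ (Neighbor-Joining), and the non-distance (character-based) methods MP (Maximum Parsimony), ML (Maximum Likelihood), and Bayesian inference. UPGMA is rarely used today; most papers use NJ or ML. In terms of speed, NJ is fast, while ML is very time-consuming.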
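
To make the distance-matrix idea concrete: UPGMA is average-linkage hierarchical clustering, and base R's hclust(method = "average") documents itself as exactly that rule. As far as I know, phangorn's upgma() used in the script is built on the same call. Below is a minimal sketch on a made-up four-taxon distance matrix; the taxa A to D and all the numbers are hypothetical, not taken from the test data.

```r
# Hypothetical pairwise distances between four taxa A, B, C, D
taxa <- c("A", "B", "C", "D")
d <- as.dist(matrix(c( 0,  2,  6, 10,
                       2,  0,  6, 10,
                       6,  6,  0, 10,
                      10, 10, 10,  0),
                    nrow = 4, dimnames = list(taxa, taxa)))

# UPGMA: repeatedly join the closest pair of clusters; the distance from the
# new cluster to any other cluster is the size-weighted mean of the old ones
hc <- hclust(d, method = "average")

# hclust heights are the distances at which clusters were joined; in the
# ultrametric UPGMA tree each internal node sits at half that distance
node_depths <- hc$height / 2

print(hc$merge)      # merge order: negative = single taxon, positive = earlier cluster
print(node_depths)
```

A and B are joined first (distance 2), then C joins them (mean distance 6), and D joins last (mean distance 10). This also shows why UPGMA assumes a constant rate of evolution: every tip ends up at the same depth.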

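This article briefly shows how to build phylogenetic trees from DNA sequences in R. The approach here works for DNA sequences; a workflow for amino acid (protein) sequences is still being worked out.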

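 rm(list = ls())

 # Working directory of the original project; change it to your own
 setwd('../../20200727【R】進化樹構建與美化/')

 # Install missing CRAN packages; requireNamespace() checks one package at a time
 for (pkg in c('ape', 'phangorn', 'seqinr', 'ggplot2', 'patchwork')) {
   if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
 }
 # ggtree is distributed through Bioconductor, not CRAN
 if (!requireNamespace('ggtree', quietly = TRUE)) {
   if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')
   BiocManager::install('ggtree')
 }

 library(ape)
 library(phangorn)
 library(seqinr)
 library(ggtree)
 library(ggplot2)
 library(patchwork)

 # Read the sequences; phyDat() expects an alignment (all sequences the same length)
 test_dna = read.dna('data/test.fasta', format = 'fasta')
 test_phyDat = phyDat(test_dna, type = 'DNA', levels = NULL)

 # Model selection: compare substitution models by AIC/BIC
 mt = modelTest(test_phyDat)
 print(mt)

 # Pairwise distances under the JC69 substitution model
 dna_dist = dist.ml(test_phyDat, model = 'JC69')

 # NJ tree
 tree_nj = NJ(dna_dist)
 write.tree(tree_nj)  # prints the Newick string; add file = '...' to save it
 p_nj = ggtree(tree_nj, layout = 'radial') +
   geom_tiplab(size = 3, color = 'red') +
   labs(title = 'Neighbor Joining Tree')
 p_nj

 # UPGMA tree
 tree_upgma = upgma(dna_dist)
 p_UPGMA = ggtree(tree_upgma) +
   geom_tiplab(size = 3, color = 'green') +
   labs(title = 'UPGMA Tree')
 p_UPGMA

 # Maximum parsimony tree
 # Parsimony scores (minimum number of changes) of the UPGMA and NJ trees
 parsimony(tree_upgma, data = test_phyDat)
 parsimony(tree_nj, data = test_phyDat)
 # Improve the NJ tree with parsimony rearrangements
 test_optim = optim.parsimony(tree_nj, data = test_phyDat)
 # Parsimony ratchet search; its result has no branch lengths, so add them with acctran()
 tree_pratchet = pratchet(test_phyDat)
 tree_pratchet = acctran(tree_pratchet, test_phyDat)
 p_Maximum_Parsimony = ggtree(tree_pratchet, layout = 'radial') +
   geom_tiplab(size = 3, color = 'blue') +
   labs(title = 'Maximum Parsimony Tree')
 p_Maximum_Parsimony

 # Maximum likelihood, starting from the NJ tree
 tree_fit = pml(tree_nj, data = test_phyDat)
 print(tree_fit)
 fitJC = optim.pml(tree_fit, model = 'JC', rearrangement = 'stochastic')
 logLik(fitJC)
 # 100 bootstrap replicates
 tree_bs = bootstrap.pml(fitJC, bs = 100, optNni = TRUE,
   multicore = FALSE,
   control = pml.control(trace = 0))
 # Midpoint-root the ML tree, show bootstrap support >= 50 and keep the returned tree
 tree_ml_bootstrap = plotBS(midpoint(fitJC$tree), tree_bs, p = 50, type = 'p')

 p_Maximum_Likelihood = ggtree(tree_ml_bootstrap) +
   geom_tiplab(size = 3, color = 'purple') +
   labs(title = 'Maximum Likelihood Tree')
 p_Maximum_Likelihood

 # Combine the four trees into a 2 x 2 panel and save it
 p_all = p_nj + p_UPGMA + p_Maximum_Parsimony + p_Maximum_Likelihood +
   plot_layout(ncol = 2)
 dir.create('figures', showWarnings = FALSE)
 ggsave('figures/all.pdf', plot = p_all, width = 8, height = 8)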
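
The distance step above uses dist.ml(..., model = 'JC69'). For two aligned sequences without gaps or ambiguity codes, the JC69 distance has a closed form, d = -3/4 ln(1 - 4p/3), where p is the proportion of sites that differ. The sketch below computes it by hand in base R for two made-up sequences (hypothetical data, not from test.fasta). For clean input like this I would expect it to match what dist.ml() reports, but dist.ml() handles gaps and ambiguous bases in its own way, so treat it as an illustration only.

```r
# Two hypothetical aligned DNA sequences of equal length (no gaps)
seq1 <- strsplit("ACGTACGTACGTACGTACGT", "")[[1]]
seq2 <- strsplit("ACGTACGAACGTACCTACGT", "")[[1]]
stopifnot(length(seq1) == length(seq2))

# p-distance: proportion of aligned sites that differ
p <- mean(seq1 != seq2)

# Jukes-Cantor (JC69) correction for multiple substitutions at the same site
jc69 <- function(p) -3 / 4 * log(1 - 4 / 3 * p)
d <- jc69(p)

cat("p =", p, " JC69 d =", round(d, 4), "\n")
```

The corrected distance is slightly larger than p because JC69 assumes some sites have changed more than once; the gap between the two grows as the sequences diverge.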
