“How do you build a phylogenetic tree?”
“With MEGA, of course!”
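But it's such a hassle: align the sequences first, then build the tree, then polish the figure in all sorts of ways. I'm used to R, so why not just do the whole thing in R?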
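A quick Google search shows that R already has packages covering the whole analysis, from reading sequences to building and beautifying trees. The main ones are ape, phangorn, seqinr and ggtree.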
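The test data in this article come from a tutorial. To get the download link, send the keyword “進化樹” (phylogenetic tree) to the PLANTOMIX WeChat official account.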
About phylogenetic trees
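Common methods fall into two groups. Distance-matrix methods include UPGMA, ME (Minimum Evolution) and NJ (Neighbor-Joining); character-based (non-distance) methods include MP (Maximum Parsimony), ML (Maximum Likelihood) and Bayesian inference. UPGMA, which assumes the same evolutionary rate in every lineage, is rarely used today; most papers use NJ or ML. In terms of speed, NJ is fast, while ML is very time-consuming.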
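To make the distance-matrix idea concrete, here is a minimal base-R sketch of a single Neighbor-Joining step. It assumes a toy distance matrix (the standard 5-taxon textbook example, not this article's data) and uses a hypothetical helper, nj_step, which computes the Q criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j (r_i is the row sum of distances for taxon i), picks the pair with the smallest Q, and returns their branch lengths to the new internal node. For real analyses, phangorn's NJ() or ape's nj() run the full algorithm.

```r
# Minimal base-R sketch of ONE Neighbor-Joining step.
# Toy 5-taxon distance matrix (textbook example, not this article's data).
taxa <- c("a", "b", "c", "d", "e")
d <- matrix(c(0,  5,  9,  9, 8,
              5,  0, 10, 10, 9,
              9, 10,  0,  8, 7,
              9, 10,  8,  0, 3,
              8,  9,  7,  3, 0),
            nrow = 5, byrow = TRUE, dimnames = list(taxa, taxa))

# Hypothetical helper: returns the pair to join and their branch lengths
nj_step <- function(d) {
  n <- nrow(d)
  r <- rowSums(d)                                # r_i: summed distance of taxon i
  q <- (n - 2) * d - outer(r, r, "+")            # Q(i, j) = (n - 2) d(i, j) - r_i - r_j
  diag(q) <- Inf                                 # a taxon is never joined with itself
  ij <- sort(which(q == min(q), arr.ind = TRUE)[1, ])  # smallest Q (ties: first hit)
  i <- ij[[1]]
  j <- ij[[2]]
  # Branch lengths from i and j to the new internal node u
  li <- d[i, j] / 2 + (r[[i]] - r[[j]]) / (2 * (n - 2))
  lj <- d[i, j] - li
  list(pair = rownames(d)[c(i, j)], q = q, lengths = c(li, lj))
}

step1 <- nj_step(d)
step1$pair         # "a" "b"
step1$lengths      # 2 3
step1$q["a", "b"]  # -50
```

A full run repeats this step: the joined pair is replaced by the new node u, with d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2 for every remaining taxon k, until only three nodes are left.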
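This article briefly shows how to build phylogenetic trees from DNA sequences in R. The workflow is demonstrated on DNA; phangorn can also read amino acid data (phyDat with type = 'AA'), but I am still working out the protein workflow.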
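The script below computes pairwise distances with dist.ml(test_phyDat, model = 'JC69'). To show what the Jukes-Cantor (JC69) correction does, here is a base-R sketch for a single pair of aligned sequences: it takes the proportion p of differing sites and applies d = -3/4 ln(1 - 4p/3). It assumes gap-free sequences without ambiguity codes; the sequences are made up, and jc69_distance is a hypothetical helper, not part of phangorn.

```r
# Minimal sketch of the JC69 distance between two ALIGNED DNA sequences.
# Assumption: no gaps or ambiguity codes; toy sequences only.
jc69_distance <- function(seq1, seq2) {
  s1 <- strsplit(toupper(seq1), "")[[1]]
  s2 <- strsplit(toupper(seq2), "")[[1]]
  stopifnot(length(s1) == length(s2))   # sequences must be aligned to equal length
  p <- mean(s1 != s2)                   # p-distance: share of differing sites
  if (p >= 0.75) return(Inf)            # correction undefined at saturation
  -0.75 * log(1 - 4 * p / 3)            # Jukes-Cantor correction
}

x <- "ACGTACGTAC"
y <- "ACGTACGTTT"      # differs at 2 of 10 sites, so p = 0.2
jc69_distance(x, y)    # about 0.2326
```

The corrected distance (about 0.233) is larger than the raw p-distance (0.2) because some sites are expected to have changed more than once; at p = 0.75 the log term diverges, which is why the sketch returns Inf there.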
rm(list = ls())
setwd('../../20200727【R】進化樹構建與美化/')
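# Install any missing CRAN packages (requireNamespace() checks one package at a time)
pkgs = c('ape', 'phangorn', 'seqinr', 'ggplot2', 'patchwork')
missing_pkgs = pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
# ggtree is distributed through Bioconductor rather than CRAN
if (!requireNamespace('ggtree', quietly = TRUE)) {
  if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')
  BiocManager::install('ggtree')
}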
library(ape)
library(phangorn)
library(seqinr)
library(ggtree)
library(ggplot2)
library(patchwork)
# The FASTA file must already be aligned (all sequences the same length)
test_dna = read.dna('data/test.fasta', format = 'fasta')
test_phyDat = phyDat(test_dna, type = 'DNA', levels = NULL)
# Model testing: compare substitution models (AIC/BIC)
mt = modelTest(test_phyDat)
print(mt)
# Pairwise distances under the Jukes-Cantor (JC69) model
dna_dist = dist.ml(test_phyDat,model = 'JC69')
# Neighbor-Joining tree
tree_nj = NJ(dna_dist)
write.tree(tree_nj, file = 'tree_nj.nwk')  # save in Newick format; the file name is arbitrary
p_nj = ggtree(tree_nj, layout="radial") +
  geom_tiplab(size = 3, color = 'red')+
  labs(title = 'Neighbor Joining Tree')
p_nj
# UPGMA tree (assumes a constant evolutionary rate, so the tree is ultrametric)
tree_upgma = upgma(dna_dist)
p_UPGMA = ggtree(tree_upgma) +
  geom_tiplab(size = 3, color = 'green')+
  labs(title = 'UPGMA Tree')
p_UPGMA
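# Maximum parsimony tree
# Parsimony scores of the UPGMA and NJ trees (lower = fewer substitutions needed)
parsimony(tree_upgma, data = test_phyDat)
parsimony(tree_nj, data = test_phyDat)
# Rearrange the NJ tree to lower its parsimony score
test_optim = optim.parsimony(tree_nj, data = test_phyDat)
# Parsimony ratchet search; acctran() assigns branch lengths so the tree is drawn to scale
tree_ratchet = pratchet(test_phyDat)
tree_ratchet = acctran(tree_ratchet, test_phyDat)
p_Maximum_Parsimony = ggtree(tree_ratchet, layout="radial") +
  geom_tiplab(size = 3, color = 'blue')+
  labs(title = 'Maximum Parsimony Tree')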
p_Maximum_Parsimony
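# Maximum likelihood
# Start from the NJ tree, then optimise branch lengths and topology under the JC model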
tree_fit = pml(tree_nj, data = test_phyDat)
print(tree_fit)
fitJC = optim.pml(tree_fit, model = 'JC', rearrangement = 'stochastic')
logLik(fitJC)
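# 100 bootstrap replicates (this step is slow)
tree_bs = bootstrap.pml(fitJC, bs = 100, optNni = TRUE,
                        multicore = FALSE,
                        control = pml.control(trace = 0))
# plotBS() draws the midpoint-rooted ML tree and returns it with the
# bootstrap support values (shown where >= 50%) stored as node labels
tree_ml_bootstrap = plotBS(midpoint(fitJC$tree), tree_bs, p = 50, type = 'p')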
p_Maximum_Likelihood = ggtree(tree_ml_bootstrap) +
  geom_tiplab(size = 3, color = 'purple')+
  labs(title = 'Maximum Likelihood Tree')
p_Maximum_Likelihood
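# Combine the four plots into a 2 x 2 panel with patchwork and save as PDF
p_all = p_nj + p_UPGMA + p_Maximum_Parsimony + p_Maximum_Likelihood +
  plot_layout(ncol = 2)
dir.create('figures', showWarnings = FALSE)  # ggsave() does not create missing folders by default
ggsave(p_all, filename = 'figures/all.pdf', width = 8, height = 8)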
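The script compares parsimony(tree_upgma, ...) with parsimony(tree_nj, ...); the tree with the lower score needs fewer substitutions and is preferred under MP. phangorn's parsimony() uses the Fitch algorithm by default, so here is a minimal base-R sketch of Fitch's count for one alignment column on a fixed rooted binary tree. The nested-list tree representation and the fitch helper are hypothetical, chosen only for illustration.

```r
# Minimal sketch of Fitch's small-parsimony count for ONE alignment column.
# Hypothetical tree representation: nested lists of two children;
# a leaf is a single character state such as "A" or "G".
fitch <- function(node) {
  if (is.character(node)) {                       # leaf: its state set is the observed base
    return(list(set = node, cost = 0))
  }
  left  <- fitch(node[[1]])
  right <- fitch(node[[2]])
  shared <- intersect(left$set, right$set)
  if (length(shared) > 0) {
    list(set = shared, cost = left$cost + right$cost)        # no change needed here
  } else {
    list(set = union(left$set, right$set),
         cost = left$cost + right$cost + 1)                  # one substitution required
  }
}

# Tree ((t1, t2), (t3, t4)) with the bases observed at one site
fitch(list(list("A", "A"), list("G", "G")))$cost   # 1

# Same bases on the other topology ((t1, t3), (t2, t4))
fitch(list(list("A", "G"), list("A", "G")))$cost   # 2
```

For unambiguous data, summing this count over every column of the alignment should give the Fitch score that parsimony() reports, which is why different topologies for the same data can get different scores.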