[R] Sequence logo (seqlogo) plots

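Sequence logos are commonly used to show the sequence information of a specific region, for example in the figure from Li et al. [1].

I used to wonder how this kind of figure is drawn, and later came across an R package for it: ggseqlogo [2], which provides a range of visualization methods.

The author also provides a complete tutorial: https://omarwagih.github.io/ggseqlogo/

For this kind of plot, the important part is understanding the input data structure; once that is clear, the package can be applied to your own data. The example data for this post can be obtained by replying "seqlogo" to the WeChat official account PLANTOMIX.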

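DNA sequences

ggseqlogo can draw the logo in two ways: with method = 'bits' the letter heights show information content in bits, and with method = 'prob' they show the proportion of each letter at each position. Read the sequences into a data frame and pass the sequence column to ggseqlogo as a character vector: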

library(ggplot2)
library(ggseqlogo)
library(tidyverse)  # provides %>%, used in later sections

# DNA sequences: one aligned sequence per row, in a column named test.seq
seq_dna = read.table('data/test.DNA.seq.txt', header = TRUE)

# Probability logo: letter heights are per-position proportions
p1 = ggseqlogo(as.character(seq_dna$test.seq), method = 'prob') +
  theme_bw() +
  scale_y_continuous(labels = scales::percent)
p1
ggsave(p1, filename = 'figures/1.png', width = 5, height = 3)

# Bits logo: the y axis is information content in bits, so no percent labels
p1.1 = ggseqlogo(as.character(seq_dna$test.seq), method = 'bits') +
  theme_bw()
p1.1
ggsave(p1.1, filename = 'figures/1.1.png', width = 5, height = 3)
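
To make the difference between the two methods concrete, here is a minimal base-R sketch that computes both quantities by hand. The sequences are made up for illustration, and the bits calculation uses the usual information-content definition (log2 of the alphabet size minus the Shannon entropy of each position) without any small-sample correction, so ggseqlogo's internal numbers may differ in detail.

```r
# Toy aligned DNA sequences, made up for illustration (not the article's data)
seqs <- c("ACGTAC", "ACGTTC", "ACGAAC", "ACCTAC")
alphabet <- c("A", "C", "G", "T")

# One row per sequence, one column per position
letters_mat <- do.call(rbind, strsplit(seqs, ""))

# 'prob': proportion of each letter at each position (rows = letters)
prob_mat <- sapply(seq_len(ncol(letters_mat)), function(j) {
  as.numeric(table(factor(letters_mat[, j], levels = alphabet))) / nrow(letters_mat)
})
rownames(prob_mat) <- alphabet

# Shannon entropy of each position, treating 0 * log2(0) as 0
entropy <- apply(prob_mat, 2, function(p) -sum(p[p > 0] * log2(p[p > 0])))

# 'bits': total information per position, shared among letters by proportion
info <- log2(length(alphabet)) - entropy
bits_mat <- sweep(prob_mat, 2, info, `*`)

print(round(prob_mat, 2))
print(round(bits_mat, 2))
```

A fully conserved position reaches the maximum of 2 bits for a four-letter alphabet, while a position where all letters are equally likely has a height of 0. That is why a percent axis suits the prob logo but not the bits logo.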

Amino acid sequences

# Amino acid sequences; `.` is the column name used in the original script
seq_aa = read.table('data/test.AA.seq.txt', header = TRUE)

p2 = ggseqlogo(as.character(seq_aa$.), method = 'prob') +
  theme_bw() +
  scale_y_continuous(labels = scales::percent)
p2
ggsave(p2, filename = 'figures/2.png', width = 5, height = 3)

Custom data

ggseqlogo also supports custom alphabets, such as digits, through the namespace argument.

# Custom sequences: 10 random strings of length 10 over the digits 1-4
seq_diy = data.frame(
  test.seq = replicate(10, paste(sample(1:4, 10, replace = TRUE), collapse = ''))
)

p4 = ggseqlogo(as.character(seq_diy$test.seq),
  method = 'prob',
  namespace = 1:4) +
  theme_bw() +
  scale_y_continuous(labels = scales::percent)
p4
ggsave(p4, filename = 'figures/4.png', width = 5, height = 3)
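
Before plotting your own data with a custom alphabet, it can help to check that the input has the shape a logo needs. The helper below is a hypothetical sketch (check_seqs is not part of ggseqlogo). It assumes that all sequences should have the same length and that every symbol should appear in the namespace you intend to pass.

```r
# Hypothetical helper, not part of ggseqlogo: basic sanity checks on the input
check_seqs <- function(seqs, namespace) {
  seqs <- as.character(seqs)
  namespace <- as.character(namespace)
  # A logo is drawn per position, so sequences are assumed to be the same length
  same_width <- length(unique(nchar(seqs))) == 1
  # Symbols that occur in the data but are missing from the namespace
  unknown <- setdiff(unique(unlist(strsplit(seqs, ""))), namespace)
  list(same_width = same_width, unknown = unknown)
}

ok  <- check_seqs(c("1234", "4321", "1111"), namespace = 1:4)
bad <- check_seqs(c("123", "12345"), namespace = 1:4)
print(ok)
print(bad)
```

In the second call the sequences differ in length and contain the symbol 5, which is outside the namespace 1:4, so both checks flag a problem.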

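Matrix-type data

Another kind of input, probably the more commonly used one, is a matrix with one row per letter (given as row names) and one column per position: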

# Matrix data: row names are the letters, columns are the positions
seq_matrix = read.table('data/test.matrix.txt', header = TRUE) %>%
  as.matrix()

p3 = ggseqlogo(seq_matrix, method = 'bits') +
  theme_bw()
p3
ggsave(p3, filename = 'figures/3.png', width = 5, height = 3)
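
If you only have aligned sequences, a matrix of this shape can be built in base R. The sketch below uses made-up sequences and an assumed tab-separated file layout (the actual layout of test.matrix.txt is not shown in this post): the header row has one field fewer than the data rows, so read.table treats the first column as row names.

```r
# Toy aligned DNA sequences, made up for illustration
seqs <- c("ACGTAC", "ACGTTC", "ACGAAC", "ACCTAC")
alphabet <- c("A", "C", "G", "T")
letters_mat <- do.call(rbind, strsplit(seqs, ""))

# Count matrix: rows = letters, columns = positions
count_mat <- sapply(seq_len(ncol(letters_mat)), function(j) {
  as.integer(table(factor(letters_mat[, j], levels = alphabet)))
})
rownames(count_mat) <- alphabet
colnames(count_mat) <- paste0("pos", seq_len(ncol(count_mat)))

# Round trip through a text file in the assumed layout
path <- tempfile(fileext = ".txt")
write.table(count_mat, file = path, quote = FALSE, sep = "\t")
seq_matrix <- as.matrix(read.table(path, header = TRUE))
print(seq_matrix)

# Plot only if ggseqlogo is installed
if (requireNamespace("ggseqlogo", quietly = TRUE)) {
  p <- ggseqlogo::ggseqlogo(seq_matrix, method = "bits")
}
```

The row names matter here: they tell ggseqlogo which letter each row represents, so a matrix without row names cannot be interpreted.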

For more visualization methods, see the author's tutorial site: https://omarwagih.github.io/ggseqlogo/

References

[1] Li, Ying, et al. "Magnaporthe oryzae Auxiliary Activity Protein MoAa91 Functions as Chitin-Binding Protein To Induce Appressorium Formation on Artificial Inductive Surfaces and Suppress Plant Immunity." mBio 11.2 (2020).
[2] Wagih, Omar. "ggseqlogo: a versatile R package for drawing sequence logos." Bioinformatics 33.22 (2017): 3645-3647.
