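1 Background
After a genome has been assembled, the assembly needs to be evaluated, and the GC_depth plot is one of the more important checks. Its x-axis is GC content and its y-axis is average depth, both computed over non-overlapping 30 kb windows. If the sample is contaminated, this usually shows up in the GC content analysis.
Below I explore several ways of drawing a GC_depth scatter plot.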
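To make the windowing concrete, here is a minimal sketch of how GC content and average depth per non-overlapping window could be computed. The function name `window_gc_depth` and its inputs (a sequence string and a per-base depth vector) are assumptions made for illustration; they are not how result.txt was actually produced. Only the column names GCpercent and avgDepth are taken from the code later in this post.

```r
# Toy sketch: GC content (%) and mean depth in non-overlapping windows.
# `dna` and `depth` are made-up inputs, not the author's data format.
window_gc_depth <- function(dna, depth, win = 30000L) {
  bases <- strsplit(toupper(dna), "")[[1]]
  stopifnot(length(bases) == length(depth))
  # Window number of every base: the first `win` bases get 1, the next get 2, ...
  idx <- (seq_along(bases) - 1L) %/% win + 1L
  is_gc <- bases %in% c("G", "C")
  data.frame(
    GCpercent = 100 * as.numeric(tapply(is_gc, idx, mean)),
    avgDepth  = as.numeric(tapply(depth, idx, mean))
  )
}

# Tiny example with a window of 4 bases instead of 30 kb
toy <- window_gc_depth("GGCCATAT", c(10, 10, 10, 10, 20, 20, 20, 20), win = 4L)
print(toy)
```

In this sketch the last window is simply shorter when the sequence length is not a multiple of the window size; a real pipeline might prefer to drop it.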
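2 Methods
2.1 Plain scatter plot
There are too many data points and they overlap heavily, which weakens the message the plot is meant to convey to the reader.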
df <- read.table("result.txt", header = TRUE)
png("GC_depth.png", width = 20, height = 18, units = "cm", res = 300)
plot(df, ylab="Average depth (X)", xlab="GC content (%)", cex.lab = 1.4, cex.axis = 1.3, pch = 20, ylim = c(0,100), xlim = c(10,80))
dev.off()
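2.2 Showing point density with transparency
Probably because there are too many points, several rounds of tuning did not produce a satisfactory result.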
df <- read.table("result.txt", header = TRUE)
MyGray <- rgb(t(col2rgb("black")), alpha=50, maxColorValue=255)
png("GC_depth_alpha.png", width = 20, height = 18, units = "cm", res = 300)
plot(df, ylab="Average depth (X)", xlab="GC content (%)", cex.lab = 1.4, cex.axis = 1.3, pch = 20, col= MyGray, ylim = c(0,100), xlim = c(10,80))
dev.off()
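One possible reason the transparency is hard to tune: overlapping semi-transparent points saturate quickly. The sketch below assumes standard "over" alpha compositing, under which n stacked points with the same alpha give a combined opacity of 1 - (1 - alpha)^n. The helper `stacked_opacity` is a name made up for this illustration.

```r
# Opacity of a pixel covered by n overlapping points, each drawn with the same alpha,
# assuming standard "over" compositing.
stacked_opacity <- function(alpha, n) 1 - (1 - alpha)^n

alpha <- 50 / 255                      # the alpha used for MyGray above
n <- c(1, 5, 10, 20, 50)
print(data.frame(n = n, opacity = round(stacked_opacity(alpha, n), 3)))
```

With this alpha, around ten overlapping points are already well above 0.8 opacity, so regions of quite different density can all end up looking equally black.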
2.3 Adding box plots on the top and right
Compared with the previous two plots, this one carries more information.
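df <- read.table("result.txt", header = TRUE)
subdf <- df[(df$GCpercent > 20 & df$GCpercent < 80) & (df$avgDepth < 100),]
png("GC_depth.png", width = 20, height = 18, units = "cm", res = 300)
# Save the graphics parameters after opening the png device; calling par() with no device open would start a stray default device
opar <- par(no.readonly = TRUE)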
par(fig = c(0, 0.9, 0, 0.9))
plot(subdf, ylab="Average depth (X)", xlab="GC content (%)", cex.lab = 1.4, cex.axis = 1.3, pch = 20, ylim = c(0,100), xlim = c(10,80))
par(fig = c(0, 0.9, 0.65, 1), new = TRUE)
boxplot(subdf$GCpercent, horizontal = TRUE, axes = FALSE, ylim = c(10, 80))
par(fig = c(0.75, 1, 0, 0.9), new = TRUE)
boxplot(subdf$avgDepth, axes = FALSE, ylim = c(0, 100))
par(opar)
dev.off()
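2.4 Adding frequency histograms on the top and right
Even with the frequency histograms added, the plot is still not very intuitive, nor is it pretty.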
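df <- read.table("result.txt", header = TRUE)
subdf <- df[(df$GCpercent > 20 & df$GCpercent < 80) & (df$avgDepth < 100),]
png("GC_depth2.png", width = 20, height = 18, units = "cm", res = 300)
# Save the graphics parameters after opening the png device; calling par() with no device open would start a stray default device
opar <- par(no.readonly = TRUE)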
layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE)
par(mar=c(3,3,1,1))
plot(subdf, ylab="Average depth (X)", xlab="GC content (%)", cex.lab = 1.4, cex.axis = 1.3, pch = 20, ylim = c(0,100), xlim = c(10,80))
par(mar=c(0,3,1,1))
GChist <- hist(subdf$GCpercent, xlim = c(10,80), axes = FALSE, xlab = "", ylab = "", main = "")
par(mar=c(3,0,1,1))
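# Fixed 5X-wide bins over 0-100 so that the bars span the same range as the scatter plot's y-axis
Dephist <- hist(subdf$avgDepth, breaks = seq(0, 100, by = 5), plot = FALSE)
# With horiz = TRUE and space = 0 the y-axis counts bars, not depth, so ylim must be the number of bins
barplot(Dephist$counts, axes = FALSE, ylim = c(0, length(Dephist$counts)), space = 0, horiz = TRUE, col = NA)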
par(opar)
dev.off()
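The bars on the right in 2.4 are drawn from the counts that hist() returns when called with plot = FALSE. A small sketch with made-up depth values shows what that object contains; passing breaks explicitly is one way to control where the bin edges fall.

```r
# Made-up depth values, only to show what hist(..., plot = FALSE) returns
depth <- c(1, 2, 6, 7, 8, 12)
h <- hist(depth, breaks = seq(0, 15, by = 5), plot = FALSE)
print(h$breaks)  # bin edges
print(h$counts)  # number of values in each bin
```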
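2.5 Using geom_bin2d from ggplot2
Compared with the four plots above, this one looks better, and it brings out the depth and GC content values that occur most often.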
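library(ggplot2)
df <- read.table("result.txt", header = TRUE)
# Count the windows falling into each 0.4 x 0.4 bin; windows outside the axis limits are dropped
p <- ggplot(df) + geom_bin2d(aes(GCpercent, avgDepth), binwidth = c(0.4, 0.4)) + xlim(20, 70) + ylim(0, 75)
ggsave("GC_depth.pdf", plot = p)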
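To show what 2D binning does conceptually, here is a rough base-R imitation that counts points per rectangular bin. The function `bin2d_counts` is made up for this illustration and is not part of ggplot2; the exact bin edges that geom_bin2d chooses may differ from this simple floor-based scheme.

```r
# Rough base-R imitation of 2D binning; ggplot2's own bin edges may differ.
bin2d_counts <- function(x, y, binwidth = c(0.4, 0.4)) {
  xbin <- floor(x / binwidth[1])
  ybin <- floor(y / binwidth[2])
  counts <- as.data.frame(table(xbin = xbin, ybin = ybin), responseName = "count")
  counts[counts$count > 0, ]   # keep only bins that contain points
}

# Three made-up points: two fall into the same bin, one into its neighbour
toy <- bin2d_counts(x = c(0.1, 0.2, 0.5), y = c(0.1, 0.3, 0.1))
print(toy)
```

The fill colour in the plot then encodes the count of each bin, which is why the most populated GC/depth combination stands out.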
2.6 Using densCols to map point density to color
This plot is close to perfect ^-^. I'll stop here for now and add other methods as I think of them.
df <- read.table("result.txt", header = TRUE)
dcols <- densCols(df, colramp=colorRampPalette(c("black", "white")), nbin = 1000)
df$dens <- col2rgb(dcols)[1,] + 1L
cols <- colorRampPalette(c("RoyalBlue", "orange", "red"), space = "Lab")(256)
df$col <- cols[df$dens]
png("GC_depth.png", width = 20, height = 18, units = "cm", res = 300)
plot(avgDepth ~ GCpercent, data=df[order(df$dens),], col=col, ylab="Average depth (X)", xlab="GC content (%)", cex.lab = 1.4, cex.axis = 1.3, pch = 20, ylim = c(0,100), xlim = c(10,80))
dev.off()
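The step that turns the densCols output into a palette index may be easier to follow on a tiny example. With a black-to-white ramp, each returned colour is a gray whose red channel (0 to 255) acts as a density rank, as far as I understand with denser regions nearer to white; adding 1 gives an index into the 256-colour palette. The three grays below are made-up stand-ins for real densCols output.

```r
# Three example grays standing in for densCols() output on a black-to-white ramp
grays <- c("#000000", "#808080", "#FFFFFF")
idx <- as.integer(col2rgb(grays)[1, ]) + 1L   # red channel 0..255 -> index 1..256
cols <- colorRampPalette(c("RoyalBlue", "orange", "red"), space = "Lab")(256)
print(idx)
print(cols[idx])  # palette colours for low, medium and high density
```

Sorting the data frame by dens before plotting, as done above, draws the high-density points last so that they are not hidden under sparse ones.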
Appendix: the file "result.txt" used in this post
Link: http://pan.baidu.com/s/1qYkpbM8 Password: ikka