STUtility || 空間轉錄組多樣本分析框架(二)

前情回顧:
應用空間統計學分析空間表達數據
空間信息在空間轉錄組中的運用
Giotto|| 空間表達數據分析工具箱
SPOTlight || 用NMF解卷積空間表達數據
空間轉錄組教程|| stLearn :空間軌跡推斷
Seurat 新版教程:分析空間轉錄組數據
單細胞轉錄組數據分析|| scanpy教程:空間轉錄組數據分析
10X Visium:空間轉錄組樣本製備到數據分析
定量免疫浸潤在單細胞研究中的應用
STUtility || 空間轉錄組多樣本分析框架(一)

我們在上一篇文章STUtility || 空間轉錄組多樣本分析框架(一)中演示了用STUtility分析空轉多樣本,主要是對空間信息和圖像信息的分析,可以說凸顯了空轉應有的特性。在這裏,我們將探討:

  • 空轉數據和單轉數據的相似性
  • 用Seurat對空轉數據的標準分析
  • 感興趣區域的邊緣檢測

同樣地,我們載入數據加載R包,並執行Seurat的標準化:

library(Matrix)
library(magrittr)
library(dplyr)
library(ggplot2)
library(spdep)

se <- readRDS('se.rds')

#  我們可以去除一下切片的批次,但是內存太小,只能放棄了。
# Add a section column to your meta.data
#se$section <- paste0("section_", GetStaffli(se)[[, "sample", drop = T]])

# Run normalization with "vars.to.regress" option
?SCTransform
# se.batch.cor <- SCTransform(se, vars.to.regress = "section", variable.features.n=1000,conserve.memory=T)
se<- SCTransform(se)
# 空間高變基因
spatgenes <- CorSpatialGenes(se)
head(spatgenes)
         gene       cor
CRISP3 CRISP3 0.9636610
RPL17   RPL17 0.9379202
CXCL14 CXCL14 0.9371324
MGP       MGP 0.9361545
CRIP1   CRIP1 0.9333115
NME2     NME2 0.9292425
heatmap.colors <- c("dark blue", "cyan", "yellow", "red", "dark red")
ST.FeaturePlot(se, features = c('CRISP3',"CXCL14",'CRIP1','MGP'), cols = heatmap.colors, dark.theme = T)
空轉數據和單轉數據的相似性

STUtility 的分析框架是繼承Seurat的方法,那麼我們不禁要問,目前的空間表達數據和單細胞轉錄組數據到底有多相似呢? 拿pbmc3k數據比較一下。

# Get raw count data 
umi_data <- GetAssayData(object = se, slot = "counts", assay = "RNA")
dim(umi_data)
umi_data[1:4,1:4]

# Calculate gene attributes
gene_attr <- data.frame(mean = rowMeans(umi_data),
                        detection_rate = rowMeans(umi_data > 0),
                        var = apply(umi_data, 1, var), 
                        row.names = rownames(umi_data)) %>%
    mutate(log_mean = log10(mean), log_var = log10(var))
head(gene_attr)
        mean detection_rate        var  log_mean    log_var
1 0.02564760     0.02513465 0.02601904 -1.590953 -1.5847087
2 0.03552193     0.03372660 0.03785564 -1.449503 -1.4218694
3 0.04334445     0.04142088 0.04531866 -1.363067 -1.3437230
4 0.02436522     0.02333932 0.02582668 -1.613230 -1.5879315
5 0.03500898     0.03270069 0.03840484 -1.455821 -1.4156140
6 0.07989228     0.06758143 0.10506953 -1.097495 -0.9785232

# Obtain spot attributes from Seurat meta.data slot
spot_attr <- se[[c("nFeature_RNA", "nCount_RNA")]]
# Get  pbm  raw count data 
library(pbmc3k.SeuratData,lib.loc = 'E:\\software\\R\\R-4.0.2\\library')
AvailableData()

pbmc3kumi_data <- GetAssayData(object = pbmc3k.final, slot = "counts", assay = "RNA")
dim(pbmc3kumi_data)
# Calculate gene attributes
pbmc3kgene_attr <- data.frame(mean = rowMeans(pbmc3kumi_data),
                              detection_rate = rowMeans(pbmc3kumi_data > 0),
                              var = apply(pbmc3kumi_data, 1, var), 
                              row.names = rownames(pbmc3kumi_data)) %>%
    mutate(log_mean = log10(mean), log_var = log10(var))

繪製兩種數據的分佈:

p1 <- ggplot(gene_attr, aes(log_mean, log_var)) + 
    geom_point(alpha = 0.3, shape = 16, color = "white") + 
    geom_density_2d(size = 0.3) +
    geom_abline(intercept = 0, slope = 1, color = 'red') +
    ggtitle("Mean-variance relationship") + DarkTheme()
# add the expected detection rate under Poisson model
x = seq(from = -2, to = 2, length.out = 1000)
poisson_model <- data.frame(log_mean = x, detection_rate = 1 - dpois(0, lambda = 10^x))
p2 <- ggplot(gene_attr, aes(log_mean, detection_rate)) + 
    geom_point(alpha = 0.3, shape = 16, color = "white") + 
    geom_line(data = poisson_model, color='red') +
    ggtitle("Mean-detection-rate relationship") + DarkTheme()

p3 <- ggplot(pbmc3kgene_attr, aes(log_mean, log_var)) + 
    geom_point(alpha = 0.3, shape = 16, color = "white") + 
    geom_density_2d(size = 0.3) +
    geom_abline(intercept = 0, slope = 1, color = 'red') +
    ggtitle("PBMC3k\n Mean-variance relationship") + DarkTheme()
# add the expected detection rate under Poisson model
x = seq(from = -2, to = 2, length.out = 1000)
poisson_model <- data.frame(log_mean = x, detection_rate = 1 - dpois(0, lambda = 10^x))
p4 <- ggplot(pbmc3kgene_attr, aes(log_mean, detection_rate)) + 
    geom_point(alpha = 0.3, shape = 16, color = "white") + 
    geom_line(data = poisson_model, color='red') +
    ggtitle("PBMC3k\n Mean-detection-rate relationship") + DarkTheme()
cowplot::plot_grid(p1, p2,p3,p4, nrow = 2)

我們看到二種數據的分佈還是很相似的,這也解釋了爲什麼大部分的單細胞轉錄組的分析方法在空轉中都是可以應用的。接下來,我們檢查空轉數據是否爲稀疏矩陣,也就是零值有多少。

 sum(pbmc3kumi_data == 0)/(ncol(pbmc3kumi_data)*nrow(pbmc3kumi_data))
[1] 0.9381182
 sum(umi_data == 0)/(ncol(umi_data)*nrow(umi_data))
[1] 0.6779447
 sum(umi_data == 0) / prod(dim(umi_data))
[1] 0.6779447
 print(object.size(pbmc3kumi_data), units = "Mb")
26.7 Mb
 print(object.size(umi_data), units = "Mb")
469.4 Mb
Seurat對空轉數據的標準分析

可見在空轉中零值也是很多的,雖然沒有單轉那麼多。接下來進行表達數據的降維聚類分析。鑑於這一部分和單轉太相似了,我們就不再每一步地解釋了。

?RunNMF  # 特徵選擇類似PCA
se <- RunNMF(se, nfactors = 40) # Specificy nfactors to choose the number of factors, default=20.
cscale <- c("darkblue", "cyan", "yellow", "red", "darkred")
ST.DimPlot(se, 
           dims = 1:2,
           ncol = 2, # Sets the number of columns at dimensions level
           grid.ncol = 2, # Sets the number of columns at sample level
           reduction = "NMF", 
           dark.theme = T, 
           pt.size = 1, 
           center.zero = F, 
           cols = cscale)
print(se[["NMF"]])

分羣並繪製分羣信息:

FactorGeneLoadingPlot(se, factor = 1, dark.theme = TRUE)
se <- FindNeighbors(object = se, verbose = FALSE, reduction = "NMF", dims = 1:40)
se <- FindClusters(object = se, verbose = FALSE)
library(RColorBrewer)
n <- 19
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
ST.FeaturePlot(object = se, features = "seurat_clusters", dark.theme = T, cols = col_vector, pt.size = 2,sb.size = 5 )
head(se@[email protected], 20)
[1] "IGLC2"    "IGHG3"    "IGKC"     "ALB"      "IGLC3"    "IGHG1"    "IGHM"     "IGLC1"    "IGHA1"    "MGP"      "CRISP3"   "IGHG2"   
[13] "IGHG4"    "CPB1"     "IGHA2"    "SERPINA3" "JCHAIN"   "CXCL14"   "CCL21"    "CCL19"   

top <- se@[email protected]
FeatureOverlay(se, 
                        features = c('IGLC2','CXCL14'), 
                        sampleids = 1:2,
                        cols = c("darkblue", "cyan", "yellow", "red", "darkred"),
                        pt.size = 1.5, 
                        pt.alpha = 0.5, 
                        dark.theme = T, 
                        ncols.samples = 2)

我們可以在UMAP空間上看spot的分羣情況,當然,之前在Seurat中對分羣做的一切動作這裏都是可以做的,如split和group等。

 se <- RunUMAP(se, reduction = "NMF", dims = 1:40, n.neighbors = 10)
 DimPlot(object = se, reduction = "umap", pt.size = 1.5,ncol = 2)+ DarkTheme()

STUtility 還提供用三原色可視化分羣信息的選項,這樣可視化更加清楚。

 se <- RunUMAP(object = se, dims = 1:40, verbose = FALSE, n.components = 3, reduction = "NMF", reduction.name = "umap.3d")
ST.DimPlot(object = se, dims = 1:3, reduction = "umap.3d", blend = T, dark.theme = T, pt.size = 1.5)

分羣之後,我們要做的就是看每個羣的特徵,比較直接的生物學意義就是做差異分析:

markers <- FindMarkers(se, ident.1 = "12")
head(markers)
                   p_val avg_logFC pct.1 pct.2     p_val_adj
AC026355.1 7.800313e-211 0.6957137 0.799 0.165 1.269345e-206
U62317.5   6.955878e-192 0.4293024 0.454 0.053 1.131930e-187
CPB1       1.666918e-171 1.4851863 1.000 0.959 2.712576e-167
FCGR3B     4.019841e-160 1.1899326 0.883 0.335 6.541488e-156
IL6ST      5.819598e-158 0.6636229 1.000 0.998 9.470232e-154
SCUBE3     1.797465e-157 0.7575323 0.862 0.288 2.925015e-153

可視化marker基因:

 FeatureOverlay(se, 
                        features = c('IL6ST'), 
                        sampleids = 1:2,
                        cols = c("darkblue", "cyan", "yellow", "red", "darkred"),
                        pt.size = 1.5, 
                        pt.alpha = 0.5, 
                        dark.theme = T, 
                        ncols.samples = 2)
邊緣檢測

有時提取一一亞羣點的“鄰域”是有用的。,我們展示瞭如何應用這個方法來找到任何感興趣區域的所有鄰近點。

一旦定義好了一個感興趣的區域(region of interest,ROI),並且希望找到與該區域相鄰的所有點,您可以使用regionneighbors函數來自動檢測這些點。
例如,假設我們想要選擇所有cluster 2 的邊緣spot。第一步是確保seurat對象的標識是正確的,這裏我們需要將其設置爲“seurat_clusters”。當然, 如果是註釋好的,那就更有生物學意義了。

?RegionNeighbours
st <- SetIdent(st, value = "seurat_clusters")
st <- RegionNeighbours(st, id = "2", verbose = TRUE)
FeatureOverlay(st, features = "nbs_2", ncols.samples = 2, sampleids = 1:2, cols = c("red", "lightgray"), pt.size = 2, dark.theme = T)

檢測到亞羣的邊緣之後,肯定是要看看邊緣和內部有什麼差異了,最直觀的就是做一下差異基因看看:

library(magrittr)
library(dplyr)

se <- SetIdent(se, value = "nbs_2")
 nbs_2.markers <- FindMarkers(se, ident.1 = "2", ident.2 = "nbs_2")
 nbs_2.markers$gene <- rownames(nbs_2.markers)
 st.subset <- SubsetSTData(se, expression = nbs_2 %in% c("2", "nbs_2"))
 sorted.marks <- nbs_2.markers %>% top_n(n = 40, wt = abs(avg_logFC))
 sorted.marks <- sorted.marks[order(sorted.marks$avg_logFC, decreasing = T), ]
 DoHeatmap(st.subset, features = sorted.marks$gene, group.colors = c("red", "lightgray"), disp.min = -2, disp.max = 2) + DarkTheme() 
FeatureOverlay(st.subset, features =sorted.marks$gene[1:4], pt.size = 2, dark.theme = T, 
               ncols.features = 2, cols = c("darkblue", "cyan", "yellow", "red", "darkred"))


https://en.wikipedia.org/wiki/Region_of_interest
感興趣的區域(通常縮寫爲ROI)是爲特定目確定的數據集中的樣本。ROI的概念在許多應用領域中普遍使用。例如,在醫學成像中,爲了測量腫瘤的大小,腫瘤的邊界可以在圖像或體積上確定。我們將會在下一篇文章中講述如何選擇感興趣的區域。

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章