Paper intensive reading (16): Deep learning enables accurate clustering and batch effect removal

Paper title: Deep learning enables accurate clustering and batch effect removal in single-cell RNA-seq analysis

Scholar citations: 0

Pages: 14

Publication date: 2019.1.25

Venue: preprint

Authors: Xiangjie Li1,2, Yafei Lyu1, Jihwan Park3, Jingxiao Zhang2, Dwight Stambolian4, Katalin Susztak3,5, Gang Hu1,5*, Mingyao Li1*

University of Pennsylvania Perelman School of Medicine

Abstract:

Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever increasing number of cells imposes computational challenges. We present an unsupervised deep embedding algorithm for single-cell clustering (DESC) that iteratively learns cluster-specific gene expression signatures and cluster assignment. DESC significantly improves clustering accuracy across various datasets and is capable of removing complex batch effects while maintaining true biological variations.

Strictly speaking, this is not a paper but a conference report.

Excerpts from the main text:

  • An open-source implementation of the DESC algorithm can be downloaded from https://eleozzr.github.io/desc/.

  • ScRNA-seq clustering and batch effect removal are typically addressed through separate analyses. Commonly used approaches to remove batch effect include Seurat’s Canonical Correlation Analysis3 (CCA) or Mutual Nearest Neighbors (MNN) approach4. (Common methods for removing batch effects in scRNA-seq: CCA and MNN.)

  • After removing batch effect, clustering analysis is performed to identify cell clusters using methods such as Louvain’s method5, Infomap6, graph-based clustering7, shared nearest neighbor8, or consensus clustering with SC39. (Once batch effects are removed, a clustering method is applied.)

  • Since some cell types are more vulnerable to batch effect than others, batch effect removal should be performed jointly with clustering to achieve optimal performance. (Batch effect removal should sometimes be combined with clustering to get the best results.)

  • However, none of the existing methods are capable of simultaneously clustering cells and removing batch effect. (At present, no such method exists.)

  • We developed DESC, an unsupervised deep learning algorithm that iteratively learns cluster-specific gene expression representation and cluster assignments for scRNA-seq data clustering (Fig. 1a). Using a deep neural network, DESC initializes clustering obtained from an autoencoder and learns a non-linear mapping function from the original scRNA-seq data space to a low-dimensional feature space by iteratively optimizing a clustering objective function. This iterative procedure moves each cell to its nearest cluster, balances biological and technical differences between clusters, and reduces the influence of batch effect. DESC also enables soft clustering by assigning cluster-specific probabilities to each cell, facilitating the clustering of cells with high confidence. (This is the main principle of DESC.)

  • We benchmarked DESC’s performance by analyzing the multi-tissue gene expression data in GTEx10. (The dataset used to evaluate the algorithm's performance; a simulated dataset, n=11,688.)

  • Adjusted Rand index (ARI): the metric used to score clustering accuracy against known labels.

  • In summary, we have developed a deep learning algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function with a self-training target distribution.

  • DESC’s memory usage and running time increase linearly with the number of cells, thus making it scalable to large datasets (Fig. 3e). DESC can further speed up computation by using GPUs.

  • We analyzed a mouse brain dataset with 1.3 million cells generated by 10X; the analysis took only about 3.5 hours with one NVIDIA TITAN Xp GPU (Supplementary Note 6).

  • Compared to existing scRNA-seq clustering methods, DESC improves clustering by iteratively learning cluster-specific gene expression features from cells clustered with high confidence.

  • This iterative clustering also removes batch effect and maintains true biological differences between clusters.

  • As single-cell studies continue to grow, DESC will be a more precise tool for clustering large datasets.
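
The excerpts above mention soft cluster assignment, a self-training target distribution, and scoring with the adjusted Rand index (ARI), but they give no formulas. Below is a minimal pure-Python sketch of those three pieces. It assumes the Student's t kernel and the squared-and-renormalized target used in deep embedded clustering (DEC), which the excerpts do not spell out. The function names, the toy 2-D embedding, and the fixed centroids are hypothetical and are not taken from the DESC package. In the actual algorithm the embedding would come from an autoencoder and the KL loss would be minimized by gradient descent; neither is shown here.

```python
import math
from collections import Counter


def soft_assign(z, centroids, alpha=1.0):
    """Return q, where q[i][j] is the soft probability that cell i is in cluster j.

    Assumption: Student's t kernel as in DEC; alpha is the degrees of freedom.
    """
    q = []
    for zi in z:
        weights = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            weights.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def target_distribution(q):
    """Self-training target p: square q, divide by the soft cluster size,
    then renormalize per cell. This sharpens confident assignments."""
    k = len(q[0])
    cluster_size = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / cluster_size[j] for j in range(k)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over all cells and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


def adjusted_rand_index(truth, pred):
    """ARI from the contingency table of two labelings (1.0 = identical partitions)."""
    total_pairs = math.comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0
    same_both = sum(math.comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(math.comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(math.comb(c, 2) for c in Counter(pred).values())
    expected = same_truth * same_pred / total_pairs
    max_index = (same_truth + same_pred) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (same_both - expected) / (max_index - expected)


# Toy 2-D "embedding" of six cells and two centroids (hypothetical values).
z = [(0.0, 0.1), (0.2, -0.1), (0.1, 0.0), (3.0, 3.1), (2.9, 2.8), (3.2, 3.0)]
centroids = [(0.0, 0.0), (3.0, 3.0)]
truth = ["A", "A", "A", "B", "B", "B"]

q = soft_assign(z, centroids)               # soft cluster probabilities
p = target_distribution(q)                  # sharpened self-training target
loss = kl_divergence(p, q)                  # the quantity an optimizer would reduce
labels = [row.index(max(row)) for row in q]  # hard labels from soft assignments
ari = adjusted_rand_index(truth, labels)
print(f"KL(P||Q) = {loss:.4f}, ARI = {ari:.2f}")
```

Each row of `q` sums to 1, which is what allows a cell to be flagged as confidently or ambiguously clustered. The ARI compares partitions rather than label names, so string labels in `truth` can be scored against integer cluster indices.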
