Weekly Paper Reading Notes (2) - Review - A Survey on Deep Learning in Medical Image Analysis

I'm new to this field and not much of a writer… if anything here is mistranslated or misunderstood, corrections are very welcome~

The paper is divided into four parts:
Deep learning methods
Deep learning uses in medical imaging
Application areas
Challenges and outlook
Part 1 is not covered here… these notes cover the remaining parts.

Original paper: https://www.sciencedirect.com/science/article/pii/S1361841517301135

Contents

2.1 Classification

2.2 Detection

2.3 Segmentation

3.Application areas

4.Discussion

1.Deep learning methods

2.Deep learning uses in medical imaging

2.1 Classification

2.1.1 Image/exam classification

Image or exam classification was one of the first areas in which deep learning made a major contribution to medical image analysis.

In exam classification one typically has one or multiple images (an exam) as input with a single diagnostic variable as output (e.g., disease present or not).

Dataset sizes are small -> transfer learning

Two transfer learning strategies were identified:
(1) using a pre-trained network as a feature extractor.
(2) fine-tuning a pre-trained network on medical data.
The former strategy has the extra benefit of not requiring one to train a deep network at all, allowing the extracted features to be easily plugged into existing image analysis pipelines. Both strategies are popular and have been widely applied. Few authors investigate which strategy gives the best result.
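Strategy (1) can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: `frozen_features` is a hypothetical stand-in for a pre-trained network (here it just computes two hand-picked statistics), and the "existing pipeline" is a nearest-centroid classifier. The point is only that nothing inside the feature extractor is trained.

```python
def frozen_features(image):
    """Hypothetical stand-in for a frozen pre-trained network.

    `image` is a flat list of pixel intensities; the returned vector is
    never updated during training.
    """
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return (mean, spread)


def fit_centroids(features, labels):
    """Fit a nearest-centroid classifier on the extracted features."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, feat):
    """Return the label whose centroid is closest to `feat`."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], feat))
    return min(centroids, key=squared_distance)


# Toy data (made up): label 0 = dark images, label 1 = bright images.
train_images = [[0.1, 0.2, 0.1, 0.2], [0.2, 0.3, 0.2, 0.1],
                [0.8, 0.9, 0.7, 0.9], [0.9, 0.8, 0.9, 0.7]]
train_labels = [0, 0, 1, 1]

centroids = fit_centroids([frozen_features(im) for im in train_images], train_labels)
print(predict(centroids, frozen_features([0.85, 0.8, 0.9, 0.75])))  # 1
```

Fine-tuning, strategy (2), would instead update the weights inside the feature extractor itself, which this sketch deliberately does not do.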

Methods:
(1)Early work focused on unsupervised pre-training and network architectures like SAEs (stacked auto-encoders) and RBMs (restricted Boltzmann machines).
(2)CNNs (in 2015, 2016, 2017)
The application areas range from brain MRI to retinal imaging and digital pathology to lung computed tomography (CT).
(3)In the more recent papers using CNNs authors also often train their own network architectures from scratch instead of using pre-trained networks.
(4)Three papers used an architecture leveraging the unique attributes of medical data.(3D…)

Summary: in exam classification, CNNs are the current standard technique. CNNs pre-trained on natural images in particular have shown surprisingly strong results, challenging the accuracy of human experts in some tasks. Finally, authors have shown that CNNs can be adapted to leverage the intrinsic structure of medical images.

2.1.2 Object or lesion classification

Object classification usually focuses on the classification of a small (previously identified) part of the medical image into two or more classes (e.g. nodule classification in chest CT).

For many of these tasks both local information on lesion appearance and global contextual information on lesion location are required for accurate classification.
This combination is typically not possible in generic deep learning architectures.

Methods:
(1)Almost all recent papers prefer the use of end-to-end trained CNNs.
Several authors have used multi-stream architectures to resolve this in a multi-scale fashion.
Examples: three CNNs (each of which takes a nodule patch), a combination of CNNs and RNNs (for grading nuclear cataracts), and a 3D CNN (for high-grade gliomas).

(2)In some cases other architectures and approaches are used, such as RBMs (restricted Boltzmann machines), SAEs (stacked auto-encoders) and convolutional sparse auto-encoders (CSAEs). The major difference between a CSAE and a classic CNN is the use of unsupervised pre-training with sparse auto-encoders.

(3)An interesting approach, especially in cases where object annotation to generate training data is expensive, is the integration of multiple instance learning (MIL) and deep learning.
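To make the MIL idea concrete: only a bag of instances (e.g. all patches of one image) carries a label, and the bag is treated as positive if at least one instance is. The Python sketch below uses max-pooling over instance probabilities, which is one common MIL formulation and an assumption here, not necessarily what the surveyed papers use. The function names are made up for illustration.

```python
import math


def bag_probability(instance_probs):
    """Max-pooling MIL: the bag score is that of its most suspicious instance."""
    return max(instance_probs)


def bag_loss(instance_probs, bag_label, eps=1e-7):
    """Binary cross-entropy computed on the bag-level probability only.

    No per-instance annotation is needed; `bag_label` is 0 or 1.
    """
    p = min(max(bag_probability(instance_probs), eps), 1.0 - eps)
    return -math.log(p) if bag_label == 1 else -math.log(1.0 - p)


patch_probs = [0.1, 0.9, 0.2]  # per-patch outputs of some classifier
print(bag_probability(patch_probs))  # 0.9
```

With a positive bag label, a bag whose most suspicious patch scores 0.9 incurs a smaller loss than one whose best patch scores 0.3.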

Summary:
Object classification sees less use of pre-trained networks compared to exam classification, mostly due to the need for incorporation of contextual or 3D information. Several authors have found innovative solutions to add this information to deep networks with good results, and as such we expect deep learning to become even more prominent for this task in the near future.

2.2 Detection

2.2.1 Organ, region and landmark localization

Anatomical object localization (in space or time), such as organs or landmarks, has been an important pre-processing step in segmentation tasks or in the clinical workflow for therapy planning and intervention.
Localization in medical imaging often requires parsing of 3D volumes.

Methods:
Space:
(1)To solve 3D data parsing with deep learning algorithms, several approaches have been proposed that treat the 3D space as a composition of 2D orthogonal planes.
(2)Other authors try to modify the network learning process to directly predict locations. (Due to its increased complexity, only a few methods addressed the direct localization of landmarks and regions in the 3D image space.)
Time:
(1)CNNs have also been used for the localization of scan planes or key frames in temporal data.
(2)RNNs, particularly LSTM-RNNs, have also been used to exploit the temporal information contained in medical videos, another type of high dimensional data.
(3)Combine an LSTM-RNN with a CNN
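The "composition of 2D orthogonal planes" idea from the Space list can be sketched as follows. This assumes a volume stored as nested Python lists indexed `volume[z][y][x]`; the function name and the axis convention are assumptions for illustration. Each returned plane could then be fed to an ordinary 2D classifier.

```python
def orthogonal_planes(volume, z, y, x):
    """Return the three orthogonal 2D planes through voxel (z, y, x).

    `volume` is a nested list indexed as volume[z][y][x].
    """
    axial = volume[z]                                              # y-x plane
    coronal = [slice_[y] for slice_ in volume]                     # z-x plane
    sagittal = [[row[x] for row in slice_] for slice_ in volume]   # z-y plane
    return axial, coronal, sagittal


# Toy 2 x 3 x 4 volume whose voxel value encodes its own index.
volume = [[[100 * z + 10 * y + x for x in range(4)]
           for y in range(3)]
          for z in range(2)]

axial, coronal, sagittal = orthogonal_planes(volume, z=1, y=2, x=3)
print(coronal)   # [[20, 21, 22, 23], [120, 121, 122, 123]]
print(sagittal)  # [[3, 13, 23], [103, 113, 123]]
```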

Summary:
Localization through 2D image classification with CNNs seems to be the most popular strategy overall to identify organs, regions and landmarks, with good results.
However, several recent papers expand on this concept by modifying the learning process such that accurate localization is directly emphasized, with promising results.
We expect such strategies to be explored further as they show that deep learning techniques can be adapted to a wide range of localization tasks (e.g. multiple landmarks).
RNNs have shown promise in localization in the temporal domain, and multi-dimensional RNNs could play a role in spatial localization as well.

2.2.2 Object or lesion detection

The detection of objects of interest or lesions in images is a key part of diagnosis and is one of the most labor-intensive tasks for clinicians.
Typically, the tasks consist of the localization and identification of small lesions in the full image space.

Methods:
(1)There has been a long research tradition in computer-aided detection systems that are designed to automatically detect lesions. The first object detection system using CNNs was already proposed in 1995, using a CNN with four layers to detect nodules in x-ray images.

(2)Most of the published deep learning object detection systems still use CNNs to perform pixel (or voxel) classification, after which some form of post-processing is applied to obtain object candidates.
As the classification task performed at each pixel is essentially object classification, CNN architecture and methodology are very similar to object classification.(the incorporation of contextual or 3D information: multi-stream CNNs)

Differences between object detection and object classification:
Because every pixel is classified, the class balance is skewed severely towards the non-object class in a training setting.
-> To add insult to injury, usually the majority of the non-object samples are easy to discriminate.
-> fCNNs (fully convolutional networks): classifying each pixel in a sliding-window fashion results in orders of magnitude of redundant calculation.
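Since most non-object samples are easy, hard-negative mining is one common remedy: keep only the negatives that the current model scores highest and train on those. The sketch below is a generic Python illustration, not a method from a specific surveyed paper. The function name and the toy scores are made up.

```python
def mine_hard_negatives(scores, labels, k):
    """Return indices of the k negatives with the highest object score.

    `scores` are predicted object probabilities from the current model;
    `labels` are 0 (non-object) or 1 (object).
    """
    negatives = [(score, index)
                 for index, (score, label) in enumerate(zip(scores, labels))
                 if label == 0]
    negatives.sort(reverse=True)  # most confidently wrong first
    return [index for _, index in negatives[:k]]


scores = [0.9, 0.05, 0.7, 0.01, 0.4]
labels = [1, 0, 0, 0, 0]
print(mine_hard_negatives(scores, labels, k=2))  # [2, 4]
```

The easy negatives (indices 1 and 3 here) are left out of the next training round, which reduces the skew towards trivially classified background.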

Summary:
Challenges are similar to those in object classification.
Few papers directly address issues specific to object detection like class imbalance/hard-negative mining or efficient pixel/voxel-wise processing of images.
We expect that more emphasis will be given to those areas in the near future, for example in the application of multi-stream networks in a fully convolutional fashion.

2.3 Segmentation

2.3.1 Organ and substructure segmentation

The segmentation of organs and other substructures in medical images allows quantitative analysis of clinical parameters related to volume and shape, as, for example, in cardiac or brain analysis. Furthermore, it is often an important first step in computer-aided detection pipelines.

The task of segmentation is typically defined as identifying the set of voxels which make up either the contour or the interior of the object(s) of interest.
Segmentation is the most common subject of papers applying deep learning to medical imaging, and as such has also seen the widest variety in methodology.

Methods:
(1)The most well-known of the novel CNN architectures for segmentation in medical image analysis is U-net.
(2)RNNs have recently become more popular for segmentation tasks.
(3)Many authors have also obtained excellent segmentation results with patch-trained neural networks. Most recent papers now use fCNNs in preference over sliding-window-based classification to reduce redundant computation.(fCNNs have also been extended to 3D and have been applied to multiple targets at once)

Challenge:
One challenge with voxel classification approaches is that they sometimes lead to spurious responses.
To combat this, groups have tried to combine fCNNs with graphical models like Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) to refine the segmentation output.
In most of the cases, graphical models are applied on top of the likelihood map produced by CNNs or fCNNs and act as label regularizers.

Summary:
Segmentation in medical imaging has seen a huge influx of deep learning related methods. Custom architectures have been created to directly target the segmentation task. These have obtained promising results, rivaling and often improving over results obtained with fCNNs.

2.3.2 Lesion segmentation

Segmentation of lesions combines the challenges of object detection and organ and substructure segmentation in the application of deep learning algorithms.
(1)Global and local context are typically needed to perform accurate segmentation, such that multi-stream networks with different scales or non-uniformly sampled patches are used.
(2)In lesion segmentation we have also seen the application of U-net and similar architectures to leverage both this global and local context.

Challenge:
class imbalance.
solutions:(1)adapting the loss function (2)performing data augmentation on positive samples
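One simple way to adapt the loss function is to weight the rare (lesion) class more heavily in a binary cross-entropy. The Python sketch below assumes an inverse-frequency weight, which is one common choice and not necessarily the one used in the surveyed papers. Both function names are made up.

```python
import math


def inverse_frequency_weight(labels):
    """Weight for the positive class: number of negatives per positive."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives


def weighted_bce(probs, labels, pos_weight, eps=1e-7):
    """Mean binary cross-entropy with the positive term scaled by pos_weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


labels = [0] * 9 + [1]                    # 9 background voxels, 1 lesion voxel
weight = inverse_frequency_weight(labels)
print(weight)                             # 9.0
print(weighted_bce([0.5] * 10, labels, weight))
```

With this weight, the single lesion voxel contributes as much to the loss as the nine background voxels combined.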

Summary:
Thus lesion segmentation sees a mixture of approaches used in object detection and organ segmentation. Developments in these two areas will most likely naturally propagate to lesion segmentation as the existing challenges are also mostly similar.

2.4 Registration

Registration (i.e. spatial alignment) of medical images is a common image analysis task in which a coordinate transform is calculated from one medical image to another. Often this is performed in an iterative framework where a specific type of (non-)parametric transformation is assumed and a predetermined metric is optimized.

Methods:
Researchers have found that deep networks can be beneficial in getting the best possible registration performance.
Broadly speaking, two strategies are prevalent in current literature:
(1) using deep-learning networks to estimate a similarity measure for two images to drive an iterative optimization strategy.
(2) directly predicting transformation parameters using deep regression networks.
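The classical iterative framework that strategy (1) plugs into can be sketched on a toy 1D example. The sketch assumes an integer translation as the transformation and the sum of squared differences (SSD) as the metric. In strategy (1), a learned similarity measure would take the place of `ssd`. All names are made up for illustration, and the local search only works here because the toy signals overlap.

```python
def shift(signal, t):
    """Translate a 1D signal by t samples, padding with zeros."""
    n = len(signal)
    return [signal[i - t] if 0 <= i - t < n else 0.0 for i in range(n)]


def ssd(a, b):
    """Sum of squared differences: the predetermined metric in this sketch."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def register_translation(fixed, moving, max_iters=50):
    """Greedy local search over integer translations that lowers the SSD."""
    t = 0
    best = ssd(fixed, shift(moving, t))
    for _ in range(max_iters):
        cost, candidate = min((ssd(fixed, shift(moving, t + d)), t + d)
                              for d in (-1, 1))
        if cost >= best:
            break  # no neighbouring translation improves the metric
        best, t = cost, candidate
    return t, best


fixed = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]
moving = [0, 1, 3, 1, 0, 0, 0, 0, 0, 0]     # same shape, two samples to the left
print(register_translation(fixed, moving))  # (2, 0.0)
```

Strategy (2) would skip this loop entirely and regress the translation from the image pair in a single forward pass.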

Summary:
In contrast to classification and segmentation, the research community seems not to have settled yet on the best way to integrate deep learning techniques in registration methods. Not many papers have yet appeared on the subject and existing ones each have a distinctly different approach.

2.5 Other tasks in medical imaging

2.5.1 Content-based image retrieval

Content-based image retrieval (CBIR) is a technique for knowledge discovery in massive databases and offers the possibility to identify similar case histories, understand rare disorders, and, ultimately, improve patient care.

Challenge:
The major challenge in the development of CBIR methods is extracting effective feature representations from the pixel-level information and associating them with meaningful concepts.

Methods:
All current approaches use (pre-trained) CNNs to extract feature descriptors from medical images.
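Once each image is reduced to a feature descriptor, retrieval amounts to ranking stored descriptors by similarity to the query. The Python sketch below assumes cosine similarity and tiny hand-written descriptors. The case IDs, vectors and function names are all made up for illustration, and in practice the descriptors would come from a CNN.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=2):
    """Return the IDs of the top_k most similar stored cases."""
    ranked = sorted(database.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [case_id for case_id, _ in ranked[:top_k]]


# Hypothetical descriptors; real ones would have hundreds of dimensions.
database = {
    "case_a": [1.0, 0.0, 0.0],
    "case_b": [0.9, 0.1, 0.0],
    "case_c": [0.0, 1.0, 0.0],
}
print(retrieve([1.0, 0.0, 0.0], database))  # ['case_a', 'case_b']
```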

Summary:
Content-based image retrieval as a whole has thus not seen many successful applications of deep learning methods yet, but given the results in other areas it seems only a matter of time.
An interesting avenue of research could be the direct training of deep networks for the retrieval task itself.

2.5.2 Image generation and enhancement

A variety of image generation and enhancement methods using deep architectures have been proposed, ranging from removing obstructing elements in images, normalizing images, improving image quality, data completion, and pattern discovery.

Methods:
In image generation, 2D or 3D CNNs are used to convert one input image into another. Typically these architectures lack the pooling layers present in classification networks.
With multi-stream CNNs super-resolution images can be generated from multiple low-resolution inputs.

Summary:
Image generation has seen impressive results with very creative applications of deep networks in significantly differing tasks.

2.5.3 Combining image data with reports

The combination of text reports and medical image data has led to two avenues of research:
(1) leveraging reports to improve image classification accuracy
(2) generating text reports from images.
The latter is inspired by recent papers on caption generation for natural images.

Given the wealth of data that is available in PACS systems in terms of images and corresponding diagnostic reports, it seems like an ideal avenue for future deep learning research. One could expect that advances in captioning natural images will in time be applied to these data sets as well.

3.Application areas

We highlight some key contributions and discuss performance of systems on large data sets and on public challenge data sets.
All these challenges are listed on http://www.grand-challenge.org

3.1 Brain

DNNs have been extensively used for brain image analysis in several different application domains. (Table 1)

[Application domains]
A large number of studies address classification of Alzheimer’s disease and segmentation of brain tissue and anatomical structures (e.g. the hippocampus). Other important areas are detection and segmentation of lesions (e.g. tumors, white matter lesions, lacunes, micro-bleeds).

Apart from the methods that aim for a scan-level classification (e.g. Alzheimer diagnosis), most methods learn mappings from local patches to representations and subsequently from representations to labels.
[Problem]
However, the local patches might lack the contextual information required for tasks where anatomical information is paramount.

[Solution]
To tackle this, Ghafoorian et al. (2016b) used non-uniformly sampled patches by gradually lowering the sampling rate towards the patch sides to span a larger context. An alternative strategy used by many groups is multi-scale analysis and a fusion of representations in a fully connected layer.
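One way to picture non-uniform sampling is a set of 1D sampling offsets that are dense near the patch center and sparser towards the sides. The Python sketch below illustrates that idea only and is not the scheme of Ghafoorian et al. (2016b). The function names and the geometric `growth` factor are assumptions.

```python
def nonuniform_offsets(n_per_side, growth=2.0):
    """Return sorted 1D sample offsets around a center at 0.

    The step between neighbouring samples is multiplied by `growth` each
    time we move outward, so sampling gets sparser towards the patch sides.
    """
    positive = []
    position, step = 0.0, 1.0
    for _ in range(n_per_side):
        position += step
        positive.append(int(round(position)))
        step *= growth
    return [-offset for offset in reversed(positive)] + [0] + positive


def sample_row(row, center, offsets):
    """Sample a 1D row at center + offset, clamping at the borders."""
    last = len(row) - 1
    return [row[min(max(center + offset, 0), last)] for offset in offsets]


offsets = nonuniform_offsets(4)
print(offsets)  # [-15, -7, -3, -1, 0, 1, 3, 7, 15]
```

Nine samples per row then span 31 pixels of context, whereas a uniformly sampled nine-pixel patch would span only nine.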

[Methods]
Even though brain images are 3D volumes in all surveyed studies, most methods work in 2D, analyzing the 3D volumes slice-by-slice. This is often motivated by either the reduced computational requirements or the thick slices relative to in-plane resolution in some data sets. More recent publications have also employed 3D networks.

[Summary]
BRATS
ISLES
MRBrains
…
the top ranking teams to date have all used CNNs.

Almost all of the aforementioned methods are concentrating on brain MR images. We expect that other brain imaging modalities such as CT and US can also benefit from deep learning based analysis.

3.2 Eye

Ophthalmic imaging

Most works employ simple CNNs for the analysis of color fundus imaging (CFI).

A wide variety of applications are addressed: segmentation of anatomical structures, segmentation and detection of retinal abnormalities, diagnosis of eye diseases, and image quality assessment.

Kaggle diabetic retinopathy detection competition: over 35,000 color fundus images (CFI) were provided to train algorithms to predict the severity of disease in 53,000 test images.
The majority of teams use end-to-end CNNs.

3.3 Chest

In thoracic image analysis of both radiography (X-ray) and computed tomography (CT), the detection, characterization, and classification of nodules is the most commonly addressed application.

In chest X-ray, several groups detect multiple diseases with a single system.
In CT the detection of textural patterns indicative of interstitial lung diseases is also a popular research topic.

LUNA16, a challenge for nodule detection in CT: CNN architectures were used by all top performing systems.
The best systems in LUNA16 still rely on nodule candidates computed by rule-based image processing, but systems that use deep networks for candidate detection also performed very well (e.g. U-net).

Kaggle Data Science Bowl 2017: Estimating the probability that an individual has lung cancer from a CT scan

3.4 Digital pathology and microscopy

3.5 Breast

3.6 Cardiac

Deep learning has been applied to many aspects of cardiac image analysis.

[Domains]
MRI is the most researched modality and left ventricle segmentation the most common task.
Other application domains: segmentation, tracking, slice classification, image quality assessment, automated calcium scoring and coronary centerline tracking, and super-resolution.

[Methods]
(1)Most papers used simple 2D CNNs and analyzed the 3D and often 4D data slice by slice.
(2)The exception is Wolterink et al. (2016), where 3D CNNs were used.
(3)DBNs (Deep Belief Networks) are used in four papers, but these all originated from the same author group. The DBNs are only used for feature extraction and are integrated in compound segmentation frameworks.
(4)Two papers combined CNNs with RNNs.

[Challenge]
Kaggle Data Science Bowl 2015: automatically measure end-systolic and end-diastolic volumes in cardiac MRI.

3.7 Abdomen

3.8 Musculoskeletal

3.9 Other

4.Discussion

4.1 Overview

(1) The earliest studies used pre-trained CNNs as feature extractors.
(2) In the last two years, end-to-end trained CNNs have become the preferred approach for medical imaging interpretation(the current standard practice).

4.2 Key aspects of successful deep learning methods

Although CNNs (and derivatives) are now clearly the top performers in most medical image analysis competitions, the exact architecture is not the most important determinant in getting a good solution.

(1)Expert knowledge about the task to be solved can provide advantages that go beyond adding more layers to a CNN.(e.g. novel data preprocessing or augmentation techniques.)
(2)Designing architectures incorporating unique task-specific properties can obtain better results than straightforward CNNs. (e.g. multi-view and multi-scale networks).
Other aspects of network design are the network input size and receptive field (i.e. the area in input space that contributes to a single output unit).
(3)Model hyper-parameter optimization (e.g. learning rate, dropout rate) is a highly empirical exercise, and is of secondary importance for performance compared to the previously discussed topics and training data quality.
solutions: intuition-based random search (works well enough); Bayesian methods for hyper-parameter optimization (not yet applied in medical image analysis, according to the survey)
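The receptive field mentioned under point (2) can be computed layer by layer with a standard recurrence: each layer adds (kernel size - 1) times the current sampling step to the receptive field, and its stride multiplies that step. The Python sketch below assumes plain convolution and pooling layers without dilation. The function name and the layer encoding are made up for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of one output unit.

    `layers` is a list of (kernel_size, stride) tuples, ordered from the
    input to the output. Dilation is not modelled.
    """
    field, jump = 1, 1   # jump = distance in input pixels between adjacent units
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


print(receptive_field([(3, 1), (3, 1)]))          # 5
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```

Two stacked 3x3 convolutions see a 5-pixel-wide area. Inserting a 2x2 pooling layer with stride 2 between them widens this to 8 pixels, because the second convolution then steps over the input twice as fast.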

4.3 Unique challenges in medical image analysis

(1) The lack of large training data sets is often mentioned as an obstacle.
The main challenge is thus not the availability of image data itself, but the acquisition of relevant annotations/labeling for these images.

Turning reports into accurate annotations or structured labels in an automated manner requires sophisticated text-mining methods, which is an important field of study in itself where deep learning is also widely used nowadays.

[Solutions]: training a deep learning segmentation system for 3D segmentation using only sparse 2D segmentations; multiple-instance or active learning approaches; leveraging non-expert labels via crowd-sourcing; using specific immunohistochemical stains in histopathology to highlight regions of interest, reducing the need for expert experience

(2) Label noise
(e.g. annotators read the same images independently and no consensus was forced)
Training a deep learning system on such data requires careful consideration of how to deal with noise and uncertainty in the reference standard.
[Solutions]: incorporating labeling uncertainty directly in the loss function(an open challenge)

(3) In medical imaging, classification or segmentation is often presented as a binary task: normal versus abnormal, object versus background. However, this is often a gross simplification as both classes can be highly heterogeneous.
[Solutions]:
Turning the deep learning system into a multi-class system by providing it with detailed annotations of all possible subclasses (simply not feasible); tackling this imbalance by incorporating intelligence in the training process itself, by applying selective sampling or hard negative mining (fails when there is substantial noise in the reference standard)

(4) Class imbalance
In medical imaging, images for the abnormal class might be challenging to find.
[Solutions]: Applying specific data augmentation algorithms.

(5) Physicians often leverage a wealth of data on patient history, age, demographics and others to arrive at better decisions. Some authors have already investigated combining this information into deep learning networks in a straightforward manner. The improvements that were obtained were not as large as expected.
One of the challenges is to balance the number of imaging features in the deep learning network (typically thousands) with the number of clinical features (typically only a handful) to prevent the clinical features from being drowned out.
…

4.4 Outlook

(1) Several high-profile successes of deep learning in medical imaging have been reported, such as the work by Esteva et al. (2017) and Gulshan et al. (2016) in the fields of dermatology and ophthalmology.
However, ①both focus on small 2D color image classification; ②this also allowed the authors to use networks that were pre-trained on a very well-labeled dataset.
In contrast, ①in most medical imaging tasks 3D gray-scale or multi-channel images are used, for which pre-trained networks or architectures don’t exist; ②this data typically has very specific challenges, like anisotropic voxel sizes, small registration errors between varying channels (e.g. in multi-parametric MRI) or varying intensity ranges; ③although many tasks in medical image analysis can be postulated as a classification problem, this might not always be the optimal strategy as it typically requires some form of post-processing with non-deep learning methods.

(2) A key area which can be highly relevant for medical imaging and is receiving (renewed) interest: unsupervised learning.

Unsupervised methods are attractive as ①they allow (initial) network training with the wealth of unlabeled data available in the world, and ②they are analogous to human learning.

Unsupervised strategies: ①variational auto-encoders (VAEs) ②generative adversarial networks (GANs)
The former merges variational Bayesian graphical models with neural networks as encoders/decoders. The latter uses two competing convolutional neural networks where one is generating artificial data samples and the other is discriminating artificial from real samples. Both have stochastic components and are generative networks. Most importantly, they can be trained end-to-end and learn representative features in a completely unsupervised manner.

(3) Deep learning methods have often been described as ‘black boxes’. It is often not enough to have a good prediction system. This system also has to be able to articulate itself in a certain way.

Several strategies have been developed to understand what intermediate layers of convolutional networks are responding to: ①deconvolution networks, ②guided back-propagation, ③deep Taylor decomposition. Other approaches ④tie predictions to textual representations of the image (i.e. captioning) or ⑤combine Bayesian statistics with deep networks to obtain true network uncertainty estimates.

(4)We also foresee deep learning approaches will be used for related tasks in medical imaging, mostly unexplored, such as image reconstruction (Wang, 2016).