MyDLNote-Enhancement: [SR notes] 2020 CVPR: Meta-Transfer Learning for Zero-Shot Super-Resolution

2020 CVPR: Meta-Transfer Learning for Zero-Shot Super-Resolution

[paper] : https://arxiv.org/pdf/2002.12213.pdf

[github] : https://github.com/JWSoh/MZSR

 


Abstract

Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific condition of data that they are supervised. For instance, the low-resolution (LR) image should be a “bicubic” downsampled noise-free image from a high-resolution (HR) one.

Problem statement: conventional supervised SISR methods

1. cannot exploit the internal information of a specific image;

2. only apply to the specific data condition under which they were supervised. For example, the low-resolution (LR) image is assumed to be a "bicubic" downsampled, noise-free version of the high-resolution (HR) image.

To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, they require thousands of gradient updates, i.e., long inference time.

The existing remedy and its problem: zero-shot super-resolution addresses the above, but it needs thousands of gradient updates, i.e., a long inference time.

In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results. (See Figure 1). With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.

The solution in this paper: Meta-Transfer Learning for ZSSR --> MZSR

Precisely, it is based on finding a generic initial parameter that is suitable for internal learning.

Therefore, both external and internal information are exploited, and a single gradient update can already yield quite considerable results (see Figure 1).

The network can quickly adapt to a given image condition, so the method applies to a large spectrum of image conditions with a fast adaptation process.

Figure 1: Super-resolved results (×2) of “img050” in Urban100 [14]. The blur kernel of the LR image is an isotropic Gaussian kernel with width 2.0. Result of (c) is fine-tuned from a pre-trained model. Our MZSR outperforms other methods within just one single gradient descent update.

From Figure 1, the main contribution of this paper is clear: it drastically reduces the number of updates (and hence the adaptation time) needed by zero-shot SISR.

[14] 2015CVPR Single image super-resolution from transformed self-exemplars


Introduction

As usual, let us follow the story through the introduction.

SISR, which is to find a plausible HR image from its counterpart LR image, is a long-standing problem in low-level vision area. Recently, the remarkable success of CNNs brought attention to the research community, and hence numerous CNN-based SISR methods have exhibited large performance leap [15, 17, 21, 47, 2, 45, 36, 20, 12, 13]. Most of the recent state-of-the-art (SotA) CNN-based methods are based on a large number of external training dataset and self-supervised settings with known degradation model, e.g., “bicubic” downsampling. Impressively, the recent SotA CNNs show significant PSNR gains compared to the conventional large size of models for the noise-free “bicubic” downsampling condition. However, in real-world situations, when the LR image has distant statistics in downsampling kernels and noises, the recent methods produce undesirable artifacts and show inferior results due to the domain gap. Moreover, their number of parameters and memory overheads are usually too large to be used in real applications.

The first paragraph states the shortcomings of conventional supervised deep-learning SISR: it does not suit super-resolution of real LR images, and the models are heavy in parameters and memory.

We can already guess that the proposed method targets real LR images while keeping the parameter and memory footprint modest.

 

Besides, non-local self-similarity in scale and across multi-scale, which is the internal recurrence of information within a single image, is one of the strong natural image priors. Therefore it has long been used in image restoration tasks, including image denoising [5, 6] and super-resolution [24, 14]. Additionally, the powerful image prior of non-local property is embedded into network architecture [19, 22, 46] by implicitly learning such priors to boost the performance of the networks further. Also, some works to learn internal distribution have been proposed [34, 32, 33]. Moreover, there have been many studies to combine the advantages of external and internal information for image restoration [26, 43, 42, 41].

The second paragraph describes a strong natural-image prior: non-local self-similarity within and across scales, i.e., information recurs inside a single image.

Prior work is cited to show the effectiveness of this prior for image restoration tasks, including:

1. denoising [5, 6] and super-resolution [24, 14];

2. network architectures that embed the non-local prior [19, 22, 46];

3. learning the internal distribution [34, 32, 33];

4. combining external and internal information [26, 43, 42, 41].

This paragraph matters because ZSSR is built on exactly this prior.

Recently, ZSSR [34] has been proposed for zero-shot super-resolution, which is based on the zero-shot setting to exploit the power of CNN but can be easily adapted to the test image condition. Interestingly, ZSSR learns the internal non-local structure of the test image, i.e., deep internal learning. Thus it outperforms external-based CNNs in some regions where the recurrences are salient. Also, ZSSR is highly flexible in that it can address any blur kernels, and thus easily adapted to the conditions of test images.

The third paragraph: this work builds on ZSSR, so its strengths are introduced first.

ZSSR is a zero-shot SISR method that removes the restriction of being trained for a single image condition only.

It can flexibly handle arbitrary blur kernels.

It is most effective in regions where patterns recur within the image.

[34] 2018CVPR “Zero-Shot” Super-Resolution using Deep Internal Learning

However, ZSSR has a few limitations. First, it requires thousands of backpropagation gradient updates at test time, which requires considerable time to get the result. Also, it cannot fully exploit the large-scale external dataset, and rather it depends only on internal structure and patterns, which lacks in the number of total examples. Eventually, this leads to inferior results in most of the regions with general patterns compared to the external-based methods.

The fourth paragraph: since the paper improves on ZSSR, ZSSR must have weaknesses, namely:

1. it requires thousands of back-propagation gradient updates at test time, so obtaining a result takes considerable time;

2. it cannot exploit large-scale external datasets; it relies only on internal structures and patterns, which are limited in the total number of examples, so it is inferior to external-based methods in most regions with general patterns.

 

On the other hand, meta-learning or learning to learn fast has recently attracted many researchers. Meta-learning aims to address a problem that artificial intelligence is hard to learn new concepts quickly with a few examples, unlike human intelligence. In this respect, meta-learning is jointly merged with few-shot learning, and many methods with this approach have been proposed [35, 39, 38, 28, 25, 8, 10, 18, 37]. Among them, Model-Agnostic Meta-Learning (MAML) [8] has shown great impact, showing SotA performance by learning the optimal initial state of the model such that the base-learner can fast adapt to a new task within a few gradient steps. MAML employs the gradient update as meta-learner, and the same author analyzed that gradient descent can approximate any learning algorithm [9]. Moreover, Sun et al. [37] have jointly utilized MAML with transfer learning to exploit large-scale data for few-shot learning.

The fifth paragraph introduces meta-learning. This paper combines meta-learning with ZSSR, so the background is needed.

Meta-learning aims to solve the problem that artificial intelligence struggles to learn new concepts quickly from only a few examples, unlike human intelligence. It is therefore closely tied to few-shot learning, with many methods proposed along this line [35, 39, 38, 28, 25, 8, 10, 18, 37].

Among them, MAML stands out, and it is the approach this paper builds on.

(I am not yet familiar with this literature; if needed, see [8] or the other works cited above.)

[8] 2017ICML Model-agnostic meta-learning for fast adaptation of deep networks

1. Inspired by the above-stated works and ZSSR, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which is kernel-agnostic. We found that simply employing transfer learning or fine-tuning from a pre-trained network does not yield plausible results.

2. As ZSSR only has a meta-test step, we additionally adopt a meta-training step to make the model adapt fast to new blur kernel scenarios. Additionally, we adopt transfer learning in advance to fully utilize external samples, further leveraging the performance.

3. In particular, transfer learning with the help of a large-scale synthetic dataset (“bicubic” degradation setting) is first performed for the external learning of natural image priors.

Then, meta-learning plays a role in learning task-level knowledge with different downsampling kernels as different tasks.

At the meta-test step, simple self-supervised learning is conducted to learn image-specific information within a few gradient steps.

4. As a result, we can exploit both external and internal information.

Also, by leveraging the advantages of ZSSR, we may use a lightweight network, which is flexible to different degradation conditions of LR images.

Furthermore, our method is much faster than ZSSR, i.e., it quickly adapts to new tasks within a few gradient steps, while ZSSR requires thousands of updates.

The sixth paragraph finally describes the paper's own work (a lot of background came before). It is long, so I split it into sub-parts to keep the line of thought clear:

1. Statement of the topic, plus the observation that simply applying transfer learning or fine-tuning does not give plausible results.

2. Core strategy:

1) ZSSR only has a meta-test step; this paper adds a meta-training step so that the model can adapt quickly to new blur-kernel scenarios;

2) transfer learning is adopted beforehand to fully exploit external samples.

3. Concrete procedure:

1) first, transfer learning on a large synthetic dataset (the "bicubic" degradation setting) performs external learning of natural-image priors;

2) then meta-learning learns task-level knowledge, with different downsampling kernels treated as different tasks;

3) at the meta-test step, simple self-supervised learning learns image-specific information within a few gradient steps. As a result, both external and internal information are exploited.

4. Advantages of the method:

1) both external and internal information are exploited;

2) thanks to the ZSSR formulation, a lightweight network suffices;

3) it adapts flexibly to different degradation conditions of LR images;

4) it is much faster than ZSSR: where ZSSR needs thousands of updates, this method adapts to a new task within a few gradient steps.

 

In summary, our overall contribution is three-fold:

• We present a novel training scheme based on meta-transfer learning, which learns an effective initial weight for fast adaptation to new tasks with the zero-shot unsupervised setting.

• By using external and internal samples, it is possible to leverage the advantages of both internal and external learning.

• Our method is fast, flexible, lightweight and unsupervised at meta-test time; hence it can eventually be applied to real-world scenarios.

The last paragraph summarizes the three contributions of the paper.

 

Related Work

CNN-based Super-Resolution

SISR is based on the image degradation model as

I^k_{LR} = (I_{HR} \ast k) \downarrow_s +n,                    (1)

where I_{HR}, I^k_{LR}, k, \ast, \downarrow_s, and n denote HR, LR image, blur kernel, convolution, decimation with scaling factor of s, and white Gaussian noise, respectively. It is notable that diverse degraded conditions can be found in real-world scenes, with various unknown k, \downarrow_s, and n.

This paragraph simply defines the mathematical notation used in the rest of the paper.
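
To make Eq. (1) concrete, here is a minimal NumPy/SciPy sketch of how an LR image could be synthesized from an HR image by blurring, decimating, and adding noise. The kernel choice, kernel size, noise level, and function name are my own illustrative assumptions, not taken from the paper or its code.

```python
import numpy as np
from scipy.ndimage import convolve

def degrade(hr, kernel, s=2, noise_sigma=0.0):
    """Synthesize an LR image following Eq. (1): blur, decimate by s, add noise."""
    blurred = convolve(hr, kernel, mode="reflect")            # I_HR * k
    lr = blurred[::s, ::s]                                    # decimation with scaling factor s
    lr = lr + np.random.normal(0.0, noise_sigma, lr.shape)    # white Gaussian noise n
    return lr

# toy example with a 3x3 box blur standing in for k (single-channel image assumed)
hr = np.random.rand(64, 64)
k = np.ones((3, 3)) / 9.0
lr = degrade(hr, k, s=2, noise_sigma=0.01)
print(lr.shape)  # (32, 32)
```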

 

Meta-Learning

In recent years, diverse meta-learning algorithms have been proposed. They can be categorized into three groups. The first group is metric based methods [35, 38, 39], which is to learn metric space in which learning is efficient within a few samples. The second group is memory network-based methods [31, 28, 25], where the network learns across task knowledges and well generalizes to unseen tasks. The last group is optimization based methods, where gradient descent plays a role as a meta-learner optimization [10, 18, 9, 8]. Among them, MAML [8] has shown a great impact on the research community, and several variants have been proposed [27, 37, 3, 30]. MAML inherently requires second-order derivative terms, and the first-order algorithm has also been proposed in [27]. Also, to cope with the instability of MAML training, MAML++ [3] has been proposed. Moreover, MAML within embedded space has been proposed [30]. In this paper, we employ MAML scheme for fast adaptation of zero-shot super-resolution.

An overview of meta-learning approaches. This is my first encounter with the topic, so it is worth studying as background knowledge.

 


Preliminary

We introduce self-supervised zero-shot super-resolution and meta-learning schemes with notations, following related works [34, 8].

This section introduces self-supervised zero-shot super-resolution and meta-learning schemes, with notation following the related works [34, 8].

  • Zero-Shot Super-Resolution

ZSSR [34] is totally unsupervised or self-supervised. Two phases of training and test are both held in runtime. In training phase, the test image I_{LR} is down-sampled with desired kernel to generate “LR son” denoted as I_{son}, and I_{LR} becomes the HR supervision, “HR father.” Then, the CNN is trained with the LR-HR pairs generated by a single image. The training solely depends on the test image, thus learns specific internal information to given image statistics. In the test phase, the trained CNN then works as a feed-forward network, and the test input image is fed to the CNN to get the super-resolved image I_{SR}.


This chapter details the two building blocks of the proposed algorithm (the paper is essentially their combination with small modifications): ZSSR and meta-learning.

ZSSR:

The core idea of ZSSR is to downsample the given LR image I_{LR} to obtain its "son" I_{son}; I_{LR} then acts as the father, i.e., the ground truth of I_{son}. The model trained on this single pair is applied directly to the original LR image I_{LR} to produce I_{SR}.
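
A minimal PyTorch-style sketch of the ZSSR idea described above: the test LR image acts as the "HR father", its downsampled copy as the "LR son", and a small CNN is trained on this single pair before super-resolving the original input. The network, the optimizer settings, and the bicubic resize stand-in are my own illustrative assumptions, not the authors' implementation (which also uses the desired blur kernel for downsampling).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bicubic_resize(img, scale):
    # stand-in resize; ZSSR would downsample with the desired blur kernel
    return F.interpolate(img, scale_factor=scale, mode="bicubic", align_corners=False)

# tiny fully convolutional network (illustrative; ZSSR uses a small 8-layer CNN)
net = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

I_LR = torch.rand(1, 3, 64, 64)              # the given test image, acting as "HR father"
I_son = bicubic_resize(I_LR, 0.5)            # its downsampled child, the "LR son"

for step in range(1000):                     # ZSSR needs thousands of such updates
    pred = net(bicubic_resize(I_son, 2.0))   # upsample the son and predict the father
    loss = F.l1_loss(pred, I_LR)             # self-supervision from the test image alone
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                        # test phase: super-resolve the original input
    I_SR = net(bicubic_resize(I_LR, 2.0))
```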

  • Meta-Learning

Meta-learning has two phases: meta-training and meta-test. We consider a model f_{\theta }(\cdot), which is parameterized by \theta, that maps inputs x to outputs y. The goal of meta-training is to make the model to be able to adapt to a large number of different tasks. A task \mathcal{T}_i is sampled from a task distribution p(\mathcal{T} ) for meta-training. Within a task, training samples are used to optimize the base-learner with a task-specific loss \mathcal{L}_{\mathcal{T}_i} and test samples are used to optimize the meta-learner.

In meta-test phase, the model f_{\theta }(\cdot) quickly adapts to a new task \mathcal{T}_{new} with the help of meta-learner. MAML [8] employs a simple gradient descent algorithm as the meta-learner and seeks to find an initial transferable point where a few gradient updates lead to a fast adaptation of the model to a new task.

In our case, the input x and the output y are I_{LR}^k and I_{SR}. Also, diverse blur kernels constitute the task distribution, where each task corresponds to the super-resolution of an image degraded by a specific blur kernel.

Meta-Learning:

Meta-learning has two phases: meta-training and meta-test.

Consider a model f_{\theta }(\cdot), parameterized by \theta, that maps inputs x to outputs y. The goal of meta-training is to make the model able to adapt to a large number of different tasks. A task \mathcal{T}_i is sampled from a task distribution p(\mathcal{T} ) for meta-training. Within a task, training samples are used to optimize the base-learner with the task-specific loss \mathcal{L}_{\mathcal{T}_i}, while test samples are used to optimize the meta-learner.

In the meta-test phase, the model f_{\theta }(\cdot) quickly adapts to a new task \mathcal{T}_{new} with the help of the meta-learner. MAML [8] uses a simple gradient-descent algorithm as the meta-learner and seeks an initial, transferable point from which a few gradient updates lead to fast adaptation to a new task.

In this paper, the input and output are I_{LR}^k and I_{SR}, and the diverse blur kernels constitute the task distribution: each task corresponds to super-resolving an image degraded by a specific blur kernel.

Figure 2: The overall scheme of our proposed MZSR. During meta-transfer learning, the external dataset is used, where internal learning is done during meta-test time. From random initial point \theta _0, large-scale dataset DIV2K [1] with “bicubic” degradation is exploited to obtain \theta _T. Then, meta-transfer learning learns a good representation \theta _M for super-resolution tasks with diverse blur kernel scenarios. The figure shows N tasks for simplicity. In the meta-test phase, self-supervision within a test image is exploited to train the model with corresponding blur kernel.

 


Method

The overall scheme of our proposed MZSR is shown in Figure 2. As shown, our method consists of three steps: large-scale training, meta-transfer learning, and meta-test.

The method consists of three steps: large-scale training, meta-transfer learning, and meta-test.

Large-scale Training

This step is similar to the large-scale ImageNet [7] pre-training for object recognition. In our case, we adopt DIV2K [1], which is a high-quality dataset D_{HR}. Using known “bicubic” degradation, we first synthesized a large number of paired samples (I_{HR}, I^{bic}_{LR}), denoted as D. Then, we trained the network to learn super-resolution of the “bicubic” degradation model by minimizing the loss

\mathcal{L}^{D}(\theta) = \mathbb{E}_{(I_{HR},\, I^{bic}_{LR}) \sim D} \left[ \left\| I_{HR} - f_{\theta}(I^{bic}_{LR}) \right\|_1 \right],                    (2)

which is the pixel-wise L1 loss [21, 34] between the prediction and the ground-truth.

The large-scale training has contributions within two respects. First, as super-resolution tasks share similar properties, it is possible to learn efficient representations that implicitly represent natural image priors of high-resolution images, thus making the network easier to train. Second, as MAML [8] is known to show some unstable training, we ease the training phase of meta-learning with the help of well pre-trained feature representations.

Large-scale training is simply pre-training on an existing paired dataset: DIV2K [1] as the data and the L1 loss as the objective.

It contributes in two respects:

First, since super-resolution tasks share similar properties, the network can learn efficient representations that implicitly capture natural-image priors of high-resolution images, which makes subsequent learning easier;

Second, since MAML (Model-Agnostic Meta-Learning [8]) is known to be somewhat unstable to train, well pre-trained feature representations ease the meta-learning training phase.
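
A hedged sketch of this pre-training stage: ordinary supervised training on "bicubic" (I_HR, I^bic_LR) pairs with the pixel-wise L1 loss of Eq. (2). The data loader, network, learning rate, and epoch count are placeholders of my own; only the loss and the objective follow the description above.

```python
import torch
import torch.nn.functional as F

def large_scale_training(net, loader, epochs=10, lr=1e-4):
    """Pre-train on bicubic DIV2K pairs to obtain theta_T (Figure 2)."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for I_LR_bic, I_HR in loader:        # synthesized (I^bic_LR, I_HR) pairs from D
            pred = net(I_LR_bic)
            loss = F.l1_loss(pred, I_HR)     # pixel-wise L1 loss, Eq. (2)
            opt.zero_grad(); loss.backward(); opt.step()
    return net
```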

 

Meta-Transfer Learning

Since ZSSR is trained with the gradient descent algorithm, it is possible to introduce an optimization-based meta-training step with the help of gradient descent algorithm, which is proven to be a universal learning algorithm [9].

In this step, we seek to find a sensitive and transferable initial point of the parameter space where a few gradient updates lead to large performance improvements. Inspired by MAML, our algorithm mostly follows MAML but with several modifications.

Unlike MAML, we adopt different settings for meta-training and meta-test. In particular, we use the external dataset for meta-training, whereas internal learning is adopted for meta-test. This is because we intend our meta-learner to focus more on the kernel-agnostic property with the help of a large-scale external dataset.

These three paragraphs explain the motivation for the proposed meta-training:

Since ZSSR is trained with gradient descent, an optimization-based meta-training step can be introduced on top of it; gradient descent has been shown to approximate any learning algorithm [9].

In this step, the goal is to find a sensitive, transferable initial point in parameter space from which a few gradient updates yield a large performance improvement. Inspired by MAML, the algorithm mostly follows MAML, with several modifications.

Unlike MAML, meta-training and meta-test use different settings: the external dataset is used for meta-training, while internal learning is used for meta-test. The underlying goal is to let the meta-learner focus on the kernel-agnostic property with the help of a large-scale external dataset.

 

We synthesize a dataset for meta-transfer learning, denoted as D_{meta}. D_{meta} consists of pairs, (I_{HR}, I^{k}_{LR}), with diverse kernel settings. Specifically, we used isotropic and anisotropic Gaussian kernels for the blur kernels. We consider a kernel distribution p(k), where each kernel is determined by a covariance matrix \Sigma. It is chosen to have a random angle \Theta \sim U[0, \pi ], and two random eigenvalues \lambda_1 \sim U[1, 2.5s], \lambda_2 \sim U[1, \lambda_1], where s denotes the scaling factor. Precisely, the covariance matrix is expressed as

\Sigma = R(\Theta) \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R(\Theta)^{T}, \quad R(\Theta) = \begin{bmatrix} \cos \Theta & -\sin \Theta \\ \sin \Theta & \cos \Theta \end{bmatrix}.                    (3)

Eventually, we train our meta-learner based on D_{meta}. We may divide D_{meta} into two groups: D_{tr} for task-level training, and D_{te} for task-level test.

This paragraph describes how the dataset D_{meta} for meta-transfer learning is generated:

1. the blur kernels are isotropic and anisotropic Gaussian kernels;

2. the kernel distribution: each kernel is determined by a covariance matrix as in Eq. (3), with a random angle \Theta \sim U[0, \pi ], two random eigenvalues \lambda_1 \sim U[1, 2.5s], \lambda_2 \sim U[1, \lambda_1], and scaling factor s (a sketch of the sampling follows this list);

3. D_{meta} is split into a task-level training set D_{tr} and a task-level test set D_{te}.
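
The kernel sampling described above is easy to reproduce: draw a random angle and eigenvalues, build the covariance matrix of Eq. (3), and evaluate an anisotropic Gaussian on a grid. A sketch under those assumptions; the kernel size of 15 is my own choice, not from the paper.

```python
import numpy as np

def sample_kernel(s=2, size=15):
    """Sample an anisotropic Gaussian blur kernel from p(k)."""
    theta = np.random.uniform(0, np.pi)        # random rotation angle
    lam1 = np.random.uniform(1.0, 2.5 * s)     # random eigenvalues of the covariance
    lam2 = np.random.uniform(1.0, lam1)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Sigma = R @ np.diag([lam1, lam2]) @ R.T    # covariance matrix, Eq. (3)

    # evaluate the Gaussian on a (size x size) grid centered at the origin
    r = (size - 1) / 2.0
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    coords = np.stack([x, y], axis=-1)
    inv = np.linalg.inv(Sigma)
    k = np.exp(-0.5 * np.einsum("...i,ij,...j->...", coords, inv, coords))
    return k / k.sum()                          # normalize to sum to 1

k = sample_kernel(s=2)
print(k.shape, k.sum())  # (15, 15) 1.0
```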

 

In our method, adaptation to a new task \mathcal{T}_i with respect to the parameters \theta is one or more gradient descent updates. For one gradient update, the new adapted parameters \theta _i are then

\theta_i = \theta - \alpha \nabla_{\theta} \mathcal{L}^{tr}_{\mathcal{T}_i}(\theta),                    (4)

where \alpha is the task-level learning rate. The model parameters \theta are optimized to achieve minimal test error of D_{meta} with respect to \theta _i. Concretely, the meta-objective is

\theta_M = \arg \min_{\theta} \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}^{te}_{\mathcal{T}_i}(\theta_i) = \arg \min_{\theta} \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}^{te}_{\mathcal{T}_i} \left( \theta - \alpha \nabla_{\theta} \mathcal{L}^{tr}_{\mathcal{T}_i}(\theta) \right).                    (6)

Meta-transfer optimization is performed using Eq. (6), which is to learn the knowledge across tasks. Any gradient-based optimization can be used for meta-transfer training. For stochastic gradient descent, the parameter update rule is expressed as

\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}^{te}_{\mathcal{T}_i}(\theta_i),                    (7)

where \beta is the meta-learning rate.

In this method, adapting to a new task \mathcal{T}_i means one or more gradient-descent updates of the parameters \theta. For a single gradient update, the adapted parameters \theta _i are given by Eq. (4).

The model parameters \theta are then optimized to minimize the test error on D_{meta} with respect to \theta _i; concretely, the meta-objective is Eq. (6).

Meta-transfer optimization is performed with Eq. (6), i.e., knowledge is learned across tasks. Any gradient-based optimizer can be used; for stochastic gradient descent, the parameter update rule is Eq. (7).
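
To make Eqs. (4)-(7) concrete, here is a simplified sketch of one meta-transfer update. For readability it uses a first-order approximation (the adapted parameters are not differentiated through), whereas MAML proper keeps the second-order terms; the `task_batch` format and the learning rates are illustrative assumptions of mine.

```python
import copy
import torch
import torch.nn.functional as F

def meta_transfer_step(net, task_batch, alpha=1e-2, beta=1e-4):
    """One outer update: per-task inner adaptation (Eq. 4), then meta update (Eq. 7).
    task_batch: list of ((tr_lr, tr_hr), (te_lr, te_hr)) pairs, i.e., D_tr / D_te per task."""
    meta_grads = [torch.zeros_like(p) for p in net.parameters()]

    for (tr_lr, tr_hr), (te_lr, te_hr) in task_batch:
        learner = copy.deepcopy(net)                         # start from the shared theta
        # inner update: theta_i = theta - alpha * grad L_tr   (Eq. 4)
        tr_loss = F.l1_loss(learner(tr_lr), tr_hr)
        grads = torch.autograd.grad(tr_loss, learner.parameters())
        with torch.no_grad():
            for p, g in zip(learner.parameters(), grads):
                p -= alpha * g
        # task-level test loss evaluated at theta_i            (summand of Eq. 6)
        te_loss = F.l1_loss(learner(te_lr), te_hr)
        grads = torch.autograd.grad(te_loss, learner.parameters())
        for mg, g in zip(meta_grads, grads):
            mg += g

    # meta update on the shared initialization                 (Eq. 7, first-order)
    with torch.no_grad():
        for p, mg in zip(net.parameters(), meta_grads):
            p -= beta * mg
```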

 

Meta-Test

The meta-test step is exactly the zero-shot super-resolution. As evidenced in [34], this step enables our model to learn internal information within a single image. With a given LR image, we downsample it with the corresponding downsampling kernel (kernel estimation algorithms [24, 29] can be adopted for the blind scenario) to generate I_{son} and perform a few gradient updates with respect to the model parameters using a single pair of “LR son” and the given image. Then, we feed the given LR image to the model to get a super-resolved image.

The meta-test step is exactly zero-shot super-resolution; it lets the model learn internal information within a single image. Given an LR image, downsample it with the corresponding kernel (a kernel-estimation algorithm [24, 29] can be used in the blind scenario) to generate I_{son}, perform a few gradient updates on the model parameters using the single "LR son"/LR pair, and then feed the given LR image through the model to obtain the super-resolved image.
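
A minimal sketch of this meta-test step under the same assumptions as the earlier snippets: starting from the meta-learned weights, perform n gradient updates on the single ("LR son", LR) pair and then super-resolve the given LR image. The `downsample_with_kernel` helper is a placeholder of mine, and a ZSSR-style network taking a bicubic-upsampled input is assumed.

```python
import torch
import torch.nn.functional as F

def downsample_with_kernel(img, kernel, scale):
    # placeholder: a real implementation would blur with `kernel` before decimation
    return F.interpolate(img, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)

def meta_test(net, I_LR, kernel, n=1, lr=2e-2, scale=2):
    """Zero-shot adaptation: n gradient steps on a single image pair, then inference."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    I_son = downsample_with_kernel(I_LR, kernel, scale)
    for _ in range(n):                        # a few updates, unlike ZSSR's thousands
        up = F.interpolate(I_son, scale_factor=scale, mode="bicubic", align_corners=False)
        loss = F.l1_loss(net(up), I_LR)       # self-supervision from the test image
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        up = F.interpolate(I_LR, scale_factor=scale, mode="bicubic", align_corners=False)
        return net(up)                        # the super-resolved image I_SR
```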

 

Algorithm

Algorithm 1 demonstrates the process of our meta-transfer training procedures of Sections 4.1 and 4.2. Lines 3-7 are the large-scale training stage. Lines 11-14 are the inner loop of meta-transfer learning, where the base-learner is updated with the task-specific loss. Lines 15-16 present the meta-learner optimization.

Algorithm 2 presents the meta-test step, which is the zero-shot super-resolution. A few gradient updates (n) are performed during the meta-test, and the super-resolved image is obtained with the final updated parameters.

Algorithm 1: the meta-training procedure.

Lines 3-7 are the large-scale training stage.

Lines 11-14 are the inner loop of meta-transfer learning.

Lines 15-16 are the meta-learner optimization.

Algorithm 2: the meta-test step, i.e., zero-shot super-resolution.

A small number of gradient updates (n) are performed during the meta-test, and the super-resolved image is obtained with the final updated parameters.

 

 

(The algorithm details deserve a closer look; notes on the experimental section will be added later...)
