2020CVPR -- Deep Unfolding Network for Image Super-Resolution

[paper] : Deep Unfolding Network for Image Super-Resolution

[github] : https://github.com/cszn/USRNet

Abstract

Learning-based single image super-resolution (SISR) methods are continuously showing superior effectiveness and efficiency over traditional model-based methods, largely due to the end-to-end training. However, different from model-based methods that can handle the SISR problem with different scale factors, blur kernels and noise levels under a unified MAP (maximum a posteriori) framework, learning-based methods generally lack such flexibility.

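Problem statement:

The authors first acknowledge the strong performance of deep-learning SISR methods, and then point out their weakness:

Model-based (non-deep-learning) SISR methods can handle LR images with different scale factors, blur kernels and noise levels under a unified maximum a posteriori (MAP) framework, whereas deep-learning SISR methods lack this flexibility.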

To address this issue, this paper proposes an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods. Specifically, by unfolding the MAP inference via a half-quadratic splitting algorithm, a fixed number of iterations consisting of alternately solving a data subproblem and a prior subproblem can be obtained. The two subproblems then can be solved with neural modules, resulting in an end-to-end trainable, iterative network. As a result, the proposed network inherits the flexibility of model-based methods to super-resolve blurry, noisy images for different scale factors via a single model, while maintaining the advantages of learning-based methods.

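Proposed solution:

To address this problem, the paper proposes an end-to-end trainable unfolding network that leverages both learning-based and model-based methods.

Specifically, unfolding the MAP inference with the half-quadratic splitting algorithm yields a fixed number of iterations, each of which alternately solves a data subproblem and a prior subproblem.

Both subproblems can be solved with neural modules, which gives an end-to-end trainable iterative network.

As a result, the network inherits the flexibility of model-based methods, super-resolving blurry, noisy images at different scale factors with a single model, while keeping the advantages of learning-based methods.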

Extensive experiments demonstrate the superiority of the proposed deep unfolding network in terms of flexibility, effectiveness and also generalizability.

Experimental results.

Introduction

Let's look at the story the Introduction tells.

Single image super-resolution (SISR) refers to the process of recovering the natural and sharp detailed high-resolution (HR) counterpart from a low-resolution (LR) image. It is one of the classical ill-posed inverse problems in low-level computer vision and has a wide range of real-world applications, such as enhancing the image visual quality on high-definition displays [42, 53] and improving the performance of other high-level vision tasks [13].

A general introduction to what SISR does.

Despite decades of studies, SISR still requires further study for academic and industrial purposes [35, 64]. The difficulty is mainly caused by the inconsistency between the simplistic degradation assumption of existing SISR methods and the complex degradations of real images [16]. Actually, for a scale factor of $s$, the classical (traditional) degradation model of SISR [17, 18, 37] assumes the LR image $y$ is a blurred, decimated, and noisy version of an HR image $x$. Mathematically, it can be expressed by

$y= (x \otimes k)\downarrow_s +n$ , (1)

where $\otimes$ represents two-dimensional convolution of $x$ with blur kernel $k$, $\downarrow_s$ denotes the standard $s$-fold downsampler, i.e., keeping the upper-left pixel for each distinct $s\times s$ patch and discarding the others, and $n$ is usually assumed to be additive white Gaussian noise (AWGN) specified by standard deviation (or noise level) $\sigma$ [71]. With a clear physical meaning, Eq. (1) can approximate a variety of LR images by setting proper blur kernels, scale factors and noises for an underlying HR image. In particular, Eq. (1) has been extensively studied in model-based methods which solve a combination of a data term and a prior term under the MAP framework.

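This paragraph describes SISR in more detail from the mathematical-model point of view. SISR still needs further study for both academic and industrial purposes, and the difficulty comes mainly from the mismatch between the overly simple degradation assumption of existing SISR methods and the complex degradations of real images. The rest of the paragraph spells out the mathematics: for a given HR image, the LR image is determined by the blur kernel, the scale factor (downsampling ratio) and the noise.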
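A minimal NumPy sketch of Eq. (1) may help make the three ingredients concrete. It assumes a single-channel image whose height and width are multiples of $s$, a centered kernel smaller than the image, circular boundary conditions (the same assumption the data module makes later), and pixel values on the $[0, 255]$ scale so that $\sigma$ is in those units. The function names are illustrative, not taken from the USRNet code.

```python
import numpy as np


def conv2d_circular(x, k):
    """2-D circular convolution of image x with a centered blur kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    otf = np.zeros((H, W))
    otf[:kh, :kw] = k
    # Move the kernel center to (0, 0) so the blurred image is not translated.
    otf = np.roll(otf, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(otf)))


def degrade(x, k, s, sigma, rng=None):
    """Classical degradation: y = (x conv k) downsampled by s, plus AWGN of std sigma."""
    rng = np.random.default_rng(0) if rng is None else rng
    blurred = conv2d_circular(x, k)
    y = blurred[::s, ::s]  # keep the upper-left pixel of every s x s patch
    return y + rng.normal(0.0, sigma, size=y.shape)
```

For example, `degrade(x, k, 4, 2.55)` would give a $\times 4$ LR image with 1% noise (2.55 on the 0 to 255 scale).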

Eq. (1) has been studied extensively in model-based methods, which solve a combination of a data term and a prior term under the MAP framework.

Eq. (1) sets up the next paragraph, which discusses the problems of model-based and deep-learning SISR algorithms.

Though model-based methods are usually algorithmically interpretable, they typically lack a standard criterion for their evaluation because, apart from the scale factor, Eq. (1) additionally involves a blur kernel and added noise. For convenience, researchers resort to bicubic degradation without consideration of blur kernel and noise level [14,56, 60]. However, bicubic degradation is mathematically complicated [25], which in turn hinders the development of model-based methods.

For this reason, recently proposed SISR solutions are dominated by learning-based methods that learn a mapping function from a bicubicly downsampled LR image to its HR estimation. Indeed, significant progress on improving PSNR [26, 70] and perceptual quality [31, 47, 58] for the bicubic degradation has been achieved by learning-based methods, among which convolutional neural network (CNN) based methods are the most popular, due to their powerful learning capacity and the speed of parallel computing. Nevertheless, little work has been done on applying CNNs to tackle Eq. (1) via a single model. Unlike model-based methods, CNNs usually lack flexibility to super-resolve blurry, noisy LR images for different scale factors via a single end-to-end trained model (see Fig. 1).

Figure 1. While a single degradation model (i.e., Eq. (1)) can result in various LR images for an HR image, with different blur kernels, scale factors and noise, the study of learning a single deep model to invert all such LR images to HR image is still lacking.

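Problem statement:

Current SISR methods fall into two groups, model-based (non-deep-learning) and deep-learning-based, and each has its own problem.

Model-based: for convenience, researchers adopt bicubic degradation without considering the blur kernel and noise level [14, 56, 60]. However, bicubic degradation is mathematically complicated [25], which in turn hinders the development of model-based methods.

Deep-learning-based: CNNs usually lack the flexibility to super-resolve blurry, noisy LR images at different scale factors with a single end-to-end trained model.

Having identified the problem, the next paragraph describes this paper's work.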

In this paper, we propose a deep unfolding super-resolution network (USRNet) to bridge the gap between learning-based methods and model-based methods. On one hand, similar to model-based methods, USRNet can effectively handle the classical degradation model (i.e., Eq. (1)) with different blur kernels, scale factors and noise levels via a single model. On the other hand, similar to learning-based methods, USRNet can be trained in an end-to-end fashion to guarantee effectiveness and efficiency.

To achieve this, we first unfold the model-based energy function via a half-quadratic splitting algorithm. Correspondingly, we can obtain an inference which iteratively alternates between solving two subproblems, one related to a data term and the other to a prior term. We then treat the inference as a deep network, by replacing the solutions to the two subproblems with neural modules.

Since the two subproblems correspond respectively to enforcing degradation consistency knowledge and guaranteeing denoiser prior knowledge, USRNet is well-principled with explicit degradation and prior constraints, which is a distinctive advantage over existing learning-based SISR methods.

It is worth noting that since USRNet involves a hyper-parameter for each subproblem, the network contains an additional module for hyper-parameter generation. Moreover, in order to reduce the number of parameters, all the prior modules share the same architecture and same parameters.

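This paper's work:

The authors present their work in three steps:

1. The core idea: combine learning-based and model-based methods.

On the one hand, like model-based methods, USRNet can handle the classical degradation model (i.e., Eq. (1)) with different blur kernels, scale factors and noise levels via a single model.

On the other hand, like learning-based methods, USRNet can be trained end-to-end, which ensures effectiveness and efficiency.

2. How the core idea is realized:

First, the model-based energy function is unfolded with the half-quadratic splitting algorithm. This yields an inference procedure that iteratively alternates between two subproblems, one tied to the data term and the other to the prior term.

Then the inference is treated as a deep network, with neural modules replacing the solutions of the two subproblems.

3. Explaining the proposed network:

Because the two subproblems respectively enforce degradation consistency and impose a denoiser prior, USRNet is well-principled, with explicit degradation and prior constraints; this is a distinctive advantage over existing learning-based SISR methods.

Note that since USRNet has a hyper-parameter for each subproblem, the network includes an extra module that generates the hyper-parameters.

In addition, to reduce the number of parameters, all prior modules share the same architecture and the same parameters.

(Admittedly, having read only this far, these sentences are hard to follow; the Method section should make them clearer.)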

The main contributions of this work are as follows:

1) An end-to-end trainable unfolding super-resolution network (USRNet) is proposed. USRNet is the first attempt to handle the classical degradation model with different scale factors, blur kernels and noise levels via a single end-to-end trained model.

2) USRNet integrates the flexibility of model-based methods and the advantages of learning-based methods, providing an avenue to bridge the gap between model-based and learning-based methods.

3) USRNet intrinsically imposes a degradation constraint (i.e., the estimated HR image should accord with the degradation process) and a prior constraint (i.e., the estimated HR image should have natural characteristics) on the solution.

4) USRNet performs favorably on LR images with different degradation settings, showing great potential for practical applications.

Main contributions, in brief: (1) a single end-to-end trained network, USRNet, for the classical degradation model with varying scale factors, blur kernels and noise levels; (2) a bridge between the flexibility of model-based methods and the strengths of learning-based methods; (3) built-in degradation and prior constraints on the solution; (4) favorable results across different degradation settings.

Related work

Omitted.

Method

Degradation model: classical vs. bicubic

Since bicubic degradation is well-studied, it is interesting to investigate its relationship to the classical degradation model. Actually, the bicubic degradation can be approximated by setting a proper blur kernel in Eq. (1). To achieve this, we adopt the data-driven method to solve the following kernel estimation problem by minimizing the reconstruction error over a large set of HR/bicubic-LR pairs $\{(x, y)\}$

$k^{\times s}_{\text{bicubic}} = \arg\min_k \left\| (x \otimes k)\downarrow_s - y \right\|$ . (2)

Fig. 2 shows the approximated bicubic kernels for scale factors 2, 3 and 4. It should be noted that since the downsampling operation selects the upper-left pixel for each distinct $s \times s$ patch, the bicubic kernels for scale factors 2, 3 and 4 have a center shift of 0.5, 1 and 1.5 pixels to the upper-left direction, respectively.

In short: bicubic degradation can be approximated within Eq. (1) by choosing a suitable blur kernel, which is estimated in a data-driven way by minimizing the reconstruction error of Eq. (2) over many HR/bicubic-LR pairs. Because the downsampler keeps the upper-left pixel of each $s\times s$ patch, the estimated kernels for scale factors 2, 3 and 4 are shifted toward the upper left by 0.5, 1 and 1.5 pixels, respectively.
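Eq. (2) is linear in $k$, so one way to solve it is ordinary least squares, with each LR pixel contributing one linear equation in the kernel taps. The paper does not say which solver it used, so treat this as an assumption. The sketch assumes grayscale pairs, circular boundaries and an odd kernel size; in a real run `y` would be the bicubic-downscaled image, while the check below uses a synthetic kernel so that the answer is known. Least squares places no sign constraint on the taps, which is consistent with the negative values noted in Fig. 2. Function names are hypothetical.

```python
import numpy as np


def design_matrix(x, ksize, s):
    """One row per LR pixel, one column per kernel tap (circular boundaries).

    Row (p, q) holds the HR pixels that the taps of a centered kernel multiply
    when computing (x conv k) at HR position (p, q); only positions kept by the
    s-fold downsampler (upper-left pixel of each s x s patch) get a row.
    """
    H, W = x.shape
    c = ksize // 2
    rows = []
    for p in range(0, H, s):
        for q in range(0, W, s):
            rows.append([x[(p - u + c) % H, (q - v + c) % W]
                         for u in range(ksize) for v in range(ksize)])
    return np.asarray(rows)


def estimate_kernel(pairs, ksize, s):
    """Least-squares solution of Eq. (2) over a list of (x, y) pairs."""
    A = np.vstack([design_matrix(x, ksize, s) for x, _ in pairs])
    b = np.concatenate([y.ravel() for _, y in pairs])
    k, *_ = np.linalg.lstsq(A, b, rcond=None)
    return k.reshape(ksize, ksize)
```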

Figure 2. Approximated bicubic kernels for scale factors 2, 3 and 4 under the classical SISR degradation model assumption. Note that these kernels contain negative values.

Unfolding optimization

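According to the MAP framework, the HR image could be estimated by minimizing the following energy function

$E(x) = \frac{1}{2\sigma^2}\left\| y - (x \otimes k)\downarrow_s \right\|^2 + \lambda \Phi(x)$ , (3)

where $\frac{1}{2\sigma^2}\left\| y - (x \otimes k)\downarrow_s \right\|^2$ is the data term, $\Phi (x)$ is the prior term, and $\lambda$ is a trade-off parameter.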

This states the MAP formulation.

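In order to obtain an unfolding inference for Eq. (3), the half-quadratic splitting (HQS) algorithm is selected due to its simplicity and fast convergence in many applications. HQS tackles Eq. (3) by introducing an auxiliary variable $z$, leading to the following approximate equivalence

$E_\mu(x, z) = \frac{1}{2\sigma^2}\left\| y - (z \otimes k)\downarrow_s \right\|^2 + \lambda \Phi(x) + \frac{\mu}{2}\left\| z - x \right\|^2$ , (4)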

where $\mu$ is the penalty parameter.

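This is how HQS reformulates the MAP problem. (I was not familiar with HQS at all; it looks like replacing $x$ in the data term with $z$ and adding an L2 penalty between $x$ and $z$ so that $z$ stays as close to $x$ as possible.)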

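Such a problem can be addressed by iteratively solving subproblems for $z$ and $x$

$z_k = \arg\min_z \left\| y - (z \otimes k)\downarrow_s \right\|^2 + \mu\sigma^2 \left\| z - x_{k-1} \right\|^2$ , (5)

$x_k = \arg\min_x \frac{\mu}{2}\left\| z_k - x \right\|^2 + \lambda \Phi(x)$ . (6)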

According to Eq. (5), $\mu$ should be large enough so that $x$ and $z$ are approximately equal to the fixed point. However, this would also result in slow convergence. Therefore, a good rule of thumb is to iteratively increase $\mu$ . For convenience, the $\mu$ in the $k$-th iteration is denoted by $\mu_k$ .

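This gives a concrete way to solve Eq. (4): split it into two subproblems, Eq. (5), which solves for $z$, and Eq. (6), which solves for $x$.

$\mu$ must be large enough for $x$ and $z$ to be approximately equal at the fixed point, but a large $\mu$ slows convergence, so $\mu$ is increased over the iterations and written $\mu_k$ at the $k$-th iteration.

Eq. (5) corresponds to the data term introduced above.

Eq. (6) corresponds to the prior term introduced above.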

It can be observed that the data term and the prior term are decoupled into Eq. (5) and Eq. (6), respectively.
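To make the alternation concrete, here is a toy HQS loop in NumPy. It deliberately replaces the paper's setting with a much simpler one (an assumption for illustration only): the degradation is an arbitrary matrix $A$ standing in for $(\cdot \otimes k)\downarrow_s$, and the prior is the Gaussian $\Phi(x) = \frac{1}{2}\|x\|^2$. With that prior, Eq. (6) has the closed form $x = z/(1 + \lambda/\mu)$, i.e., a Gaussian denoiser with noise level $\sqrt{\lambda/\mu}$, and both the HQS fixed point and the exact MAP solution of Eq. (3) can be computed directly for comparison.

```python
import numpy as np


def hqs_ridge(A, y, sigma, lam, mus, x0=None):
    """Toy HQS for E(x) = ||y - A x||^2 / (2 sigma^2) + lam * Phi(x).

    Phi(x) = 0.5 * ||x||^2 is an illustrative Gaussian prior (an assumption,
    not the paper's learned prior). `mus` lists mu for each iteration.
    """
    n = A.shape[1]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    AtA, Aty = A.T @ A, A.T @ y
    z = x.copy()
    for mu in mus:
        # Data subproblem, Eq. (5): argmin_z ||y - A z||^2 + mu sigma^2 ||z - x||^2
        a = mu * sigma**2
        z = np.linalg.solve(AtA + a * np.eye(n), Aty + a * x)
        # Prior subproblem, Eq. (6): for this prior it is Gaussian denoising
        # with noise level beta = sqrt(lam / mu), i.e. x = z / (1 + beta^2).
        x = z / (1.0 + lam / mu)
    return x, z


def map_ridge(A, y, sigma, lam):
    """Exact minimizer of E(x) for the same toy prior, for comparison."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * sigma**2 * np.eye(n), A.T @ y)
```

With a fixed $\mu$, the loop converges to the joint minimizer of Eq. (4), which moves closer to the MAP solution as $\mu$ grows; a larger $\mu$ also converges more slowly, which is the reason for increasing $\mu$ across iterations.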

For the solution of Eq. (5), the fast Fourier transform (FFT) can be utilized by assuming the convolution is carried out with circular boundary conditions. Notably, it has a closed-form expression [71]

$z_k = F^{-1}\left( \frac{1}{\alpha_k}\left( d - \overline{F(k)} \odot_s \frac{(F(k)\, d)\Downarrow_s}{(\overline{F(k)}F(k))\Downarrow_s + \alpha_k} \right) \right)$ , (7)

where $d$ is defined as

$d = \overline{F(k)}\, F(y\uparrow_s) + \alpha_k F(x_{k-1})$

with $\alpha_k \triangleq \mu_k\sigma ^2$ and where the $F(\cdot)$ and $F^{-1} (\cdot)$ denote FFT and inverse FFT, $\overline{F(\cdot)}$ denotes complex conjugate of $F(\cdot)$ , $\odot_s$ denotes the distinct block processing operator with element-wise multiplication, i.e., applying element-wise multiplication to the $s \times s$ distinct blocks of $\overline{F(k)}$, $\Downarrow_s$ denotes the distinct block downsampler, i.e., averaging the $s \times s$ distinct blocks, $\uparrow_s$ denotes the standard s-fold upsampler, i.e., upsampling the spatial size by filling the new entries with zeros.

It is especially noteworthy that Eq. (7) also works for the special case of deblurring when $s = 1$.
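The closed form of Eq. (7) translates almost line by line into array code. Below is a NumPy sketch; the paper's implementation uses PyTorch, and this port is an illustration, not the authors' code. It assumes a single-channel image whose height and width are divisible by $s$, a centered kernel smaller than the image, and circular boundaries. In the frequency domain, $\Downarrow_s$ averages the $s \times s$ distinct blocks (each of size $H/s \times W/s$), and $\odot_s$ multiplies $\overline{F(k)}$ by that block result tiled back to full size. The helper names `kernel_otf`, `block_mean` and `data_step` are hypothetical.

```python
import numpy as np


def kernel_otf(k, shape):
    """FFT of kernel k zero-padded to `shape`, with the kernel center moved to (0, 0)."""
    otf = np.zeros(shape)
    otf[:k.shape[0], :k.shape[1]] = k
    otf = np.roll(otf, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(otf)


def block_mean(a, s):
    """Distinct block downsampler: average of the s*s (H/s x W/s) blocks of a."""
    H, W = a.shape
    return a.reshape(s, H // s, s, W // s).mean(axis=(0, 2))


def data_step(x_prev, y, k, s, alpha):
    """Closed-form z_k of Eq. (5) via Eq. (7), assuming circular boundaries."""
    H, W = x_prev.shape
    FK = kernel_otf(k, (H, W))
    FKC = np.conj(FK)
    y_up = np.zeros((H, W))
    y_up[::s, ::s] = y  # standard s-fold upsampler: zero filling
    d = FKC * np.fft.fft2(y_up) + alpha * np.fft.fft2(x_prev)
    ratio = block_mean(FK * d, s) / (block_mean(np.abs(FK) ** 2, s) + alpha)
    Z = (d - FKC * np.tile(ratio, (s, s))) / alpha  # block processing: tile, multiply
    return np.real(np.fft.ifft2(Z))
```

With `s=1`, `block_mean` is the identity and the function reduces to the usual FFT deblurring step, matching the special case above.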

For the solution of Eq. (6), it is known that, from a Bayesian perspective, it actually corresponds to a denoising problem with noise level $\beta _k \triangleq \sqrt{\lambda /\mu_k}$ [10].
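Both hyper-parameters are driven by $\mu_k$: $\alpha_k = \mu_k\sigma^2$ grows with $\mu_k$, while $\beta_k = \sqrt{\lambda/\mu_k}$ shrinks. USRNet learns them with the module H described later; purely to illustrate the definitions, the sketch below uses a hand-crafted schedule in which $\beta_k$ is log-spaced from an assumed starting value down to $\sigma$ (the starting value, $\lambda$ and the log spacing are all assumptions) and derives $\mu_k$ and $\alpha_k$ from it. It requires $\sigma > 0$.

```python
import numpy as np


def hqs_schedule(sigma, lam, beta_start, K=8):
    """Hypothetical hand-crafted schedule, not the learned one from the paper.

    beta_k is log-spaced from beta_start down to sigma; then
    mu_k = lam / beta_k**2 (from beta_k = sqrt(lam / mu_k)) and alpha_k = mu_k * sigma**2.
    """
    if sigma <= 0 or beta_start <= sigma:
        raise ValueError("need 0 < sigma < beta_start")
    betas = np.geomspace(beta_start, sigma, K)
    mus = lam / betas**2
    alphas = mus * sigma**2
    return alphas, betas, mus
```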

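These are the solutions of Eqs. (5) and (6). (I did not understand the solution of Eq. (6) at first; reference [10] may help. From the Bayesian view it is simply a denoising step with noise level $\beta_k$.)

Eq. (5) is solved with the fast Fourier transform; reference [71] gives the derivation.

Note that when $s = 1$, Eq. (7) also applies to deblurring.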

Deep unfolding network

Once the unfolding optimization is determined, the next step is to design the unfolding super-resolution network (USRNet). Because the unfolding optimization mainly consists of iteratively solving a data subproblem (i.e., Eq. (5)) and a prior subproblem (i.e., Eq. (6)), USRNet should alternate between a data module D and a prior module P. In addition, as the solutions of the subproblems also take the hyper-parameters $\alpha_k$ and $\beta_k$ as input, respectively, a hyper-parameter module H is further introduced into USRNet. Fig. 3 illustrates the overall architecture of USRNet with $K$ iterations, where $K$ is empirically set to 8 for the speed-accuracy trade-off. Next, more details on D, P and H are provided.

In short: once the unfolding is fixed, USRNet alternates between a data module D (Eq. (5)) and a prior module P (Eq. (6)); since these solutions also take $\alpha_k$ and $\beta_k$ as inputs, a hyper-parameter module H is added. Fig. 3 shows the architecture with $K$ iterations, where $K = 8$ is chosen empirically as a speed-accuracy trade-off. D, P and H are detailed next.

Data module D

The data module plays the role of Eq. (7) which is the closed-form solution of the data subproblem. Intuitively, it aims to find a clearer HR image which minimizes a weighted combination of the data term $\left \| y - (z \otimes k)\downarrow_s \right \|^2$ and the quadratic regularization term $\left \| z- x_{k-1} \right \|^2$ with trade-off hyper-parameter $\alpha _k$ .

Because the data term corresponds to the degradation model, the data module thus not only has the advantage of taking the scale factor and blur kernel as input but also imposes a degradation constraint on the solution. Actually, it is difficult to manually design such a simple but useful multiple-input module. For brevity, Eq. (7) is rewritten as

$z_k = D(x_{k-1}, s, k, y, \alpha_k)$ . (8)

Note that $x_0$ is initialized by interpolating $y$ with scale factor $s$ via the simplest nearest neighbor interpolation. It should be noted that Eq. (8) contains no trainable parameters, which in turn results in better generalizability due to the complete decoupling between data term and prior term. For the implementation, we use PyTorch where the main FFT and inverse FFT operators can be implemented by torch.rfft and torch.irfft, respectively.

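The data module implements Eq. (7): it seeks a clearer HR image that minimizes a weighted combination of the data term and the quadratic regularization term.

Because the data term corresponds to the degradation model, the data module both takes the scale factor and blur kernel as input and imposes a degradation constraint on the solution; such a simple yet useful multi-input module would be hard to design by hand.

Eq. (8) is just a compact rewrite of Eq. (7). The data module has no trainable parameters (it is not learned at all), and because the data and prior terms are fully decoupled, this improves generalization.

In the implementation, the FFT and inverse FFT use torch.rfft and torch.irfft. (These two functions were removed in PyTorch 1.8; current PyTorch provides the torch.fft module instead, e.g. torch.fft.fftn and torch.fft.ifftn.)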

Figure 3. The overall architecture of the proposed USRNet with $K$ iterations. USRNet can flexibly handle the classical degradation (i.e., Eq. (1)) via a single model as it takes the LR image $y$, scale factor $s$, blur kernel $k$ and noise level $\sigma$ as input. Specifically, USRNet consists of three main modules, including the data module D that makes HR estimation clearer, the prior module P that makes HR estimation cleaner, and the hyper-parameter module H that controls the outputs of D and P.

Prior module P

The prior module aims to obtain a cleaner HR image by passing through a denoiser with noise level $\beta _k$ . Inspired by [66], we propose a deep CNN denoiser that takes the noise level as input

$x_k = P(z_k, \beta_k)$ . (9)

The proposed denoiser, namely ResUNet, integrates residual blocks [21] into U-Net [45]. U-Net is widely used for image-to-image mapping, while ResNet owes its popularity to fast training and its large capacity with many residual blocks. ResUNet takes the concatenated $z_k$ and noise level map as input and outputs the denoised image $x_k$. By doing so, ResUNet can handle various noise levels via a single model, which significantly reduces the total number of parameters. Following the common setting of U-Net, ResUNet involves four scales, each of which has an identity skip connection between downscaling and upscaling operations. Specifically, the number of channels in each layer from the first scale to the fourth scale is set to 64, 128, 256 and 512, respectively. For the downscaling and upscaling operations, $2\times2$ strided convolution (SConv) and $2\times2$ transposed convolution (TConv) are adopted, respectively. Note that no activation function follows the SConv and TConv layers, nor the first and the last convolutional layers. For the sake of inheriting the merits of ResNet, a group of 2 residual blocks are adopted in the downscaling and upscaling of each scale. As suggested in [36], each residual block is composed of two $3\times 3$ convolution layers with ReLU activation in the middle and an identity skip connection summed to its output.

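The prior module aims to obtain a cleaner HR image by passing $z_k$ through a denoiser with noise level $\beta_k$. Inspired by [66], the authors propose a deep CNN denoiser that takes the noise level as input.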

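The prior module is a ResUNet. Details:

1) the U-Net has 4 scales, with 64, 128, 256 and 512 channels;

2) downscaling at each scale uses a $2\times2$ strided convolution (SConv), with no activation function after it;

3) upscaling at each scale uses a $2\times2$ transposed convolution (TConv), with no activation function after it;

4) each scale has a group of two residual blocks on both the downscaling and the upscaling side; the convolutions are $3\times 3$, and each block follows [36] (2017 CVPRW, Enhanced Deep Residual Networks for Single Image Super-Resolution): conv, ReLU, conv, plus an identity skip connection.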
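Two practical details of the prior module can be sketched without a deep-learning framework. First, the input is $z_k$ concatenated with a noise-level map; the sketch assumes the map is a single channel filled with $\beta_k$, so an RGB estimate becomes a 4-channel input. Second, three $2\times2$ stride-2 convolutions separate the four scales, so height and width must be divisible by 8 for the encoder and decoder shapes to line up; the edge-replication padding below is an assumed remedy, not something the paper specifies. Function names are hypothetical.

```python
import numpy as np


def make_prior_input(z, beta):
    """Stack z (C x H x W) with a one-channel map filled with the noise level beta."""
    noise_map = np.full((1,) + z.shape[1:], beta, dtype=float)
    return np.concatenate([np.asarray(z, float), noise_map], axis=0)


def pad_to_multiple(x, m=8):
    """Edge-replicate the bottom/right so H and W are multiples of m (assumed remedy)."""
    _, h, w = x.shape
    return np.pad(x, ((0, 0), (0, (-h) % m), (0, (-w) % m)), mode="edge")


def encoder_shapes(h, w, widths=(64, 128, 256, 512)):
    """(channels, H, W) at each of the four scales; each stride-2 SConv halves H and W."""
    if h % 8 or w % 8:
        raise ValueError("H and W must be multiples of 8")
    return [(c, h // 2**i, w // 2**i) for i, c in enumerate(widths)]
```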

Hyper-parameter module H

The hyper-parameter module acts as a 'slide bar' to control the outputs of the data module and prior module. For example, the solution $z_k$ would gradually approach $x_{k-1}$ as $\alpha _k$ increases. According to the definition of $\alpha _k$ and $\beta _k$ , $\alpha _k$ is determined by $\sigma$ and $\mu _k$ , while $\beta _k$ depends on $\lambda$ and $\mu _k$ . Although it is possible to learn a fixed $\lambda$ and $\mu _k$ , we argue that a performance gain can be obtained if $\lambda$ and $\mu _k$ vary with two key elements, i.e., scale factor $s$ and noise level $\sigma$ , that influence the degree of ill-posedness. Let $\alpha = [\alpha _1, \alpha _2, \dots, \alpha _K]$ and $\beta = [\beta_1, \beta_2, \dots, \beta_K]$ ; we use a single module to predict $\alpha$ and $\beta$

$[\alpha , \beta ] = H(\sigma , s)$ . (10)

The hyper-parameter module consists of three fully connected layers with ReLU as the first two activation functions and Softplus [19] as the last. The number of hidden nodes in each layer is 64. Considering the fact that $\alpha _k$ and $\beta _k$ should be positive, and Eq. (7) should avoid division by extremely small $\alpha _k$ , the output Softplus layer is followed by an extra addition of 1e-6. We will show how the scale factor and noise level affect the hyper-parameters in Sec. 4.4.

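The hyper-parameter module consists of three fully connected layers; the first two use ReLU activations and the last uses Softplus [19]. Each layer has 64 hidden nodes. Since $\alpha_k$ and $\beta_k$ must be positive and Eq. (7) must avoid division by an extremely small $\alpha_k$, 1e-6 is added after the output Softplus layer. Sec. 4.4 shows how the scale factor and noise level affect the hyper-parameters.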
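A forward pass of H as described (three fully connected layers with 64 hidden nodes, ReLU, ReLU, Softplus, then +1e-6) fits in a few lines of NumPy. The input layout $[\sigma, s]$, reading the first $K$ outputs as $\alpha$ and the last $K$ as $\beta$, and the random initialization are assumptions; the weights are untrained, so the printed values would carry no meaning.

```python
import numpy as np


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return np.logaddexp(0.0, t)


def init_hyper_module(K=8, hidden=64, seed=0):
    """Random (untrained) weights for FC(2->64) -> FC(64->64) -> FC(64->2K)."""
    rng = np.random.default_rng(seed)
    dims = [2, hidden, hidden, 2 * K]
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]


def hyper_module(params, sigma, s):
    """[alpha, beta] = H(sigma, s); each output has K entries."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h = np.array([sigma, s], dtype=float)
    h = np.maximum(h @ W1 + b1, 0.0)      # FC + ReLU
    h = np.maximum(h @ W2 + b2, 0.0)      # FC + ReLU
    out = softplus(h @ W3 + b3) + 1e-6    # FC + Softplus, then +1e-6
    K = out.size // 2
    return out[:K], out[K:]
```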

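Of the three modules, D is model-based (no deep learning), while P and H are deep-learning models; this is what the Introduction means by combining the two families of methods.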
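Putting D, P and H together, the unrolled inference is a short loop. This sketch takes the three modules as plain Python callables (hypothetical placeholders for the FFT data step, the ResUNet denoiser and the MLP), initializes $x_0$ by nearest-neighbour interpolation as the paper describes, and runs $K$ iterations. In training, the whole loop is differentiated end to end, which plain NumPy does not do.

```python
import numpy as np


def nearest_upsample(y, s):
    """x_0: nearest-neighbour interpolation of the LR image y by factor s."""
    return np.repeat(np.repeat(y, s, axis=0), s, axis=1)


def usrnet_forward(y, s, k, sigma, D, P, H, K=8):
    """Unrolled HQS inference: K alternations of the data and prior modules."""
    alphas, betas = H(sigma, s)            # Eq. (10)
    if len(alphas) != K or len(betas) != K:
        raise ValueError("H must return K values of alpha and of beta")
    x = nearest_upsample(y, s)
    for i in range(K):
        z = D(x, s, k, y, alphas[i])       # Eq. (8): no trainable parameters
        x = P(z, betas[i])                 # Eq. (9): same weights at every i
    return x
```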

End-to-end training

The end-to-end training aims to learn the trainable parameters of USRNet by minimizing a loss function over a large training data set. Thus, this section mainly describes the training data, loss function and training settings. Following [58], we use DIV2K [3] and Flickr2K [55] as the HR training dataset. The LR images are synthesized via Eq. (1). Although USRNet focuses on SISR, it is also applicable to the case of deblurring with $s = 1$. Hence, the scale factors are chosen from {1, 2, 3, 4}. However, due to limited space, this paper does not consider the deblurring experiments. For the blur kernels, we use anisotropic Gaussian kernels as in [44, 51, 67] and motion kernels as in [5]. We fix the kernel size to 25×25. For the noise level, we set its range to [0, 25].
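The synthetic training settings can be sketched as follows. The paper fixes the kernel size at 25×25, draws the scale factor from {1, 2, 3, 4} and the noise level from [0, 25], and, per the optimizer settings, uses one scale factor per mini-batch. The anisotropic Gaussian parameterization (two standard deviations and a rotation angle) and the ranges for those three parameters are assumptions, and the motion kernels of [5] are not reproduced. Names are hypothetical.

```python
import numpy as np


def anisotropic_gaussian(ksize, sig1, sig2, theta):
    """ksize x ksize Gaussian with std sig1 along angle theta and sig2 across it."""
    c, s_ = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s_], [s_, c]])
    inv = np.linalg.inv(R @ np.diag([sig1**2, sig2**2]) @ R.T)
    ax = np.arange(ksize) - (ksize - 1) / 2.0
    X, Y = np.meshgrid(ax, ax)
    pts = np.stack([X, Y], axis=-1)                       # (ksize, ksize, 2)
    quad = np.einsum("...i,ij,...j->...", pts, inv, pts)  # Mahalanobis distance^2
    k = np.exp(-0.5 * quad)
    return k / k.sum()


def sample_batch_settings(rng, batch_size, ksize=25):
    """One scale factor for the whole mini-batch; kernel and noise level per sample."""
    s = int(rng.choice([1, 2, 3, 4]))
    kernels, sigmas = [], []
    for _ in range(batch_size):
        sig1, sig2 = rng.uniform(0.7, 5.0, size=2)        # assumed range
        theta = rng.uniform(0.0, np.pi)
        kernels.append(anisotropic_gaussian(ksize, sig1, sig2, theta))
        sigmas.append(rng.uniform(0.0, 25.0))
    return s, np.stack(kernels), np.array(sigmas)
```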

With regard to the loss function, we adopt the L1 loss for PSNR performance. Following [58], once the model is obtained, we further adopt a weighted combination of L1 loss, VGG perceptual loss and relativistic adversarial loss [24] with weights 1, 1 and 0.005 for perceptual quality performance. We refer to such fine-tuned model as USRGAN. As usual, USRGAN only considers scale factor 4. We do not use additional losses to constrain the intermediate outputs since the above losses work well. One possible reason is that the prior module shares parameters across iterations.

To optimize the parameters of USRNet, we adopt the Adam solver [27] with mini-batch size 128. The learning rate starts from $1 \times 10^{-4}$ and decays by a factor of 0.5 every $4 \times10^4$ iterations and finally ends with $3 \times 10^{-6}$ . It is worth pointing out that due to the infeasibility of parallel computing for different scale factors, each mini-batch only involves one random scale factor. For USRGAN, its learning rate is fixed to $1 \times 10^{-5}$ . The patch size of the HR image for both USRNet and USRGAN is set to $96\times 96$ . We train the models with PyTorch on 4 Nvidia Tesla V100 GPUs in Amazon AWS cloud. It takes about two days to obtain the USRNet model.
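The learning-rate schedule is a step decay, and a small helper makes the arithmetic explicit. Halving $1\times10^{-4}$ five times gives $3.125\times10^{-6}$, which matches the stated final value of about $3\times10^{-6}$ and suggests roughly $5 \times 4\times10^4 = 2\times10^5$ iterations in total; that total is an inference from the numbers, not a figure given in the paper.

```python
def usrnet_lr(iteration, base=1e-4, gamma=0.5, step=40_000):
    """Step decay: multiply the base rate by gamma every `step` iterations."""
    return base * gamma ** (iteration // step)
```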

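Summary of the training settings:

USRNet: L1 loss only.

USRGAN: L1 loss, VGG perceptual loss and relativistic adversarial loss (weights 1, 1 and 0.005), fine-tuned from USRNet for scale factor 4 only.

The two also use different learning rates: USRNet starts at $1 \times 10^{-4}$ and is halved every $4 \times 10^4$ iterations, while USRGAN uses a fixed $1 \times 10^{-5}$.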

Experiments

We choose the widely-used color BSD68 dataset [40, 46] to quantitatively evaluate different methods. The dataset consists of 68 images with tiny structures and fine textures and thus is challenging to improve the quantitative metrics, such as PSNR. For the sake of synthesizing the corresponding testing LR images via Eq. (1), blur kernels and noise levels should be provided. Generally, it would be helpful to employ a large variety of blur kernels and noise levels for a thorough evaluation; however, it would also give rise to a burdensome evaluation process. For this reason, as shown in Table 1, we only consider 12 representative and diverse blur kernels, including 4 isotropic Gaussian kernels with different widths (i.e., 0.7, 1.2, 1.6 and 2.0), 4 anisotropic Gaussian kernels from [67], and 4 motion blur kernels from [5, 33]. While it has been pointed out that anisotropic Gaussian kernels are enough for the SISR task [44, 51], the SISR method that can handle more complex blur kernels would be a preferred choice in real applications. Therefore, it is necessary to further analyze the kernel robustness of different methods; we will thus separately report the PSNR results for each blur kernel rather than for each type of blur kernels. Although it has been pointed out that the proper blur kernel should vary with scale factor [64], we argue that the 12 blur kernels are diverse enough to cover a large kernel space. For the noise levels, we choose 2.55 (1%) and 7.65 (3%).

Evaluation settings: the choice of test dataset, blur kernels and noise levels.
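PSNR, the metric reported in Tables 1 and 2, is $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$ with $\mathrm{MAX} = 255$ for 8-bit images. The sketch computes it over the whole image; the exact evaluation protocol (color space, border cropping) is not stated in this section, so those details are deliberately left out. The two test noise levels are simply percentages of 255: $2.55$ is 1% and $7.65$ is 3%.

```python
import numpy as np


def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((np.asarray(img1, float) - np.asarray(img2, float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val**2 / mse))
```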

PSNR results

The compared methods include RCAN [70], ZSSR [51], IKC [20] and IRCNN [65]. Specifically, RCAN is the state-of-the-art PSNR-oriented method for bicubic degradation; ZSSR is a non-blind zero-shot learning method with the ability to handle Eq. (1) for anisotropic Gaussian kernels; IKC is a blind iterative kernel correction method for isotropic Gaussian kernels; IRCNN is a non-blind deep denoiser based plug-and-play method.

Table 1. Average PSNR(dB) results of different methods for different combinations of scale factors, blur kernels and noise levels. The best two results are highlighted in red and blue colors, respectively.

The compared methods are representative SISR algorithms, and Table 1 compares their PSNR.

Although USRNet is not designed for bicubic degradation, it is interesting to test its results by taking the approximated bicubic kernels in Fig. 2 as input. From Table 2, one can see that USRNet still performs favorably without training on the bicubic kernels.

Table 2. The average PSNR(dB) results of USRNet for bicubic degradation on commonly-used testing datasets.

In other words, USRNet was never trained on bicubic kernels, yet with the approximated bicubic kernels of Fig. 2 as input it still performs well (Table 2).

Visual results

Figure 4. Visual results of different methods on super-resolving noise-free LR image with scale factor 4. The blur kernel is shown on the upper-right corner of the LR image. Note that RankSRGAN and our USRGAN aim for perceptual quality rather than PSNR value.

Analysis on D and P

Because the proposed USRNet is an iterative method, it is interesting to investigate the HR estimations of data module D and prior module P in different iterations. Fig. 5 shows the results of USRNet and USRGAN in different iterations for an LR image with scale factor 4. As one can see, D and P can facilitate each other for iterative and alternating blur removal and detail recovery. Interestingly, P can also act as a detail enhancer for high-frequency recovery due to the task-specific training. In addition, it does not reduce blur kernel induced degradation, which verifies the decoupling between D and P. As a result, the end-to-end trained USRNet has a task-specific advantage over Gaussian denoiser based plug-and-play SISR. To quantitatively analyze the role of D, we have trained a USRNet model with 5 iterations; it turns out that the average PSNR value decreases by about 0.1dB on Gaussian blur kernels and 0.3dB on motion blur kernels. This further indicates that D aims to eliminate blur kernel induced degradation. In addition, one can see that USRGAN has similar results to USRNet in the first few iterations, but instead recovers tiny structures and fine textures in the last few iterations.

Figure 5. HR estimations in different iterations of USRNet (top row) and USRGAN (bottom row). The initial HR estimation $x_0$ is the nearest neighbor interpolated version of LR image $y$. The scale factor is 4, the noise level of LR image is 2.55 (1%), the blur kernel is shown on the upper-right corner of $x_0$.

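Since USRNet is iterative, it is worth examining the HR estimates of D and P at different iterations.

Fig. 5 shows USRNet and USRGAN results across iterations for an LR image with scale factor 4.

D and P help each other, alternately removing blur and recovering detail. Thanks to task-specific training, P also acts as a detail enhancer for high-frequency recovery, but it does not remove kernel-induced blur, which confirms that D and P are decoupled; hence the end-to-end trained USRNet has a task-specific advantage over plug-and-play SISR built on a Gaussian denoiser.

To quantify the role of D, the authors trained a USRNet with 5 iterations: the average PSNR drops by about 0.1 dB on Gaussian blur kernels and about 0.3 dB on motion blur kernels.

This further indicates that D is responsible for removing kernel-induced degradation. USRGAN behaves like USRNet in the first few iterations but recovers tiny structures and fine textures in the last few.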

Analysis on H

Fig. 6 shows outputs of the hyper-parameter module for different combinations of scale factor $s$ and noise level $\sigma$ . It can be observed from Fig. 6(a) that $\alpha$ is positively correlated with $\sigma$ and varies with $s$. This actually accords with the definition of $\alpha_i$ in Sec. 3.2 and our analysis in Sec. 3.3. From Fig. 6(b), one can see that $\beta$ has a decreasing tendency with the number of iterations and increases with scale factor and noise level. This implies that the noise level of HR estimation is gradually reduced across iterations and complex degradation requires a large $\beta_i$ to tackle the ill-posedness. It should be pointed out that the learned hyper-parameter setting is in accordance with that of IRCNN [65]. In summary, the learned H is meaningful as it plays the proper role.

Figure 6. Outputs of the hyper-parameter module H, i.e., (a) $\alpha$ and (b) $\beta$ , with respect to different combinations of $s$ and $\sigma$ .

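Fig. 6 shows the outputs of the hyper-parameter module for different combinations of scale factor $s$ and noise level $\sigma$. From Fig. 6(a), $\alpha$ is positively correlated with $\sigma$ and varies with $s$, which agrees with the definition $\alpha_k \triangleq \mu_k\sigma^2$ in Sec. 3.2.

From Fig. 6(b), $\beta$ decreases over the iterations and increases with the scale factor and the noise level.

In summary, the learned H is meaningful because it plays the proper role.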

Generalizability

As mentioned earlier, the proposed method enjoys good generalizability due to the decoupling of data term and prior term. To demonstrate such an advantage, Fig. 7 shows the visual results of USRNet and USRGAN on LR image with a kernel of much larger size than training size of 25×25. It can be seen that both USRNet and USRGAN can produce visually pleasant results, which can be attributed to the trainable parameter-free data module. It is worth pointing out that USRGAN is trained on scale factor 4, while Fig. 7(b) shows its visual result on scale factor 3. This further indicates that the prior module of USRGAN can generalize to other scale factors. In summary, the proposed deep unfolding architecture has superiority in generalizability.

Figure 7. An illustration to show the generalizability of USRNet and USRGAN. The sizes of the kernels in (a) and (c) are 67×67 and 70×70, respectively. The two kernels are chosen from [41].

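Put briefly: because the data and prior terms are decoupled, the method generalizes well. Fig. 7 shows USRNet and USRGAN results for kernels much larger than the $25 \times 25$ training size, and both are visually pleasing, which is credited to the data module having no trainable parameters. USRGAN, trained only at scale factor 4, also works at scale factor 3 (Fig. 7(b)), so its prior module generalizes to other scale factors. Overall, the deep unfolding architecture shows superior generalizability.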

Real image super-resolution

Because Eq. (7) is based on the assumption of circular boundary condition, a proper boundary handling for the real LR image is generally required. We use the following three steps to do such pre-processing. First, the LR image is interpolated to the desired size. Second, the boundary handling method proposed in [38] is adopted on the interpolated image with the blur kernel. Last, the downsampled boundaries are padded to the original LR image. Fig. 8 shows the visual result of USRNet on real LR image with scale factor 4. The blur kernel is manually selected as isotropic Gaussian kernel with width 2.2 based on user preference. One can see from Fig. 8 that the proposed USRNet can reconstruct the HR image with improved visual quality.

Figure 8. Visual result of USRNet (×4) on a real LR image.

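In short: because Eq. (7) assumes circular boundary conditions, real LR images need boundary pre-processing in three steps: interpolate the LR image to the desired size; apply the boundary handling method of [38] to the interpolated image with the blur kernel; pad the downsampled boundaries onto the original LR image. Fig. 8 shows the ×4 result on a real LR image, using an isotropic Gaussian kernel of width 2.2 chosen manually by user preference; the reconstruction has improved visual quality.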

This subsection is included because the abstract and introduction claim the method works on real images; this section backs up that claim.
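The boundary handling of [38] is not reproduced here. As a much simpler stand-in (a swapped-in technique, not the paper's method), one can mirror-pad the LR image before running an FFT-based model and then crop the corresponding $s$-times-larger margin from the output, which moves the circular wrap-around seam into a margin that is discarded. The helper names and the pad width are assumptions; in the check, a nearest-neighbour upsampler stands in for a $\times 4$ model.

```python
import numpy as np


def pad_lr(y, p):
    """Mirror-pad an LR image by p pixels on every side (assumed stand-in for [38])."""
    return np.pad(y, ((p, p), (p, p)), mode="symmetric")


def crop_hr(x_hr, p, s):
    """Remove the s*p-pixel HR margin that corresponds to the LR padding."""
    m = p * s
    return x_hr[m:x_hr.shape[0] - m, m:x_hr.shape[1] - m]
```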

