[Paper Reading] Deep Cocktail Network: Multi-source Unsupervised Domain Adaptation with Category Shift


SUMMARY@ 2020/5/12


1. Method abstract

Inspired by the distribution weighted combining rule in [33], the target distribution can be represented as the weighted combination of the multi-source distributions.

An ideal target predictor can be obtained by integrating all source predictions based on the corresponding source distribution weights.

  • Besides the feature extractor,
  • DCTN also includes a (multi-source) category classifier to predict classes from the different sources,
  • and a (multi-source) domain discriminator to produce multiple source-target-specific perplexity scores as approximations of the source distribution weights.

During training, DCTN alternates between two adaptation steps:

  • domain discriminator: the multi-way adversarial adaptation implicitly reduces domain shifts among the sources.

    • deploys multi-way adversarial learning to minimize the discrepancy between the target and each of the multiple source domains,
    • also predicts the source-specific perplexity scores, which denote the possibilities that a target sample belongs to the different source domains.
  • feature extractor and the category classifier

    • The multi-source category classifiers are integrated with the perplexity scores to classify target samples, and the pseudo-labeled target samples, together with the source samples, are utilized to update the multi-source category classifier and the feature extractor.

2. Motivation

This paper focuses on the problem of multi-source domain adaptation, where there is category shift between diverse sources.

Category shift is a new protocol in MDA, where domain shift and categorical disalignment co-exist among the sources.

This paper addresses domain shift and category shift together.

3. Challenges / Problems to be solved

  • We cannot simply apply single-source UDA by combining all source domains, since there may be domain shifts among the sources.
  • Eliminating the distribution discrepancy between the target and each source may be too strict, and even harmful.
  • Category shift exists among the sources.

4. Contribution

    1. We present a novel and realistic MDA protocol termed category shift, which relaxes the requirement of a shared category set among the source domains.
    2. Inspired by the distribution weighted combining rule, we propose the deep cocktail network (DCTN), together with an alternating adaptation algorithm, to learn transferable and discriminative representations.
    3. We conduct comprehensive experiments on three well-known benchmarks and evaluate our model in both the vanilla and the category shift settings. Our method achieves the state of the art across most transfer tasks.

5. Related work

5.1 Unsupervised domain adaptation with single source

  • domain discrepancy based methods: reduce the domain shift across the source and the target
  • deep-model-based
  • adversarial learning based
  • others: semi-supervised method [42], domain reconstruction [14], duality [19], alignments [9] [50] [44], manifold learning [15], tensor methods [24],[31], etc.

5.2 Domain adaptation with multiple sources

  • originates from A-SVM [49]
  • shallow models [8][22][27]
  • theoretical
    • learning bound for multi source DA[3]
    • distribution weighted combining rule[33]

5.3 Two branches of transfer learning closely related to MDA (supervised)

  • continual transfer learning (CTL) [43], [39]
    • CTLs train the learner to sequentially master multiple tasks across multiple domains.
  • domain generalization (DG)
    • uses the existing multiple labeled domains for training regardless of the unlabeled target samples [13, 35]

6. Settings

  • Suppose the classifier for each source domain is known

  • Vanilla MDA: samples from diverse sources share a same category set

  • Category Shift: categories from different sources might be also different

  • $N$ different underlying source distributions $\{p_{s_j}(x,y)\}_{j=1}^{N}$

    • $X_{s_j}=\{x_i^{s_j}\}_{i=1}^{|X_{s_j}|}$
    • $Y_{s_j}=\{y_i^{s_j}\}_{i=1}^{|Y_{s_j}|}$
  • 1 target distribution $p_t(x,y)$, no labels

    • $X_t=\{x_i^t\}_{i=1}^{|X_t|}$
  • training set ensemble: $N+1$ datasets

  • testing set: from target distribution

  • the target domain is labeled by the union of all categories in the sources:

    $$\mathcal{C}_{t}=\bigcup\limits_{j=1}^{N} \mathcal{C}_{s_{j}}$$

7. Compared with Open Set DA

  • The uncommon classes are unified as a negative category called “unknown”.

  • In contrast, category shift considers the specific disaligned categories among multiple sources to enrich the classification in transfer.

8. DCTN: framework details

*(Figure: DCTN framework overview; image failed to load — image-20200513104324521.png)*

8.1 Feature extractor $F$

  • deep convolution nets as the backbone
  • shared weights: map all images from the $N$ sources and the target into a common feature space
  • employ adversarial learning to obtain the optimal mapping
    • because it can successfully learn both domain-invariant features and each target-source-specific relation

8.2 (Multi-source) domain discriminator $D$

  • $N$ source-specific discriminators: $\{D_{s_j}\}_{j=1}^{N}$

  • Given an image $x$ from source $j$ or the target domain, the domain discriminator $D$ receives the features $F(x)$ and classifies whether $x$ comes from source $j$ or from the target.

  • for the data flow of each target instance $x^t$, the domain discriminator $D$ yields the $N$ source-specific discriminative results
    $$\{D_{s_j}(F(x^t))\}_{j=1}^{N}$$

  • target-source perplexity scores
    $$\mathcal{S}_{cf}\left(x^{t} ; F, D_{s_j}\right)=-\log \left(1-D_{s_j}\left(F\left(x^{t}\right)\right)\right)+\alpha_{s_j}$$
    where $\alpha_{s_j}$ is the source-specific concentration constant, obtained by averaging the source $j$ discriminator losses over $X_{s_j}$.

    In the supplementary material, a different score with a different $\alpha$ is used:

    $$\alpha_{s_j}=\frac{1}{N_{T}} \sum_{i}^{N_{T}}\left(1-D_{s_j}\left(F\left(x_{i}^{s_{j}}\right)\right)\right)^{2}$$

    $N_T$ denotes how many times the target samples have been visited to train the model.

    $x_{i}^{s_{j}}$ denotes the source $j$ instances that come coupled with the target instances in the adversarial learning.
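A minimal sketch (assuming PyTorch; the tensor shapes, function name, and clamping constant are my own illustration, not from the paper) of how the perplexity scores $\mathcal{S}_{cf}$ could be computed from the discriminator outputs:

```python
import torch

def perplexity_scores(disc_outputs, alphas):
    """Target-source perplexity scores S_cf for one target batch.

    disc_outputs: list of N tensors of shape (B,), each holding the
        sigmoid outputs D_{s_j}(F(x^t)) in (0, 1) for the target batch.
    alphas: list of N floats, the source-specific concentration
        constants alpha_{s_j}.
    Returns a (B, N) tensor with S_cf = -log(1 - D) + alpha per source.
    """
    scores = []
    for d_out, alpha in zip(disc_outputs, alphas):
        # clamp away from zero to avoid log(0) when D saturates at 1
        scores.append(-torch.log((1.0 - d_out).clamp_min(1e-6)) + alpha)
    return torch.stack(scores, dim=1)  # (B, N)
```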

8.3 (Multi-source) category classifier $C$

  • a multi-output net composed of $N$ source-specific predictors $\{C_{s_j}\}_{j=1}^{N}$

  • Each predictor is a softmax classifier

  • for an image from source $j$: only the output from $C_{s_j}$ gets activated and provides the gradient for training

  • For a target image $x^t$ instead, all source-specific predictors provide $N$ categorization results $\{C_{s_j}(F(x^t))\}_{j=1}^{N}$ to the target classification operator.

8.4 Target classification operator

  • for each target feature $F(x^t)$, the target classification operator takes each source perplexity score $\mathcal{S}_{cf}\left(x^{t} ; F, D_{s_j}\right)$ to re-weight the corresponding source-specific prediction $C_{s_j}(F(x^t))$

    the confidence that $x^t$ belongs to class $c$ is

$$\text{Confidence}\left(c \mid x^{t}\right):=\sum_{c \in \mathcal{C}_{s_{j}}} \frac{\mathcal{S}_{cf}\left(x^{t} ; F, D_{s_{j}}\right)}{\sum\limits_{c \in \mathcal{C}_{s_{k}}} \mathcal{S}_{cf}\left(x^{t} ; F, D_{s_{k}}\right)} C_{s_{j}}\left(c \mid F\left(x^{t}\right)\right), \quad c\in\bigcup_{j=1}^{N} \mathcal{C}_{s_{j}}\tag{2}$$

  • $C_{s_{j}}\left(c \mid F\left(x^{t}\right)\right)$ denotes the softmax value of source $j$ corresponding to class $c$
  • $\sum\limits_{c\in \mathcal{C}_{s_j}}$ means only those sources containing class $c$ join the perplexity-score weighting
  • $\sum\limits_{c\in \mathcal{C}_{s_k}}$ in the denominator runs over the sources for normalization
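A sketch of the target classification operator in Eq. 2 (assuming PyTorch; `class_masks` and all shapes are my own illustration, not the authors' code). The mask implements the reading that the sums run over sources whose category sets contain class $c$; in the vanilla setting every mask entry is 1, so the denominator covers all sources:

```python
import torch

def target_confidence(softmax_outputs, scores, class_masks):
    """Combine per-source predictions into target confidences (Eq. 2).

    softmax_outputs: (B, N, C) softmax of each source classifier,
        zero-padded on the classes a source does not contain.
    scores: (B, N) perplexity scores S_cf from the discriminators.
    class_masks: (N, C) float mask, 1 where source j contains class c.
    Returns (B, C) confidences over the union of source categories.
    """
    # For each class c, normalize the scores over the sources containing c.
    masked = scores.unsqueeze(2) * class_masks.unsqueeze(0)          # (B, N, C)
    weights = masked / masked.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return (weights * softmax_outputs).sum(dim=1)                    # (B, C)
```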

8.5 Connection to distribution weighted combining rule

  • In the distribution weighted combining rule [33], the target distribution is treated as a mixture of the multi-source distributions, with coefficients given by the source distributions weighted by unknown positives $\{\lambda_j\}_{j=1}^N$, namely $\mathcal{D}_{t}(x)=\sum_{k=1}^{N} \lambda_{k} \mathcal{D}_{s_{k}}(x)$

$$h_{\lambda}(x)=\sum_{i=1}^{k} \frac{\lambda_{i} D_{i}(x)}{\sum_{j=1}^{k} \lambda_{j} D_{j}(x)} h_{i}(x)$$

Note that the hypothesis has a one-dimensional output, $h_i(x)\in \mathbb{R}$.

  • In this paper:

    The ideal target classifier presents as the weighted combination of the source classifiers.

    Note that here each source classifier $C_{s_j}$ produces a multi-output softmax result.

$$C_{t}\left(c \mid x^{t}\right)=\sum_{c \in \mathcal{C}_{s_j}} \frac{\lambda_{j} \mathcal{D}_{s_{j}}\left(x^{t}\right)}{\sum_{c \in \mathcal{C}_{s_{k}}} \lambda_{k} \mathcal{D}_{s_{k}}\left(x^{t}\right)} C_{s_{j}}\left(c \mid F\left(x^{t}\right)\right)$$

As the probability that $x^t$ comes from source $j$ increases, $D_{s_{j}}\left(F\left(x^{t}\right)\right)\rightarrow 1$ and $\mathcal{D}_{s_{j}}\left(x^{t}\right)\rightarrow 1$,

so $\lambda_{j} \mathcal{D}_{s_{j}}\left(x^{t}\right) \propto\mathcal{S}_{cf}\left(x^{t} ; F, D_{s_{j}}\right)=-\log \left(1-D_{s_{j}}\left(F\left(x^{t}\right)\right)\right)+\alpha_{s_{j}}$

  • So the perplexity scores substitute for the distribution-based weighting.
  • Target images should be categorized by the classifiers from multiple sources: the more similar a source's features are to the target's, the more trustworthy that source classifier's predictions.

9 Learning

9.1 Pre-training C and F

  • take all source images to jointly train the feature extractor $F$ and the category classifier $C$

  • pseudo labels for the target: these networks and the target classification operator then predict categories for all target images and annotate those with high confidence.

  • Since the domain discriminator has not been trained yet, we take the uniform distribution simplex weight as the perplexity scores for the target classification operator.

  • Finally, we obtain the pre-trained feature extractor and category classifier via further fine-tuning them with the sources and the pseudo-labeled target images.

    In object recognition, we initiate DCTN in the same way as DAN (start with an AlexNet model pretrained on ImageNet 2012 and fine-tune it).

    In digit recognition, we train DCTN from scratch.
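A hypothetical usage of the `target_confidence` sketch above at the pre-training stage: with the discriminators untrained, uniform simplex weights ($1/N$ per source) stand in for the perplexity scores (all sizes below are assumed for illustration):

```python
import torch

B, N, C = 32, 3, 10                               # assumed batch/source/class sizes
softmax_outputs = torch.softmax(torch.randn(B, N, C), dim=2)  # stand-in predictions
uniform_scores = torch.full((B, N), 1.0 / N)      # uniform simplex weights
class_masks = torch.ones(N, C)                    # vanilla setting: shared classes
confidences = target_confidence(softmax_outputs, uniform_scores, class_masks)
```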

9.2 Multi-way Adversarial Adaptation

Reference: the ADDA paper, Adversarial Discriminative Domain Adaptation.

  • original GAN ($M$ means the mapping / feature extractor):

$$\begin{aligned} \mathcal{L}_{\mathrm{adv}_{D}}\left(\mathbf{X}_{s}, \mathbf{X}_{t}, M_{s}, M_{t}\right)= &-\mathbb{E}_{\mathbf{x}_{s} \sim \mathbf{X}_{s}}\left[\log D\left(M_{s}\left(\mathbf{x}_{s}\right)\right)\right] \\ &-\mathbb{E}_{\mathbf{x}_{t} \sim \mathbf{X}_{t}}\left[\log \left(1-D\left(M_{t}\left(\mathbf{x}_{t}\right)\right)\right)\right] \end{aligned}$$

$$\mathcal{L}_{\mathrm{adv}_{M}}=-\mathcal{L}_{\mathrm{adv}_{D}}$$

$$\min _{D} \mathcal{L}_{\mathrm{adv}_{D}}\left(\mathbf{X}_{s}, \mathbf{X}_{t}, M_{s}, M_{t}\right), \quad \min _{M_{s}, M_{t}} \mathcal{L}_{\mathrm{adv}_{M}}\left(\mathbf{X}_{s}, \mathbf{X}_{t}, D\right)$$

  • change 1: early in training the discriminator converges quickly, causing the generator gradient to vanish. The standard GAN trick changes the generator objective and splits the optimization into two independent objectives, one for the generator and one for the discriminator:
    $$\mathcal{L}_{\mathrm{adv}_{M}}\left(\mathbf{X}_{s}, \mathbf{X}_{t}, D\right)=-\mathbb{E}_{\mathbf{x}_{t} \sim \mathbf{X}_{t}}\left[\log D\left(M_{t}\left(\mathbf{x}_{t}\right)\right)\right] \tag{**}$$

  • change 2: in the setting where both distributions are changing, this objective leads to oscillation: when the mapping converges to its optimum, the discriminator can simply flip the sign of its prediction in response.

    Tzeng et al. instead proposed the domain confusion objective, under which the mapping is trained using a cross-entropy loss function against a uniform distribution.

    This loss ensures that the adversarial discriminator views the two domains identically.

    To "confuse" the discriminator is to keep it maximally uncertain: the marginal distributions of the mapped source and target should be as close as possible, so that every sample looks equally likely to come from source or target (each domain label has probability about one half). Minimizing this cross-entropy against the uniform distribution pushes the mapped source and target distributions toward each other; once they are mapped into very similar domains, the adaptation goal is accomplished.

    $$\mathcal{L}_{\mathrm{adv}_{M}}\left(\mathbf{X}_{s}, \mathbf{X}_{t}, D\right)= -\sum_{d \in\{s, t\}} \mathbb{E}_{\mathbf{x}_{d} \sim \mathbf{X}_{d}}\left[\frac{1}{2} \log D\left(M_{d}\left(\mathbf{x}_{d}\right)\right) +\frac{1}{2} \log \left(1-D\left(M_{d}\left(\mathbf{x}_{d}\right)\right)\right)\right] \tag{*}$$

    Note: (*) is not actually used in ADDA's final method; it only appears in the related-work discussion, and ADDA itself uses (**).

    Equation (*) was proposed in "Simultaneous Deep Transfer Across Domains and Tasks".

    ADDA changes the generator's optimization objective to (**).

In this paper:

  • minmax adversarial domain adaptation
    $$\min _{F} \max _{D} V(F, D ; \bar{C})=\mathcal{L}_{adv}(F, D)+\mathcal{L}_{cls}(F, \bar{C})\tag{4}$$

    • the classifier $C$ is fixed as $\bar C$ to provide stable gradient values
    • the first term denotes the adversarial mechanism
    • the second term is the multi-source classification loss

The optimization based on Eq. 4 works well for $D$ but not for $F$.

Since the feature extractor learns the mapping from the multiple sources and the target, the domain distributions change simultaneously in the adversary, which results in oscillation and spoils the feature extractor.

When the source and target feature mappings share their architecture, the domain confusion loss can replace the adversarial objective, and it learns the mapping $F$ stably.

  • multidomain confusion loss
    $$\mathcal{L}_{adv}(F, D)=\frac{1}{N} \sum_{j}^{N}\Big[ \mathbb{E}_{x \sim X_{s_{j}}} \mathcal{L}_{cf}\left(x ; F, D_{s_{j}}\right) +\mathbb{E}_{x \sim X_{t}} \mathcal{L}_{cf}\left(x ; F, D_{s_{j}}\right)\Big] \tag{6}$$
    where
    $$\mathcal{L}_{cf}\left(x ; F, D_{s_{j}}\right)= \frac{1}{2} \log D_{s_{j}}(F(x))+\frac{1}{2} \log \left(1-D_{s_{j}}(F(x))\right)\tag{7}$$
    i.e.
    $$\mathcal{L}_{adv}(F, D)=\frac{1}{N} \sum_{j}^{N} \mathbb{E}_{x \sim X_{s_{j}}} \Big[\frac{1}{2} \log D_{s_{j}}(F(x))+\frac{1}{2} \log \left(1-D_{s_{j}}(F(x))\right)\Big] +\frac{1}{N} \sum_{j}^{N}\mathbb{E}_{x \sim X_{t}} \Big[\frac{1}{2} \log D_{s_{j}}(F(x))+\frac{1}{2} \log \left(1-D_{s_{j}}(F(x))\right)\Big]$$
    Differences from (*):

    • there is no negative sign

    • this is multi-source, so there are $N$ discriminators, each handling the domain discrimination between one source and the target

    • in (*), the source and target mappings differ; here the feature extractor is shared

    • the paper directly turns (*) into a loss function shared by the discriminator and the generator (up to its sign, since Eq. 7 is negative), expressing the relation between the target and each source

      cross-entropy measures the discrepancy between two distributions; note that a cross-entropy is always positive

      • maximizing Eq. 6 is equivalent to minimizing the cross-entropy loss, i.e., optimizing the discriminator
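A direct transcription of Eqs. 6-7 as a PyTorch sketch (shapes and module interfaces are assumptions; this is not the released implementation):

```python
import torch

def confusion_term(d_out):
    """Per-sample confusion term of Eq. 7, averaged over the batch:
    0.5 * log D(F(x)) + 0.5 * log(1 - D(F(x)))."""
    d_out = d_out.clamp(1e-6, 1 - 1e-6)  # keep both logs finite
    return (0.5 * torch.log(d_out) + 0.5 * torch.log(1 - d_out)).mean()

def multidomain_confusion(source_feats, target_feats, discriminators):
    """Eq. 6: average the source and target confusion terms over N sources.

    source_feats: list of N feature batches F(x), one per source domain.
    target_feats: one target feature batch F(x^t).
    discriminators: list of N source-specific discriminators D_{s_j}.
    """
    total = 0.0
    for f_s, d in zip(source_feats, discriminators):
        total = total + confusion_term(d(f_s)) + confusion_term(d(target_feats))
    return total / len(discriminators)
```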

Online hard domain batch mining

  • samples from different sources are sometimes useless for improving adaptation to the target, and as training proceeds, more and more redundant source samples drag down the overall model performance

  • minibatch: sample a batch of size $M$ from the target and from each source domain

  • each source-target discriminator $D_{s_j}$'s loss is viewed as the degree to which it distinguishes the $M$ target samples $x^t_i$ from the $j$-th source's $M$ samples:

$$\sum_i^M - \log D_{s_{j}}(F(x_i^{s_j})) - \log \left(1-D_{s_{j}}(F(x_i^{t}))\right)$$

This is the cross-entropy loss in the original GAN form. The larger it is, the larger the loss, meaning the worse the discrimination of whether the $M$ source samples and $M$ target samples come from source $j$ or the target domain, i.e., the worse this source $j$'s discriminator performs.

  • find the hard source domain: the feature extractor $F$ performs worst at transforming the target samples to confuse the $j^*$-th source

$$j^*= \arg\max_{j}\Big\{ \sum_i^M - \log D_{s_{j}}(F(x_i^{s_j})) - \log \left(1-D_{s_{j}} (F(x_i^{t}))\right) \Big\}_{j=1}^N$$

  • we use source $j^*$'s and the target's samples in the minibatch to train the feature extractor (see the sketch after Algorithm 1 below)

  • The following objective is used in Algorithm 1 to iteratively update and learn the feature extractor:
    $$\mathcal{L}_{adv}^{s_{j^*}}(F, D)=\sum_{i}^{M} \mathcal{L}_{cf}\left(x_i^{s_{j^*}} ; F, D_{s_{j^*}}\right) +\mathcal{L}_{cf}\left(x_i^t ; F, D_{s_{j^*}}\right)$$

    $$\min _{F} \max _{D} V(F, D ; \bar{C})=\mathcal{L}_{adv}^{s_{j^*}}(F, D)+\mathcal{L}_{cls}(F, \bar{C})\tag{5}$$

*(Figure: Algorithm 1, the DCTN learning procedure; image failed to load — image-20200514150029895.png)*
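A sketch of the $j^*$ selection in the hard domain batch mining step (again with assumed shapes and interfaces; the cross-entropy matches the minibatch loss above):

```python
import torch

def hardest_source(source_feats, target_feats, discriminators):
    """Return the index j* of the hard source domain: the source whose
    discriminator incurs the largest GAN loss on the current minibatch.

    source_feats: list of N source feature batches, each of shape (M, d).
    target_feats: target feature batch of shape (M, d).
    discriminators: list of N source-specific discriminators D_{s_j}.
    """
    losses = []
    for f_s, d in zip(source_feats, discriminators):
        d_s = d(f_s).clamp(1e-6, 1 - 1e-6)           # D_{s_j}(F(x^{s_j}))
        d_t = d(target_feats).clamp(1e-6, 1 - 1e-6)  # D_{s_j}(F(x^t))
        losses.append((-torch.log(d_s) - torch.log(1 - d_t)).sum())
    return int(torch.argmax(torch.stack(losses)))
```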

9.3 Target Discriminative Adaptation

  • Aided by the multi-way adversary, DCTN can obtain good domain-invariant features, yet they are not guaranteed to be classifiable in the target domain.

  • auto-labeling strategy: annotate target samples, then jointly train the feature extractor and the multi-source category classifier with source and target images using their (pseudo-)labels

  • classification losses from the multi-source images and the pseudo-labeled target images:

$$\min _{F, C} \mathcal{L}_{cls}(F, C)=\sum_{j}^{N} \mathbb{E}_{(x, y) \sim\left(X_{s_{j}}, Y_{s_{j}}\right)}\left[\mathcal{L}\left(C_{s_{j}}(F(x)), y\right)\right] +\mathbb{E}_{\left(x^{t}, \hat{y}\right) \sim\left(X_{t}^{p}, Y_{t}^{p}\right)}\left[\sum_{\hat{y} \in \mathcal{C}_{\hat{s}}} \mathcal{L}\left(C_{\hat{s}}\left(F\left(x^{t}\right)\right), \hat{y}\right)\right] \tag{8}$$

We apply the target classification operator to assign pseudo labels, and the samples with confidence higher than a preset threshold are selected into $X_t^p$.

Given a target instance $x^t$ with pseudo-labeled class $\hat y$, we find those sources $\hat s$ that include this class ($\hat y \in \mathcal{C}_{\hat s}$), then update the network via the sum of the multi-source classification losses (a small selection sketch follows).
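A sketch of the pseudo-label selection (the threshold value is my assumption; the paper only says it is preset):

```python
import torch

def select_pseudo_labels(confidences, threshold=0.9):
    """Pick target samples whose confidence from the target classification
    operator exceeds a preset threshold, forming X_t^p for Eq. 8.

    confidences: (B, C) output of the target classification operator.
    Returns (indices, pseudo_labels) of the selected target samples.
    """
    max_conf, pseudo = confidences.max(dim=1)
    keep = max_conf > threshold
    return keep.nonzero(as_tuple=True)[0], pseudo[keep]
```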

*(Figure: image failed to load — image-20200514155059169.png)*

10. Experiments

10.1 Benchmarks

  • 3 widely used UDA benchmarks
    • Office-31 [41]:
      • an object recognition benchmark with 31 categories and 4652 images unevenly spread across three visual domains: A (Amazon), D (DSLR), W (Webcam).
    • ImageCLEF-DA:
      • 50 images in each category
      • totally 600 images for each domain
      • derives from the ImageCLEF 2014 domain adaptation challenge, organized by selecting the 12 object categories (aeroplane, bike, bird, boat, bottle, bus, car, dog, horse, monitor, motorbike, and people) shared by three famous real-world datasets: I (ImageNet ILSVRC 2012), P (Pascal VOC 2012), C (Caltech-256).
    • Digits-five
      • five digit image sets, respectively sampled from the following public datasets
        • mt (MNIST) [26]
        • mm (MNIST-M) [11]
        • sv(SVHN) [36]
        • up (USPS)
        • sy (Synthetic Digits) [11].
      • For MNIST, MNIST-M, SVHN and Synthetic Digits, we draw 25000 images for training and 9000 for testing from each dataset.
      • There are only 9298 images in USPS, so we use the entire dataset as our domain.

10.2 Evaluations in the vanilla (common) setting

Baselines:

  • multi-source: two shallow methods

    • sparse FRAME (sFRAME) [46]
      • a non-stationary Markov random field model that reproduces the observed statistical properties of filter responses at a subset of selected locations, scales and orientations.
      • it can represent a wide variety of object patterns in natural images, and the learned models are useful for object classification.
    • SGF [16]
      • Motivated by incremental learning, SGF creates intermediate representations of the data between the two domains by viewing the generative subspaces (of the same dimension) created from these domains as points on the Grassmann manifold, and by sampling points along the geodesic between them to obtain subspaces that give a meaningful description of the underlying domain shift.
  • single-source models extended to multi-source: conventional (TCA, GFK) / deep

    Since those methods operate in the single-source setting, two MDA standards are introduced for different purposes:

    • Source combine: all source domains are combined into a traditional single-source vs. target setting.
      • This standard testifies whether the multiple sources are valuable to exploit.
    • Single best: among the multi-source domains, we report the best-performing single-source transfer result on the test set.
      • This tests whether we can further improve the best single-source UDA by introducing another source transfer.
  • source only

    • serves as the baseline in both the Source combine and multi-source standards
    • uses all images from the sources to train backbone-based multi-source classifiers, which are directly applied to classify target images

10.3 Evaluations in the category shift setting

  • split all categories into two non-overlapping class sets and define them as the private classes; two settings are tested:

    • overlap
    • disjoint
  • DAN also suffers negative transfer gains in most situations, which indicates that DAN's transferability is crippled under category shift.

  • In contrast, DCTN reduces the performance drop compared with the model in the vanilla setting, and obtains positive transfer gains in all situations. This reveals that DCTN can resist the negative transfer caused by the category shift.

11. Further Analysis

11.1 Feature visualization.

We visualize the DCTN activations before and after adaptation:

  • DCTN can successfully learn transferable features with multiple sources
  • the features learned by DCTN attain desirable discriminative properties

11.2 Ablation study

  • The adversarial-only model excludes the pseudo labels and updates the category classifier with source samples only.

  • The pseudo-only model forbids the adversary and categorizes target samples with averaged multi-source results.

  • A third variant removes the online hard domain batch mining technique.

11.3 Convergence analysis

Despite frequent fluctuations, the classification loss, the adversarial loss and the testing error gradually converge.
