This article walks through the 2014 Galaxy Zoo galaxy-image classification challenge and summarizes the key methods and techniques from the winning solution's write-up.
http://benanne.github.io/2014/04/05/galaxy-zoo.html
Challenge: Kaggle
Data source: Galaxy Zoo. Galaxy Zoo users (zooites) classified images of galaxies from the Sloan Digital Sky Survey. Users are asked to describe the morphology of galaxies based on images. The questions form a decision tree, which is shown in the figure below.
My solution: convnets
My solution is based around convolutional neural networks (convnets). An alternative would have been transfer learning: pre-training a deep neural network on another dataset (say, ImageNet), chopping off the top layer, and then training a new classifier. There were no requests to use external data in the competition forums (a requirement to be allowed to use it), so presumably nobody tried this approach.
Overfitting
I tackled this problem with three orthogonal approaches:
- data augmentation
- dropout and weight norm constraints
- modifying the network architecture to increase parameter sharing
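Dropout and weight norm (max-norm) constraints can be sketched as below. This is a minimal NumPy illustration, not the author's code; the dropout rate of 0.5 and the max-norm of 2.0 are illustrative values, not taken from the write-up:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5):
    """Inverted dropout: zero each unit with probability p and rescale
    the survivors by 1/(1-p). Applied at training time only."""
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

def max_norm_constraint(W, max_norm=2.0):
    """Rescale each column of W so its L2 norm does not exceed max_norm.
    Typically applied after every gradient update."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return W * scale
```

Both tricks fight overfitting in different ways: dropout injects noise into the activations, while the norm constraint caps how large any incoming weight vector can grow.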
Software and hardware
I used Python, NumPy and Theano to implement my solution.
Preprocessing and data augmentation
Cropping and downsampling
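A centre-crop followed by block-average downsampling could look like the sketch below. The crop size of 207 and downsampling factor of 3 (giving 69x69 inputs) are illustrative choices for a 424x424 image, and `crop_and_downsample` is a hypothetical helper, not the author's code:

```python
import numpy as np

def crop_and_downsample(img, crop_size=207, ds_factor=3):
    """Centre-crop a square image, then downsample by averaging over
    ds_factor x ds_factor blocks. Expects channels-last (H, W[, C])."""
    h, w = img.shape[:2]
    top = (h - crop_size) // 2
    left = (w - crop_size) // 2
    patch = img[top:top + crop_size, left:left + crop_size]
    s = crop_size // ds_factor  # output side length
    # trim to a multiple of ds_factor, then block-average
    return patch[:s * ds_factor, :s * ds_factor].reshape(
        s, ds_factor, s, ds_factor, -1).mean(axis=(1, 3))
```

Cropping discards the mostly-empty sky around the galaxy, and downsampling shrinks the input so the convnet stays tractable.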
The input consisted of 424x424 colour JPEG images, each labelled with 37 weighted probabilities. Each image was cropped and downsampled before being fed to the network.
Exploiting spatial invariances
Each training example was perturbed before presenting it to the network by randomly scaling it, rotating it, translating it and optionally flipping it. I used the following parameter ranges:
- rotation: random with angle between 0° and 360° (uniform)
- translation: random with shift between -4 and 4 pixels (relative to the original image size of 424x424) in the x and y direction (uniform)
- zoom: random with scale factor between 1/1.3 and 1.3 (log-uniform)
- flip: yes or no (Bernoulli)
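The four perturbations above can be composed into a single affine transform, as in this sketch. It uses `scipy.ndimage` rather than the author's original pipeline, works on a single grayscale channel for brevity, and the exact centring convention is an illustrative choice:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(42)

def random_perturbation(img):
    """Random rotation, translation, zoom and optional flip, applied as
    one affine transform so the image is interpolated only once."""
    angle = np.deg2rad(rng.uniform(0.0, 360.0))          # uniform rotation
    shift = rng.uniform(-4.0, 4.0, size=2)               # uniform translation
    zoom = np.exp(rng.uniform(np.log(1 / 1.3), np.log(1.3)))  # log-uniform
    flip = rng.random() < 0.5                            # Bernoulli

    c, s = np.cos(angle), np.sin(angle)
    matrix = np.array([[c, -s], [s, c]]) / zoom
    if flip:
        matrix[:, 1] *= -1.0  # mirror one axis
    centre = (np.array(img.shape) - 1) / 2.0
    # offset maps output coordinates back to (shifted) input coordinates,
    # keeping the transform roughly centred on the image
    offset = centre - matrix @ (centre + shift)
    return ndimage.affine_transform(img, matrix, offset=offset, order=1)
```

Composing everything into one matrix avoids accumulating interpolation blur from applying rotate, shift, zoom and flip as four separate resampling steps.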
Colour perturbation
The colours of the training images were randomly perturbed as an additional form of augmentation.
Network architecture
All units use rectified linear units (ReLUs): f(x) = max(x, 0). The first, second and fourth convolutional layers are followed by 2x2 max-pooling. Using maxout for the dense layers instead of regular dense layers with ReLUs helped to reduce overfitting a lot, compared to dense layers with 4096 linear filters. Using maxout in the convolutional part of the network as well proved too computationally intensive.
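A maxout unit outputs the maximum over k linear filters instead of applying a fixed nonlinearity to one filter. A minimal dense maxout layer in NumPy (the shapes and k=2 below are illustrative, not the competition architecture):

```python
import numpy as np

def maxout_dense(x, W, b):
    """Dense maxout layer.
    x: (n, d_in), W: (d_in, k, d_out), b: (k, d_out).
    Each output unit is the max over its k linear pieces."""
    z = np.einsum('nd,dko->nko', x, W) + b  # (n, k, d_out)
    return z.max(axis=1)                    # (n, d_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))
W = rng.standard_normal((16, 2, 8))
b = rng.standard_normal((2, 8))
y = maxout_dense(x, W, b)  # shape (5, 8)
```

Because each unit learns a piecewise-linear convex activation from its k filters, a maxout layer can match the capacity of a much wider ReLU layer with fewer effective units, which is one intuition for why it overfits less here.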