Galaxy Zoo: Recognizing and Classifying Galaxy Images


This post is based on the 2014 Galaxy Zoo galaxy image classification challenge, and summarizes the key methods and techniques from the write-up linked below.

http://benanne.github.io/2014/04/05/galaxy-zoo.html


Competition: hosted on Kaggle

Source material: Galaxy Zoo users (zooites) would classify images of galaxies from the Sloan Digital Sky Survey.

Users are asked to describe the morphology of galaxies based on the images. The questions form a decision tree (shown as a figure in the original post).



My solution: convnets

My solution is based around convolutional neural networks (convnets).

An alternative would have been transfer learning: pre-training a deep neural network on another dataset (say, ImageNet), chopping off the top layer, and then training a new classifier on top. There were no requests to use external data in the competition forums (a requirement to be allowed to use it), so I guess nobody tried this approach.
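The recipe itself is standard, so here is a minimal present-day sketch using PyTorch and torchvision (both postdate the competition; the actual solution below is Theano-based, and the ResNet backbone is purely illustrative):

```python
import torch.nn as nn
from torchvision import models

# Transfer-learning sketch: take a network pre-trained on ImageNet,
# freeze its features, and train a fresh top layer for the new task.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                       # keep pre-trained features fixed
model.fc = nn.Linear(model.fc.in_features, 37)    # new head: 37 answer values
```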

Overfitting

As Geoffrey Hinton has been known to say, if you’re not overfitting, your network isn’t big enough. The main challenge, then, was avoiding overfitting.

I tackled this problem with three orthogonal approaches:

  • data augmentation
  • dropout and weight norm constraints
  • modifying the network architecture to increase parameter sharing
The best model I found has about 42 million parameters. Despite overfitting, it still performs very well, and it could be improved step by step.
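The write-up does not give exact constraint values, so the following is only a rough sketch of the dropout-plus-max-norm idea (the `max_norm=2.0` value and the layer sizes are assumptions, and PyTorch stands in for the original Theano code):

```python
import torch
import torch.nn as nn

def renorm_weights(layer, max_norm=2.0):
    """Max-norm weight constraint: rescale any unit whose incoming-weight
    vector has an L2 norm above max_norm (the value here is assumed)."""
    with torch.no_grad():
        w = layer.weight                          # (out_features, in_features)
        norms = w.norm(dim=1, keepdim=True)
        w.mul_(norms.clamp(max=max_norm) / norms.clamp(min=1e-8))

# Dropout is just another layer in the stack; the norm constraint is
# re-applied after each optimiser step, e.g. renorm_weights(dense[1]).
dense = nn.Sequential(nn.Dropout(0.5), nn.Linear(8192, 2048), nn.ReLU())
```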


Software and hardware

I used Python, NumPy and Theano to implement my solution, and scikit-image for preprocessing and augmentation.

Preprocessing and data augmentation

Cropping and downsampling

The training data consisted of 424x424 colour JPEG images, along with 37 weighted answer probabilities per galaxy. I cropped all images to 207x207 and then downsampled them 3x to 69x69. In short: crop, then a 3x downsample to the final 69x69 size.
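A minimal scikit-image sketch of this step (the 108-pixel centre-crop offset follows from the 424-to-207 crop; the file name is hypothetical):

```python
import numpy as np
from skimage import io
from skimage.transform import downscale_local_mean

def crop_and_downsample(path):
    img = io.imread(path)                         # 424x424x3 JPEG
    c = (424 - 207) // 2                          # 108-pixel border on each side
    img = img[c:c + 207, c:c + 207]               # centre crop to 207x207
    img = img.astype(np.float32) / 255.0
    return downscale_local_mean(img, (3, 3, 1))   # 3x downsample -> 69x69x3

x = crop_and_downsample("images_training/100008.jpg")  # hypothetical file name
```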

Exploiting spatial invariances

Images of galaxies are rotation invariant: rotating an image of a galaxy does not change its morphology. They are also scale invariant and translation invariant to a limited extent.

Each training example was perturbed before presenting it to the network by randomly scaling it, rotating it, translating it and optionally flipping it. I used the following parameter ranges:

  • rotation: random with angle between 0° and 360° (uniform)
  • translation: random with shift between -4 and 4 pixels (relative to the original image size of 424x424) in the x and y direction (uniform)
  • zoom: random with scale factor between 1/1.3 and 1.3 (log-uniform)
  • flip: yes or no (Bernoulli)
Because both the initial downsampling to 69x69 and the random perturbation are affine transforms, they could be combined into one affine transformation step (I used scikit-image for this). This sped things up significantly and reduced information loss.
In short: all four random transformations and the downsampling are folded into a single affine transform that produces the final 69x69 image.
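A sketch of this combined warp (composition order follows scikit-image's `+` operator, which applies the left transform first; expressing the flip as a negative scale factor is a choice made for this sketch):

```python
import numpy as np
from skimage.transform import AffineTransform, SimilarityTransform, warp

def random_warp(rng):
    # Maps output (69x69) coordinates back to input (424x424) coordinates:
    # 3x scale plus the 108-pixel offset reproduces the centre crop.
    ds = AffineTransform(scale=(3, 3), translation=(108, 108))
    # Random perturbation, applied about the input image centre.
    zoom = np.exp(rng.uniform(np.log(1 / 1.3), np.log(1.3)))   # log-uniform
    sx = -zoom if rng.random() < 0.5 else zoom                 # optional flip
    perturb = AffineTransform(scale=(sx, zoom),
                              rotation=rng.uniform(0, 2 * np.pi),
                              translation=rng.uniform(-4, 4, size=2))
    center = SimilarityTransform(translation=(-211.5, -211.5))
    uncenter = SimilarityTransform(translation=(211.5, 211.5))
    return ds + center + perturb + uncenter

rng = np.random.default_rng(0)
# img: 424x424x3 float image; a single warp yields the augmented 69x69 input
img69 = warp(img, random_warp(rng), output_shape=(69, 69))
```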

Colour perturbation

The colour of the images was changed as described in Krizhevsky et al. 2012: a PCA is computed over the RGB values of the training-set pixels, and a random multiple of each principal component, scaled by the corresponding eigenvalue, is added to every image. Here the first component had a much larger eigenvalue than the other two, i.e. one colour direction dominates, and the standard deviation of the random scale factor alpha was set to 0.5 (versus 0.1 in the original ImageNet paper).
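A NumPy sketch of this Krizhevsky-style scheme (the PCA is fit once over a sample of training-set pixels; `sigma=0.5` is the value given above):

```python
import numpy as np

def fit_rgb_pca(pixels):
    """pixels: (N, 3) array of RGB values sampled from the training set."""
    evals, evecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    return evals, evecs

def colour_perturb(img, evals, evecs, rng, sigma=0.5):
    # One random scale factor per component, drawn per presentation;
    # the resulting RGB offset is added to every pixel of the image.
    alpha = rng.normal(0.0, sigma, size=3)
    offset = evecs @ (alpha * evals)
    return img + offset            # img: HxWx3 float; offset broadcasts
```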



Network architecture

The model has 7 layers: 4 convolutional layers and 3 dense layers. All convolutional layers include a ReLU nonlinearity (i.e. f(x) = max(x, 0)). The first, second and fourth convolutional layers are followed by 2x2 max-pooling.
To increase parameter sharing, the convolutional part of the network is applied to 16 different parts of the input image (obtained by rotating, flipping and cropping it), and the resulting feature vectors are concatenated before the dense layers.
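A hedged sketch of this shared convolutional part (the filter counts, filter sizes and 45x45 part size are assumptions for illustration; batching the 16 parts along the batch axis is one simple way to share the same filters):

```python
import torch
import torch.nn as nn

# Illustrative 4-conv-layer stack; filter counts/sizes are assumptions.
conv_part = nn.Sequential(
    nn.Conv2d(3, 32, 6), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3), nn.ReLU(),
    nn.Conv2d(128, 128, 3), nn.ReLU(), nn.MaxPool2d(2),
)

def shared_features(parts):
    """parts: (batch, 16, 3, 45, 45), the 16 parts of each input image.
    The same filters process every part; the outputs are concatenated."""
    b, v = parts.shape[:2]
    feats = conv_part(parts.reshape(b * v, *parts.shape[2:]))
    return feats.reshape(b, -1)    # one flat feature vector per image
```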

The dense part consists of two maxout layers with 2048 units (Goodfellow et al. 2013), both of which take the maximum over pairs of linear filters (so 4096 linear filters in total). 

Using maxout here instead of regular dense layers with ReLUs helped to reduce overfitting a lot, compared to dense layers with 4096 linear filters. Using maxout in the convolutional part of the network as well proved too computationally intensive.
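Most frameworks have no built-in maxout layer, but it is easy to express: compute `pieces` linear filters per output unit and take their elementwise maximum. A sketch (the 8192-dimensional input is taken from the convolutional sketch above and is an assumption):

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Dense maxout layer (Goodfellow et al. 2013): the maximum over
    `pieces` linear filters per output unit."""
    def __init__(self, in_features, out_features, pieces=2):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features * pieces)
        self.out_features, self.pieces = out_features, pieces

    def forward(self, x):
        y = self.linear(x)                             # (batch, out * pieces)
        y = y.view(-1, self.out_features, self.pieces)
        return y.max(dim=2).values                     # (batch, out)

# Dense part: two 2048-unit maxout layers (2 pieces each, i.e. 4096 linear
# filters per layer) followed by the 37-way output layer.
dense_part = nn.Sequential(
    nn.Dropout(0.5), Maxout(8192, 2048),
    nn.Dropout(0.5), Maxout(2048, 2048),
    nn.Dropout(0.5), nn.Linear(2048, 37),
)
```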

