Datasets

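The benchmark datasets most commonly used in few-shot learning are Omniglot and miniImagenet. However, most of the download links that turn up online point to Google Drive, and the miniImagenet download also lacks the csv files with the label annotations, so the links I found are collected here.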

miniImagenet

miniImagenet download links:

Baidu Cloud link: https://pan.baidu.com/s/1npRhZajLrLe6-KtSbJsa1A password: ztp5
If the Baidu Cloud download is too slow, try Google Drive: https://drive.google.com/open?id=1HkgrkAwukzEZA0TpO7010PkAOREb2Nuk
The csv files can be obtained here: https://github.com/vieozhu/MAML-TensorFlow-1

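I started out mainly to run the MAML algorithm. It turned out that in the code provided by cbfinn on GitHub (https://github.com/cbfinn/maml.git), the data-processing part only works on Linux and fails on Windows. Replacing the os.system calls in proc_images.py with the corresponding os functions fixes this.
The modified code is pasted below.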

"""
Script for converting from csv file datafiles to a directory for each image (which is how it is loaded by MAML code)

Acquire miniImagenet from Ravi & Larochelle '17, along with the train, val, and test csv files. Put the
csv files in the miniImagenet directory and put the images in the directory 'miniImagenet/images/'.
In other words: after extracting the downloaded raw miniImagenet dataset, move its images folder into the
miniImagenet folder, next to proc_images.py, so that the script can process the data.
Then run this script from the miniImagenet directory:
    cd data/miniImagenet/
    python proc_images.py
"""
The docstring above and the code that follows are Finn's original, which suits Linux:
from __future__ import print_function
import csv
import glob
import os

from PIL import Image

path_to_images = 'images/'

all_images = glob.glob(path_to_images + '*')

# Resize images
for i, image_file in enumerate(all_images):
    im = Image.open(image_file)
    im = im.resize((84, 84), resample=Image.LANCZOS)
    im.save(image_file)
    if i % 500 == 0:
        print(i)

# Put in correct directory
for datatype in ['train', 'val', 'test']:
    os.system('mkdir ' + datatype)

    with open(datatype + '.csv', 'r') as f:
        reader = csv.reader(f, delimiter=',')
        last_label = ''
        for i, row in enumerate(reader):
            if i == 0:  # skip the headers
                continue
            label = row[1]
            image_name = row[0]
            if label != last_label:
                cur_dir = datatype + '/' + label + '/'
                os.system('mkdir ' + cur_dir)
                last_label = label
            os.system('mv images/' + image_name + ' ' + cur_dir)

The following version works on Windows:
from __future__ import print_function
import csv
import glob
import os

from PIL import Image

path_to_images = 'images/'

all_images = glob.glob(path_to_images + '*')

# Resize images
for i, image_file in enumerate(all_images):
    im = Image.open(image_file)
    im = im.resize((84, 84), resample=Image.LANCZOS)
    im.save(image_file)
    if i % 500 == 0:
        print(i)

# Put in correct directory
for datatype in ['train', 'val', 'test']:
    os.mkdir(datatype)

    with open(datatype + '.csv', 'r') as f:
        reader = csv.reader(f, delimiter=',')
        last_label = ''
        for i, row in enumerate(reader):
            if i == 0:  # skip the headers
                continue
            label = row[1]
            image_name = row[0]
            if label != last_label:
                cur_dir = datatype + '/' + label + '/'
                os.mkdir(cur_dir)
                last_label = label
            os.rename('images/' + image_name, cur_dir + image_name)
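
As a hedged alternative to the two scripts above, the sorting step can be written once so that it runs on both Linux and Windows. This is a minimal sketch, not Finn's code: the function name sort_images_by_label is hypothetical, it assumes Python 3 (for os.makedirs with exist_ok), and it assumes each csv file has a header row followed by filename,label rows, as the scripts above do. The resizing step is left out.

```python
import csv
import os
import shutil


def sort_images_by_label(csv_path, images_dir, out_dir):
    """Move each image listed in csv_path into out_dir/<label>/.

    Assumes a header row followed by `filename,label` rows.
    Returns the number of images that were actually moved.
    """
    moved = 0
    with open(csv_path, 'r', newline='') as f:
        reader = csv.reader(f, delimiter=',')
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed rows
            image_name, label = row[0], row[1]
            label_dir = os.path.join(out_dir, label)
            # Unlike os.mkdir, this does not fail if the directory exists.
            os.makedirs(label_dir, exist_ok=True)
            src = os.path.join(images_dir, image_name)
            if not os.path.exists(src):
                continue  # already moved on an earlier run, or missing
            shutil.move(src, os.path.join(label_dir, image_name))
            moved += 1
    return moved


if __name__ == '__main__':
    # Run from the miniImagenet directory, next to the csv files.
    for datatype in ['train', 'val', 'test']:
        csv_file = datatype + '.csv'
        if os.path.exists(csv_file):
            print(datatype, sort_images_by_label(csv_file, 'images', datatype))
```

Because existing directories and already-moved images are skipped rather than treated as errors, the sketch can be rerun after an interrupted run.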

Omniglot dataset

Download the entire GitHub project (94M), extract it, and take the python version. Then create a data directory and put all of the zip archives into it.

About the dataset

Omniglot is often jokingly called the transpose of MNIST; it is worth thinking about why. A brief introduction to the Omniglot dataset follows:

The Omniglot dataset contains 1623 different handwritten characters from 50 different alphabets. Each character was drawn online via Amazon's Mechanical Turk by 20 different people.

Each image is paired with stroke data, a sequence of [x, y, t] coordinates with the time (t) in milliseconds. The stroke data is only available in the matlab/ files.

Dataset citation: Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.

The Omniglot dataset contains 50 alphabets in total. These are usually split into a background set of 30 alphabets and an evaluation set of 20 alphabets.
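
To check the split described above on a local copy, the alphabets and characters in each set can be counted from the directory tree. This is a minimal sketch under stated assumptions: the function name summarize_split is hypothetical, and it assumes the archives have been extracted into folders named images_background and images_evaluation, laid out as <split>/<alphabet>/<character>/<image files>.

```python
import os


def summarize_split(split_dir):
    """Return (number of alphabets, number of characters) under split_dir.

    Assumes the layout <split_dir>/<alphabet>/<character>/<image files>.
    """
    alphabets = sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    )
    n_characters = 0
    for alphabet in alphabets:
        alphabet_dir = os.path.join(split_dir, alphabet)
        # Every sub-directory of an alphabet is one character class.
        n_characters += sum(
            os.path.isdir(os.path.join(alphabet_dir, c))
            for c in os.listdir(alphabet_dir)
        )
    return len(alphabets), n_characters


if __name__ == '__main__':
    # Assumed folder names; adjust them to match the local copy.
    for split in ['images_background', 'images_evaluation']:
        if os.path.isdir(split):
            print(split, summarize_split(split))
```

If the figures above hold for the downloaded copy, the alphabet counts should come out as 30 and 20, and the character counts should add up to 1623.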

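A more challenging representation learning task is to use the smaller background sets “background small 1” and “background small 2”. Each contains only 5 alphabets, which more closely resembles the experience an adult is likely to have when learning characters in general.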

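To get a more intuitive feel for how Omniglot is organized, I used the source code of brendenlake/omniglot to dissect the dataset and present the results as .ipynb files. The concrete layout of the dataset can be seen in omniglot/python. The notebook 數據使用說明 (data usage instructions) shows how to obtain information about the dataset directly, without extracting the archives. If you prefer the command line, see dataloader.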
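
Reading the archives without extracting them can be sketched with the standard zipfile module. This is not the notebook's code: the function name alphabets_in_zip is hypothetical, and it assumes the entries inside an image archive look like <split>/<alphabet>/<character>/<image>.png.

```python
import zipfile


def alphabets_in_zip(zip_path):
    """List the alphabet folder names inside an image archive.

    Assumes entries look like <split>/<alphabet>/<character>/<image>.png.
    Nothing is extracted to disk; only the archive index is read.
    """
    alphabets = set()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            parts = name.strip('/').split('/')
            # Only count entries at character level or deeper, so stray
            # files directly under the split folder are ignored.
            if len(parts) >= 3:
                alphabets.add(parts[1])
    return sorted(alphabets)
```

For example, calling alphabets_in_zip on an archive placed in the data directory returns the sorted alphabet names it contains, and zf.open(name) could then be used to read a single image directly from the archive.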

Going further, if you want to test how one-shot classification from the original paper performs using the Modified Hausdorff distance, see one-shot-classification.
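
For reference, the modified Hausdorff distance between two point sets is the larger of the two directed mean nearest-point distances. The sketch below is a plain-Python illustration of that definition, not the code from one-shot-classification: the function names are hypothetical, and ink_coordinates assumes the image has already been converted to a mask in which ink pixels are nonzero. A NumPy version would be much faster on real images.

```python
import math


def modified_hausdorff(points_a, points_b):
    """Modified Hausdorff distance between two non-empty 2-D point sets.

    Each argument is a sequence of (x, y) pairs.
    """
    def mean_min_distance(src, dst):
        # Average, over src, of the distance to the nearest point in dst.
        total = 0.0
        for x1, y1 in src:
            total += min(math.hypot(x1 - x2, y1 - y2) for x2, y2 in dst)
        return total / len(src)

    return max(mean_min_distance(points_a, points_b),
               mean_min_distance(points_b, points_a))


def ink_coordinates(mask):
    """Return (row, col) pairs of the nonzero entries of a 2-D mask.

    Assumes ink pixels are nonzero; invert the image first if ink is 0.
    """
    return [(r, c)
            for r, row in enumerate(mask)
            for c, value in enumerate(row)
            if value]
```

A one-shot classifier built on this would assign a test image the label of the training image with the smallest distance between their ink coordinates.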

Finally, if you only want to look at the dataset online without downloading it, you can work with it on https://mybinder.org/, including running programs. The steps are:

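Click Omniglot to enter the online editing mode;
The dataset is in the omniglot/ directory; the 數據使用說明.ipynb file can be used to work with the Omniglot dataset;
The data for testing one-shot classification is in the omniglot/python/one-shot-classification directory. The file test_demo.ipynb can be used to run some tests.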
