【Data】Reading the MNIST dataset

Original

2020-06-11 06:09

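A while ago I trained a CNN on the MNIST dataset. I have recently been studying classical machine learning algorithms, so I plan to try an SVM on it next. Before training the SVM, though, it is worth learning how to read the MNIST dataset itself.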

【Dataset overview】

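First, the description given by the official source:

Both the training set (train) and the test set (test) are split into two files: a label file and an image file.
A label file starts with two integers: the magic number and the number of items (labels).
An image file starts with four integers: the magic number, the number of images, the number of rows, and the number of columns.
From these headers we can see that the training set contains 60000 samples, the test set contains 10000 samples, and each image is 28×28 pixels.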
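The header layout described above can be checked with Python's struct module before touching the real files. The following is only a minimal sketch: the helper name parse_idx_header and the tiny sample byte strings are made up for illustration. It relies on the header integers being big-endian 32-bit values, with magic number 2049 for label files and 2051 for image files, where the lowest byte of the magic number gives the number of dimensions.

```python
import struct

def parse_idx_header(buf):
    """Return (magic, dims) for an IDX byte buffer (hypothetical helper)."""
    # The first 4 bytes are the big-endian magic number.
    (magic,) = struct.unpack_from('>I', buf, 0)
    # The lowest byte of the magic number is the number of dimensions.
    ndim = magic & 0xFF
    # One big-endian 32-bit size follows for each dimension.
    dims = struct.unpack_from('>' + 'I' * ndim, buf, 4)
    return magic, dims

# Synthetic label file: magic 2049, 3 items, then 3 label bytes.
label_bytes = struct.pack('>II', 2049, 3) + bytes([5, 0, 4])
# Synthetic image file: magic 2051, 3 images of 2 x 2 pixels, then 12 pixel bytes.
image_bytes = struct.pack('>IIII', 2051, 3, 2, 2) + bytes(range(12))

label_magic, label_dims = parse_idx_header(label_bytes)
image_magic, image_dims = parse_idx_header(image_bytes)
print(label_magic, label_dims)   # 2049 (3,)
print(image_magic, image_dims)   # 2051 (3, 2, 2)
```

Run against the real training image file, the same header read should report 60000 images of 28 rows and 28 columns.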

【Reading the MNIST dataset】

Reading the MNIST dataset really just means reading binary files.
Reading method 1:

import numpy as np
import struct
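def load_images(file_name):
    # A file must be opened with Python's built-in open() before reading or writing:
    #   file object = open(file_name [, access_mode][, buffering])
    # file_name is a string containing the name of the file to access.
    # access_mode sets how the file is opened (read, write, append, ...).
    # buffering: 0 disables buffering (binary mode only), 1 selects line buffering (text mode).
    # 'rb' opens the file read-only in binary mode; the with block closes it automatically.
    with open(file_name, 'rb') as binfile:
        # Read the whole file into memory as bytes.
        buffers = binfile.read()
    # The image file starts with 4 big-endian unsigned 32-bit integers.
    magic, num, rows, cols = struct.unpack_from('>IIII', buffers, 0)
    # Total number of pixel bytes (60000*28*28 for the training set).
    bits = num * rows * cols
    # Unpack the pixel data as unsigned bytes, starting right after the header.
    images = struct.unpack_from('>' + str(bits) + 'B', buffers, struct.calcsize('>IIII'))
    # Convert to an array of shape [num, rows*cols], e.g. [60000, 784].
    images = np.reshape(images, [num, rows * cols])
    return images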

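def load_labels(file_name):
    # Open the file in binary read mode; the with block closes it automatically.
    with open(file_name, 'rb') as binfile:
        # Read the whole file into memory as bytes.
        buffers = binfile.read()
    # The label file starts with 2 big-endian unsigned 32-bit integers;
    # num is the number of labels.
    magic, num = struct.unpack_from('>II', buffers, 0)
    # Unpack the labels as unsigned bytes, starting right after the header.
    labels = struct.unpack_from('>' + str(num) + 'B', buffers, struct.calcsize('>II'))
    # Convert to a one-dimensional array.
    labels = np.reshape(labels, [num])
    return labels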

Usage:

# Replace absolute_path with the absolute path of the folder that holds the files.
filename_train_images = 'absolute_path\\train-images.idx3-ubyte'
filename_train_labels = 'absolute_path\\train-labels.idx1-ubyte'
filename_test_images = 'absolute_path\\t10k-images.idx3-ubyte'
filename_test_labels = 'absolute_path\\t10k-labels.idx1-ubyte'
train_images = load_images(filename_train_images)
train_labels = load_labels(filename_train_labels)
test_images = load_images(filename_test_images)
test_labels = load_labels(filename_test_labels)
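Method 1 unpacks every pixel into a Python tuple before converting it to an array, which can be slow and memory-hungry for 60000 images. As a possible alternative (not part of the original code), the sketch below interprets the bytes after the header directly with np.frombuffer. The function name load_images_frombuffer and the small demo file are assumptions made for illustration.

```python
import os
import struct
import tempfile

import numpy as np

def load_images_frombuffer(file_name):
    """Variant of load_images that skips the intermediate tuple (hypothetical name)."""
    with open(file_name, 'rb') as binfile:
        buffers = binfile.read()
    # Header: magic number, image count, rows, columns (big-endian uint32).
    magic, num, rows, cols = struct.unpack_from('>IIII', buffers, 0)
    offset = struct.calcsize('>IIII')
    # View the remaining bytes as unsigned 8-bit pixels without copying.
    images = np.frombuffer(buffers, dtype=np.uint8,
                           count=num * rows * cols, offset=offset)
    return images.reshape(num, rows * cols)

# Demo on a made-up file with 2 images of 2 x 2 pixels.
demo = struct.pack('>IIII', 2051, 2, 2, 2) + bytes(range(8))
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo-images.idx3-ubyte')
    with open(demo_path, 'wb') as f:
        f.write(demo)
    demo_images = load_images_frombuffer(demo_path)
print(demo_images.shape)  # (2, 4)
```

Two differences from method 1 are worth noting: the result has dtype uint8 rather than NumPy's default integer type, and because it is a view on an immutable bytes object it is read-only. Call .copy() on it if the array needs to be modified.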

Reading method 2:

import numpy as np
import struct
import os
def load_mnist_train(path, kind='train'):
    # Build the file names, e.g. train-labels.idx1-ubyte and train-images.idx3-ubyte.
    labels_path = os.path.join(path, '%s-labels.idx1-ubyte' % kind)
    images_path = os.path.join(path, '%s-images.idx3-ubyte' % kind)
    with open(labels_path, 'rb') as lbpath:
        # Read the 8-byte header (magic number, label count); fromfile then
        # continues from the current file position and reads the label bytes.
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)
    with open(images_path, 'rb') as imgpath:
        # Read the 16-byte header (magic number, image count, rows, columns),
        # then the pixel bytes, one row of rows*cols values per image.
        magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
        images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), rows * cols)
    return images, labels

def load_mnist_test(path, kind='t10k'):
    # Same as load_mnist_train, but defaults to the t10k (test) files.
    labels_path = os.path.join(path, '%s-labels.idx1-ubyte' % kind)
    images_path = os.path.join(path, '%s-images.idx3-ubyte' % kind)
    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)
    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
        images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), rows * cols)
    return images, labels

Usage:

# Replace absolute_path with the absolute path of the folder that holds the files.
path = 'absolute_path'
train_images,train_labels=load_mnist_train(path)
test_images,test_labels=load_mnist_test(path)
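Method 2 reads the header fields but never checks them, so a quick consistency check after loading can catch a wrong path or a mismatched image/label pair. The sketch below is one way to do that. The helper name summarize is hypothetical, and the random stand-in arrays are only there so the example runs without the dataset.

```python
import numpy as np

def summarize(images, labels):
    """Basic consistency checks on loaded arrays (hypothetical helper)."""
    if images.shape[0] != labels.shape[0]:
        raise ValueError('image and label counts differ')
    return {
        'count': images.shape[0],
        'pixels_per_image': images.shape[1],
        'pixel_range': (int(images.min()), int(images.max())),
        # Number of samples for each digit 0..9.
        'label_counts': np.bincount(labels, minlength=10).tolist(),
    }

# Stand-in data with the same layout as the loaders' output: 6 flattened 28x28 images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(6, 784), dtype=np.uint8)
fake_labels = np.array([5, 0, 4, 1, 9, 2], dtype=np.uint8)

info = summarize(fake_images, fake_labels)
print(info['count'], info['pixels_per_image'])  # 6 784
print(info['label_counts'])  # [1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
```

With the real data, summarize(train_images, train_labels) should report a count of 60000 and 784 pixels per image, and the test set a count of 10000.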

Let's plot the first 30 digits to take a look, using the same steps as for the digits dataset in an earlier post.

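import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(30):
    # Restore the flattened 784-value row to a 28x28 image.
    image = np.reshape(train_images[i], [28, 28])
    # 6 x 5 grid of subplots without axis ticks.
    ax = fig.add_subplot(6, 5, i + 1, xticks=[], yticks=[])
    ax.imshow(image, cmap=plt.cm.binary, interpolation='nearest')
    # Write the label in the upper-left corner of each image.
    ax.text(0, 7, str(train_labels[i]))
plt.show()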

OK, the data is now loaded and ready for the training that follows.

The code is available at: https://github.com/htshinichi/ML_practice/tree/master
