從年初起,幾家國際大廠的開發者大會,無論是微軟Build、Facebook F8還是稍後的Google I/O,莫不把“AI優先”的大旗扯上雲霄。
如果這一波AI大潮只是空喊幾句口號,空提幾個戰略,空有幾家炙手可熱的創業公司,那當然成不了什麼大氣候。但風浪之下,我們看到的卻是,Google一線的各大業務紛紛改用深度學習,落伍移動時代的微軟則已拉起一支近萬人的AI隊伍。而國內一線大廠的情況,更是把AI牢牢把握住,試圖再創高峯。
今天本文將分享一篇AI入門實戰的項目經驗分享,手把手帶你進入AI的世界,讓你消除對AI技術壁壘過高的恐懼~
【AI項目實戰】多標籤圖像分類競賽小試牛刀
初次拿到這個題目,想了想做過了貓狗大戰這樣的二分類,也做過cifar-10這樣的多分類,類似本次比賽的題目多標籤圖像分類的確沒有嘗試過。6941個標籤,每張圖片可能沒有標籤也可能存在6941個標籤,即各個標籤之間是不存在互斥關係的,所以最終分類的損失函數不能用softmax而必須要用sigmoid。然後把分類層預測6941個神經元,每個神經元用sigmoid函數返回是否存在某個標籤即可。
來蹚下整個流程看看,在jupyter notebook上做得比較亂,但是整個流程還是可以看出來的。深度學習模型用的Keras。
先導入train_csv數據,這裏用的是最初版的訓練csv文件,img_path裏存在地址,後面做了處理。
code
import pandas as pd import numpy as np import matplotlib.pyplot as plt %matplotlib inline from glob import glob from tqdm import tqdm import cv2 from PIL import Image train_path = 'visual_china_train.csv' train_df = pd.read_csv(train_path) train_df.head()
code
train_df.shape #(35000, 2)
可以看到總共有35000張訓練圖片,第一列爲圖片名稱(帶地址,需處理),第二列爲圖片對應標籤。
來看下是不是的確只有6941個標籤:
code
tags = [] for i in range(train_df['tags'].shape[0]): for tag in train_df['tags'].iloc[i].split(','): tags.append(tag) tags = set(tags) len(tags) #6941
事實證明標籤總數無誤,可以放心大膽地繼續進行下去了。
然後我處理了下圖片名稱,並存到了img_paths列表裏。
code
#如果使用的是官方後來更新的visual_china_train.csv,可以直接使用最後一行代碼 for i in range(35000): train_df['img_path'].iloc[i] = train_df['img_path'].iloc[i].split('/')[-1] img_paths = list(train_df['img_path'])
定義三個函數,其中:
- hash_tag函數讀入valid_tags.txt文件,並存入字典,形成索引和標籤的對照。
- load_ytrain函數讀入tag_train.npz文件,並返回訓練集的y_train,形式爲ndarray,shape爲(35000, 6941),即35000張圖片和對應標籤的one-hot編碼。
- arr2tag函數將預測結果的y_pred轉變成對應的中文標籤。(實際上最後還需要做下處理)
code
def hash_tag(filepath): fo = open(filepath, "r",encoding='utf-8') hash_tag = {} i = 0 for line in fo.readlines(): #依次讀取每行 line = line.strip() #去掉每行頭尾空白 hash_tag[i] = line i += 1 return hash_tag def load_ytrain(filepath): y_train = np.load(filepath) y_train = y_train['tag_train'] return y_train def arr2tag(arr): tags = [] for i in range(arr.shape[0]): tag = [] index = np.where(arr[i] > 0.5) index = index[0].tolist() tag = [hash_tag[j] for j in index] tags.append(tag) return tags
讀入valid_tags.txt,並生成索引和標籤的映射。
code
filepath = "valid_tags.txt" hash_tag = hash_tag(filepath) hash_tag[1] #'0到1個月'
載入y_train
code
y_train = load_ytrain('tag_train.npz') y_train.shape #(35000, 6941)
前期準備工作差不多做完了,開始導入訓練集。此處有個坑,即原始訓練集中存在CMYK格式的圖片,傳統圖片處理一般爲RGB格式,所以使用Image庫中的convert函數對非RGB格式的圖片進行轉換。
code
nub_train = 5000 #可修改,前期嘗試少量數據驗證模型 X_train = np.zeros((nub_train,224,224,3),dtype=np.uint8) i = 0 for img_path in img_paths[:nub_train]: img = Image.open('train/' + img_path) if img.mode != 'RGB': img = img.convert('RGB') img = img.resize((224,224)) arr = np.asarray(img) X_train[i,:,:,:] = arr i += 1
訓練集導入完成,來看圖片的樣子,判斷下圖片有沒有讀入錯誤之類的問題。
code
fig,axes = plt.subplots(6,6,figsize=(20, 20)) j = 0 for i,img in enumerate(X_train[:36]): axes[i//6,j%6].imshow(img) j+=1
看樣子還不錯,go on! 訓練集的X_train、y_train都拿到了,分割出驗證集。這裏要說明一下,官方的y_train裏圖片名稱與X_train裏圖片名稱是對應的所以可以直接分割。
code
from sklearn.model_selection import train_test_split X_train2,X_val,y_train2,y_val = train_test_split(X_train, y_train[:nub_train], test_size=0.2, random_state=2018)
數據準備完成,開始搭建模型。咳咳,先從簡單的入手哈,此模型仿tinymind上一次的漢字書法識別大賽中“真的學不會”大佬的結構來搭的,又加了些自己的東西,反正簡單模型試試水嘛。
code
from keras.layers import * from keras.models import * from keras.optimizers import * from keras.callbacks import * def bn_prelu(x): x = BatchNormalization()(x) x = PReLU()(x) return x def build_model(out_dims, input_shape=(224, 224, 3)): inputs_dim = Input(input_shape) x = Lambda(lambda x: x / 255.0)(inputs_dim) #在模型裏進行歸一化預處理 x = Conv2D(16, (3, 3), strides=(2, 2), padding='same')(x) x = bn_prelu(x) x = Conv2D(16, (3, 3), strides=(1, 1), padding='same')(x) x = bn_prelu(x) x = MaxPool2D(pool_size=(2, 2))(x) x = Conv2D(32, (3, 3), strides=(1, 1), padding='same')(x) x = bn_prelu(x) x = Conv2D(32, (3, 3), strides=(1, 1), padding='same')(x) x = bn_prelu(x) x = MaxPool2D(pool_size=(2, 2))(x) x = Conv2D(64, (3, 3), strides=(1, 1), padding='same')(x) x = bn_prelu(x) x = MaxPool2D(pool_size=(2, 2))(x) x = Conv2D(128, (3, 3), strides=(1, 1), padding='same')(x) x = bn_prelu(x) x = GlobalAveragePooling2D()(x) dp_1 = Dropout(0.5)(x) fc2 = Dense(out_dims)(dp_1) fc2 = Activation('sigmoid')(fc2) #此處注意,爲sigmoid函數 model = Model(inputs=inputs_dim, outputs=fc2) return model
看下模型結構:
code
model = build_model(6941) model.summary()
_________________________________________________________________Layer (type) Output Shape Param # =================================================================input_1 (InputLayer) (None, 224, 224, 3) 0 _________________________________________________________________lambda_1 (Lambda) (None, 224, 224, 3) 0 _________________________________________________________________conv2d_1 (Conv2D) (None, 112, 112, 16) 448 _________________________________________________________________batch_normalization_1 (Batch (None, 112, 112, 16) 64 _________________________________________________________________p_re_lu_1 (PReLU) (None, 112, 112, 16) 200704 _________________________________________________________________conv2d_2 (Conv2D) (None, 112, 112, 16) 2320 _________________________________________________________________batch_normalization_2 (Batch (None, 112, 112, 16) 64 _________________________________________________________________p_re_lu_2 (PReLU) (None, 112, 112, 16) 200704 _________________________________________________________________max_pooling2d_1 (MaxPooling2 (None, 56, 56, 16) 0 _________________________________________________________________conv2d_3 (Conv2D) (None, 56, 56, 32) 4640 _________________________________________________________________batch_normalization_3 (Batch (None, 56, 56, 32) 128 _________________________________________________________________p_re_lu_3 (PReLU) (None, 56, 56, 32) 100352 _________________________________________________________________conv2d_4 (Conv2D) (None, 56, 56, 32) 9248 _________________________________________________________________batch_normalization_4 (Batch (None, 56, 56, 32) 128 _________________________________________________________________p_re_lu_4 (PReLU) (None, 56, 56, 32) 100352 _________________________________________________________________max_pooling2d_2 (MaxPooling2 (None, 28, 28, 32) 0 _________________________________________________________________conv2d_5 (Conv2D) (None, 28, 28, 64) 18496 _________________________________________________________________batch_normalization_5 (Batch (None, 28, 28, 64) 256 _________________________________________________________________p_re_lu_5 (PReLU) (None, 28, 28, 64) 50176 _________________________________________________________________max_pooling2d_3 (MaxPooling2 (None, 14, 14, 64) 0 _________________________________________________________________conv2d_6 (Conv2D) (None, 14, 14, 128) 73856 _________________________________________________________________batch_normalization_6 (Batch (None, 14, 14, 128) 512 _________________________________________________________________p_re_lu_6 (PReLU) (None, 14, 14, 128) 25088 _________________________________________________________________global_average_pooling2d_1 ( (None, 128) 0 _________________________________________________________________dropout_1 (Dropout) (None, 128) 0 _________________________________________________________________dense_1 (Dense) (None, 6941) 895389 _________________________________________________________________activation_1 (Activation) (None, 6941) 0 =================================================================Total params: 1,682,925 Trainable params: 1,682,349 Non-trainable params: 576_________________________________________________________________
由於比賽要求裏最終得分標準是fmeasure而不是acc,故網上找來一段代碼用以監測訓練中查準率、查全率、fmeasure的變化。原地址找不到了,故而無法貼上,罪過罪過。
code
import keras.backend as K def precision(y_true, y_pred): # Calculates the precision true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1))) predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1))) precision = true_positives / (predicted_positives + K.epsilon()) return precision def recall(y_true, y_pred): # Calculates the recall true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1))) possible_positives = K.sum(K.round(K.clip(y_true, 0, 1))) recall = true_positives / (possible_positives + K.epsilon()) return recall def fbeta_score(y_true, y_pred, beta=1): # Calculates the F score, the weighted harmonic mean of precision and recall. if beta < 0: raise ValueError('The lowest choosable beta is zero (only precision).') # If there are no true positives, fix the F score at 0 like sklearn. if K.sum(K.round(K.clip(y_true, 0, 1))) == 0: return 0 p = precision(y_true, y_pred) r = recall(y_true, y_pred) bb = beta ** 2 fbeta_score = (1 + bb) * (p * r) / (bb * p + r + K.epsilon()) return fbeta_score def fmeasure(y_true, y_pred): # Calculates the f-measure, the harmonic mean of precision and recall. return fbeta_score(y_true, y_pred, beta=1)
這裏稍做圖片增強,用Keras裏的ImageDataGenerator函數,同時還可生成器方法進行訓練。
code
from keras.preprocessing.image import ImageDataGenerator train_datagen = ImageDataGenerator(width_shift_range = 0.1, height_shift_range = 0.1, zoom_range = 0.1) val_datagen = ImageDataGenerator() #驗證集不做圖片增強 batch_size = 8 train_generator = train_datagen.flow(X_train2,y_train2,batch_size=batch_size,shuffle=False) val_generator = val_datagen.flow(X_val,y_val,batch_size=batch_size,shuffle=False)
開始訓練。這裏在ModelCheckpoint裏設置monitor監控feasure,mode爲max,不再以最低loss作爲模型最優的判斷標準(個人做法,好壞可自行實驗判斷)。
code
checkpointer = ModelCheckpoint(filepath='weights_best_simple_model.hdf5', monitor='val_fmeasure',verbose=1, save_best_only=True, mode='max') reduce = ReduceLROnPlateau(monitor='val_fmeasure',factor=0.5,patience=2,verbose=1,min_delta=1e-4,mode='max') model.compile(optimizer = 'adam', loss='binary_crossentropy', metrics=['accuracy',fmeasure,recall,precision]) epochs = 20 history = model.fit_generator(train_generator, validation_data = val_generator, epochs=epochs, callbacks=[checkpointer,reduce], verbose=1)
訓練了20個epoch,這裏給出第20個epoch時的訓練結果,可以看到,val_loss 0.0233,其實已經挺低了;val_acc0.9945,參考意義不大(暫時不清楚有什麼參考意義~~);val_fmeasure0.17,嗯。。任重道遠啊。
訓練了20個epoch,這裏給出第20個epoch時的訓練結果,可以看到,val_loss 0.0233,其實已經挺低了;val_acc0.9945,參考意義不大(暫時不清楚有什麼參考意義~~);val_fmeasure0.17,嗯。。任重道遠啊。
Epoch 20/20500/500 [==============================] - 48s 96ms/step - loss: 0.0233 - acc: 0.9946 - fmeasure: 0.1699 - recall: 0.0970 - precision: 0.7108 - val_loss: 0.0233 - val_acc: 0.9946 - val_fmeasure: 0.1700 - val_recall: 0.0968 - val_precision: 0.7162 Epoch 00020: val_fmeasure did not improve from 0.17148
以上只給出5000張圖片的簡單模型訓練方法,但數據處理,搭建模型以及訓練過程已經很清晰明瞭了,後面的進階之路就憑大家各顯身手了。
然後開始進行預測,導入測試集(當然是在訓練集全部訓練之後再進行測試集的預測)。
code
nub_test = len(glob('valid/*')) X_test = np.zeros((nub_test,224,224,3),dtype=np.uint8) path = [] i = 0 for img_path in tqdm(glob('valid/*')): img = Image.open(img_path) if img.mode != 'RGB': img = img.convert('RGB') img = img.resize((224,224)) arr = np.asarray(img) X_test[i,:,:,:] = arr i += 1 100%|██████████████████████████████████████████████████████████████████████████████| 8000/8000 [02:18<00:00, 57.91it/s]
預測測試集並利用arr2tag函數將結果轉爲中文標籤,以便生成提交文件。
code
y_pred = model.predict(X_test) y_tags = arr2tag(y_pred)
生成提交文件:
code
import os img_name = os.listdir('valid/') img_name[:10]
['000effcf2091ae3895074838b7e5f571186ab362.jpg', '0014455e5fbfd0961039fe23675debbb1a7b2308.jpg', '002138959ee7a14eb2860100392a384f8a85425f.jpg', '002414411ce17c6c7ab5d36dd3f956d0691ba495.jpg', '002780359fda7f09e6d1fc52d88aff90c6e8298b.jpg', '002ad24891ddf815bb86e4eca34415b1b44c9e4b.jpg', '002c284f94299bcee51733f7d6b17f3e4792d8c5.jpg', '002cf4b15887f32b688113a2a7a3f5786896d019.jpg', '003d4c12160b90fbbb2bd034ee30c251a45d9037.jpg', '0043ab4460cc79bfbea3db69d2a55d5f35600a37.jpg']
arr2tag函數得到的每張圖片的標籤是list格式,需轉成str,在這裏操作。經實驗,windows中的方法與ubuntu中不同,後面也給出了ubuntu中本步的處理方法。
code
# windows import pandas as pd df = pd.DataFrame({'img_path':img_name, 'tags':y_tags}) for i in range(df['tags'].shape[0]): df['tags'].iloc[i] = ','.join(str(e) for e in df['tags'].iloc[i]) df.to_csv('submit.csv',index=None) df.head()
code
# #Ubuntu import pandas as pd df = pd.DataFrame({'img_path':img_name, 'tags':y_tags}) for i in range(df['tags'].shape[0]): df['tags'].iloc[i] = df['tags'].iloc[i][2:-2].replace('\'',"").replace('\'',"") df.to_csv('submit.csv',index=None)
整篇到此結束,有幾點要說的:
- 提高方法。不用說,肯定是上預訓練模型,可能再進行模型融合效果會更好。官方大大說整個標籤由於人工標註,可能會跟機器預測出來的有別差,畢竟看預測結果中出現的 “一個人,人,僅一個女人,僅一個青年女人,僅女人,僅成年人” ,如果由人類來標註可能不會這麼囉嗦~~所以可以考慮NLP方法對標籤進行一些處理(我不會)。另外網上查到了個詭異的做法,說可以把fmeasure變成損失函數去訓練模型(fmeasure不可導),我想如果有辦法做到應該效果不錯吧。
- 不足之處。訓練過程中監控fmeasure和監控loss的做法,看上去應該是fmeasure沒錯,不過自己對於這塊研究不夠,只能憑感覺在做,各位看官可自由發揮。
- 整篇文章代碼只有查準率、查全率、fmeasure部分爲網上摘取,其他均爲原創代碼(略有借鑑),其實是想說,代碼可能有些地方稚嫩,還望各位大佬們海涵。