Features are one of the keys to image recognition and image retrieval, and feature extraction largely determines how well both work. It has evolved through several stages: low-level features (color, texture, shape, etc.), local features (SIFT, SURF, etc.), term-frequency vectors (the BoW encoding of an image against a codebook learned from the image set, built on top of local features), and features extracted by deep neural networks. Although deep networks perform well in many scenarios and have become mainstream, in specific environments and scenarios low-level features combined with other techniques (spatial pyramids, sparse learning, LBP, etc.) can outperform deep neural networks at a far lower cost. After all, getting the best results from a deep network requires tuning its parameters and adjusting its architecture, and reaching peak accuracy directly is very difficult.
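To make the term-frequency (BoW) stage concrete, here is a minimal sketch using only numpy: local descriptors pooled from many images are clustered into a visual codebook, and each image is then encoded as a normalized histogram of its nearest visual words. The function names are illustrative, not from any library, and real systems use much faster k-means implementations:

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Learn a k-word visual codebook with plain k-means.
    A minimal sketch; production code uses faster clustering."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # move each center to the mean of its members
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def bow_histogram(image_descriptors, centers):
    """Encode one image as an L1-normalized term-frequency vector."""
    d2 = ((image_descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length vector can be fed to any standard classifier, which is exactly the role the codebook and pooling functions play in the toolbox described below.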
This article introduces an open-source image recognition tool, reco_toolbox, which its author calls the Scenes/Objects classification toolbox; it uses datasets such as scene_15 and skinan. The toolbox provides many feature extraction, feature processing, and dictionary learning functions that can be combined freely, and it incorporates techniques such as sparse learning, spatial pyramid pooling, LBP, and fast k-means. Its main functions are:
A) Patch functions
   denseCOLOR   Compute histograms of color projections on a regular dense grid
   denseMBLBP   Extract histograms of Multi-Block LBP on a regular dense grid, computed on image I after color projection
   denseMBLDP   Extract histograms of Multi-Block LDP on a regular dense grid, computed on image I after color projection
   densePATCH   Extract patches of pixels after color projection on a regular dense grid
   denseSIFT    Compute SIFT (Scale-Invariant Feature Transform) descriptors on a regular dense grid
B) Direct descriptors
   mlhmslbp_spyr   Color multi-level histogram of multi-scale Local Binary Patterns with a spatial pyramid
   mlhmsldp_spyr   Color multi-level histogram of multi-scale Local Derivative Patterns with a spatial pyramid
   mlhmslsd_spyr   Color multi-level histogram of multi-scale Line Segment Detector output with a spatial pyramid
   mlhoee_spyr     Color multi-level histogram of oriented edge energy with a spatial pyramid
C) Dictionary learning
   yael_kmeans        Fast k-means algorithm to learn a codebook
   mexTrainDL         Sparse dictionary learning algorithm
   mexTrainDL_Memory  Faster sparse dictionary learning algorithm, but more memory-consuming
   mexLasso           Lasso algorithm to compute the sparse alpha weights
D) Spatial pyramid pooling
   mlhbow_spyr   Histogram of color visual words with a multi-level spatial pyramid
   dl_spyr       Pooling with a multi-level spatial pyramid
   mlhlcc_spyr   Pooling with a multi-level spatial pyramid and Locality-constrained Linear Coding
E) Classifiers
   homker_pegasos_train   Pegasos solver with the homogeneous additive kernel transform included
   homker_predict         Predict new instances with a trained model
   train_dense            Liblinear training algorithm for dense data
   svmtrain               Train an SVM model via Libsvm for dense data
   svmpredict             Predict new instances with a trained model
   pegasos_train          Pegasos solver
   predict_dense          Liblinear prediction algorithm for dense data
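Several of the descriptors above (denseMBLBP, mlhmslbp_spyr, mlhmsldp_spyr) build on Local Binary Patterns. As a rough illustration of the underlying pattern code, here is a single-scale, single-pixel LBP sketch in numpy; the toolbox's multi-block, multi-scale C implementations are considerably more elaborate:

```python
import numpy as np

def lbp_8(image):
    """Basic 8-neighbour LBP codes for the interior pixels of a grayscale
    image: each neighbour contributes one bit, set when it is >= the center.
    A sketch of the idea only; multi-block LBP compares block averages
    instead of single pixels."""
    c = image[1:-1, 1:-1]
    neighbours = [image[0:-2, 0:-2], image[0:-2, 1:-1], image[0:-2, 2:],
                  image[1:-1, 2:],   image[2:, 2:],     image[2:, 1:-1],
                  image[2:, 0:-2],   image[1:-1, 0:-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= ((n >= c).astype(np.uint8) << bit)
    return codes

def lbp_histogram(image):
    """L1-normalized 256-bin histogram of LBP codes, usable as a texture feature."""
    h = np.bincount(lbp_8(image).ravel(), minlength=256).astype(float)
    return h / h.sum()
```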
As the list above shows, the toolbox covers spatial pyramids, sparse learning, BoW codebook construction, and more; it essentially represents the best image feature extraction and recognition methods from before deep learning took over image recognition. Even a low-level descriptor such as mlhmslbp_spyr can match or exceed the accuracy achieved by some deep neural networks. See its Readme.txt for details.
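The spatial pyramid idea shared by the *_spyr descriptors can be sketched as follows: at pyramid level l the image is split into 2^l x 2^l cells, a visual-word histogram is pooled per cell, and all cell histograms are concatenated, so coarse layout information survives the pooling. This is a simplified illustration with hypothetical names; the toolbox's exact cell weighting and layout options differ:

```python
import numpy as np

def spatial_pyramid_bow(keypoints_xy, words, num_words, levels=2):
    """Pool visual-word histograms over a spatial pyramid.
    keypoints_xy: (n, 2) keypoint coordinates, assumed normalized to [0, 1).
    words: (n,) visual-word index of each keypoint."""
    feats = []
    for l in range(levels + 1):
        n = 2 ** l
        # cell index of each keypoint at this pyramid level
        cx = np.minimum((keypoints_xy[:, 0] * n).astype(int), n - 1)
        cy = np.minimum((keypoints_xy[:, 1] * n).astype(int), n - 1)
        cell = cy * n + cx
        for c in range(n * n):
            h = np.bincount(words[cell == c], minlength=num_words).astype(float)
            feats.append(h)
    v = np.concatenate(feats)       # num_words * (1 + 4 + 16 + ...) dims
    return v / max(v.sum(), 1.0)    # global L1 normalization
```

With levels=2 the output has num_words * 21 dimensions (1 + 4 + 16 cells), which is why pyramid descriptors grow quickly with depth.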
The toolbox's core is written in C, so it runs fast and uses little memory. The author provides mex binaries compiled for 64-bit and 32-bit Windows; if they fail to run, recompile them on your machine with the mex command. If the toolbox path has not been added, unzip the required mexw64 or mexw32 files and copy them into the same directory as the m-file you are running.
Below is part of the demo script simple_train.m (the file is too long to show in full). Before running it in the toolbox, first run extract_bag_of_features.m or extract_direct_features.m to extract features and save them under the designated path, and set the matching choice_descriptors in this file; otherwise it will complain that the feature file cannot be found.
clc,close all, clear ,drawnow
database_name = {'scenes15' , 'skinan' , 'agemen'};
database_ext = {'jpg' , 'jpg' , 'png'};
descriptors_name = {'denseSIFT_mlhbow_spyr' , 'denseSIFT_dl_spyr' , 'denseSIFT_mlhlcc_spyr' ,...
'denseCOLOR_mlhbow_spyr' , 'denseCOLOR_dl_spyr' , 'denseCOLOR_mlhlcc_spyr' , ...
'densePATCH_mlhbow_spyr' , 'densePATCH_dl_spyr' , 'densePATCH_mlhlcc_spyr' , ...
'denseMBLBP_mlhbow_spyr' , 'denseMBLBP_dl_spyr' , 'denseMBLBP_mlhlcc_spyr' , ...
'denseMBLDP_mlhbow_spyr' , 'denseMBLDP_dl_spyr' , 'denseMBLDP_mlhlcc_spyr' , ...
'mlhoee_spyr' , 'mlhmslsd_spyr' , 'mlhmslbp_spyr' , 'mlhmsldp_spyr'};
classifier = {'liblinear' , 'pegasos' , 'libsvm'};
%choose the image set by index: scenes15=1/skinan=2/agemen=3
choice_database = [1];
%choose the descriptor by index; 8 is densePATCH_dl_spyr
choice_descriptors = [8];
%choose the classifier by index: Liblinear=1/Pegasos=2/Libsvm=3
choice_classifier = [1];
data_name = database_name{choice_database(1)};
im_ext = database_ext{choice_database(1)};
%current path, image-set path, core-code path, feature-file path, and model path
rootbase_dir = pwd;
images_dir = fullfile(pwd , 'images' , data_name);
core_dir = fullfile(pwd , 'core');
feat_dir = fullfile(pwd , 'features');
models_dir = fullfile(pwd , 'models');
addpath(core_dir)
%list the image-set directory
dir_image = dir(images_dir);
%number of image classes, i.e. number of subfolders (skipping '.' and '..')
nb_topic = length(dir_image) - 2;
%name of each image class
classe_name = cellstr(char(dir_image(3:nb_topic+2).name));
%training parameters; K is the number of cross-validation folds
K = 1;
seed_value = 5489;
post_norm = 0;
do_weightinglearning = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1];
uselogic = 1;
fusion_method = 1; %max=0/mean=1
nbin = 100; % for ROC curves
%array holding the number of images per class
nb_images_per_topic = zeros(1 , nb_topic);
%class label array
y = [];
for i = 1:nb_topic
nb_images_per_topic(i) = length(dir(fullfile(pwd , 'images' , data_name , dir_image(i+2).name , ['*.' , im_ext])));
y = [y , i*ones(1 , nb_images_per_topic(i))];
end
%total number of images
N = sum(nb_images_per_topic);
%run config_databases.m to set the relevant configuration
config_databases;
%run the script named [data_name , '_config_classifier']
eval([data_name , '_config_classifier']);
%%
s = RandStream.create('mt19937ar','seed',seed_value);
RandStream.setDefaultStream(s);
%variables holding the training and test image indices
Itrain = zeros(K , sum(base{choice_database}.maxperclasstrain));
Itest = zeros(K , sum(base{choice_database}.maxperclasstest));
for j = 1:K
cotrain = 1;
cotest = 1;
for i = 1:nb_topic
%indices of the images belonging to class i
indi = find(y==i);
%shuffle the image indices within this class
tempind = randperm(nb_images_per_topic(i));
%take the configured number of training image indices
indtrain = tempind(1:base{choice_database}.maxperclasstrain(i));
indtest = tempind(base{choice_database}.maxperclasstrain(i)+1:base{choice_database}.maxperclasstrain(i)+base{choice_database}.maxperclasstest(i));
%training image indices for this fold
Itrain(j , cotrain:cotrain+base{choice_database}.maxperclasstrain(i)-1) = indi(indtrain);
%test image indices for this fold
Itest(j , cotest:cotest+base{choice_database}.maxperclasstest(i)-1) = indi(indtest);
cotrain = cotrain + base{choice_database}.maxperclasstrain(i);
cotest = cotest + base{choice_database}.maxperclasstest(i);
end
end
%train with each chosen descriptor in turn
for d = 1 : nb_descriptors
cdescriptors = choice_descriptors(d);
base_descriptor = descriptors_name{cdescriptors};
%train with each chosen classifier in turn
for c = 1:nb_classifier
ccurrent = choice_classifier(c);
base_classifier = classifier{ccurrent};
base_name = [data_name , '_' , base_descriptor];
base_name_model = [data_name , '_' ,base_descriptor , '_' , base_classifier];
fprintf('\nLoad descriptor %s for classifier = %s\n\n' , base_name , base_classifier );
drawnow
clear X y
%load the feature file
load(fullfile(feat_dir , base_name ));
%optionally apply L1 or L2 normalization to the features
if(post_norm == 1)
sumX = sum(X , 1) + 10e-8;
X = X./sumX(ones(size(X , 1) , 1) , :);
end
if(post_norm == 2)
%L2 norm over the sum of squares (the original code used sum(X,1) here, which is not an L2 norm)
temp = sqrt(sum(X.*X , 1) + 10e-8);
X = X./temp(ones(size(X , 1) , 1) , :);
end
if(param_classif{cdescriptors,ccurrent}.n > 0)
fprintf('Homogeneous feature kernel map with n = %d, L = %4.2f, kernel = %d\n\n', param_classif{cdescriptors,ccurrent}.n , param_classif{cdescriptors,ccurrent}.L , param_classif{cdescriptors,ccurrent}.kerneltype);
drawnow
X = homkermap(X , param_classif{cdescriptors,ccurrent});
end
for k = 1:K
%select the training images for this fold by index
Xtrain = X(: , Itrain(k , :));
%and their labels
ytrain = y(Itrain(k , :));
fprintf('\nLearn train data for classifier = %s and descriptor = %s\n\n' , base_classifier , base_name);
drawnow
for t = 1:nb_topic
ind_topic = (ytrain==t);
ytopic = double(ind_topic);
ytopic(ytopic==0) = -1;
if((strcmp(base_classifier , 'liblinear')) )
fprintf('cv = %d/%d, learn topic = %s (%d/%d), h1 = %10.5f for classifier = %s and descriptor = %s \n' , k , K , classe_name{t} , t , nb_topic , param_classif{cdescriptors,ccurrent}.c , base_classifier , base_descriptor)
drawnow
if(do_weightinglearning(c))
npos = sum(ytopic==1);
nneg = length(ytopic) - npos;
wpos = nneg/npos;
options = ['-q -s ' num2str(param_classif{cdescriptors,ccurrent}.s) ' -B ' num2str(param_classif{cdescriptors,ccurrent}.B) ' -w1 ' num2str(wpos) ' -c ' num2str(param_classif{cdescriptors,ccurrent}.c)];
else
options = ['-q -s ' num2str(param_classif{cdescriptors,ccurrent}.s) ' -B ' num2str(param_classif{cdescriptors,ccurrent}.B) ' -c ' num2str(param_classif{cdescriptors,ccurrent}.c)];
end
%train the binary model for class t
model{t} = train_dense(ytopic' , Xtrain , options , 'col');
%score the training data with the model
[ytopic_est , accuracy_test , ftopic] = predict_dense(ytopic' , Xtrain , model{t} , '-b 0' , 'col'); % test the training data
if(uselogic)
options = ['-q -s 0 -B ' num2str(param_classif{cdescriptors,ccurrent}.B) ' -c ' num2str(param_classif{cdescriptors,ccurrent}.c)];
model{t}.logist = train_dense(ytopic' , ftopic' , options , 'col');
else
[A , B] = sigmoid_train(ytopic , ftopic');
ptopic = sigmoid_predict(ftopic' , A , B);
model{t}.A = A;
model{t}.B = B;
end
end
clear Xtrain ytrain;
Xtest = X(: , Itest(k , :));
ytest = y(Itest(k , :));
fprintf('\nPredict test data for classifier = %s and descriptor = %s\n\n' , base_classifier , base_name);
drawnow
for t = 1:nb_topic
ind_topic = (ytest==t);
ytopic = double(ind_topic);
ytopic(ytopic==0) = -1;
fprintf('cv = %d, predict topic = %s (%d/%d) for classifier = %s and descriptor = %s\n' , k , classe_name{t} , t , nb_topic , base_classifier , base_descriptor);
drawnow
if((strcmp(base_classifier , 'liblinear')) )
[ytopic_est , accuracy_test , ftopic ] = predict_dense(ytopic' , Xtest , model{t} , '-b 0' , 'col'); % test the training data
if(uselogic)
[l2,a2,d2] = predict_dense(ytopic' , ftopic , model{t}.logist , '-b 1');
ptopic = d2(:,find(model{t}.logist.Label==1))';
else
ptopic = sigmoid_predict(ftopic' , model{t}.A , model{t}.B);
end
end