Person Re-identification Dataset Conversion: Unifying Everything into the Market-1501 Format for Joint Multi-dataset Training

https://www.codetd.com/article/10334372

Common person reID datasets and how to convert them into the Market-1501 format. Thanks to the original author.

0 Preface

The commonly used reID datasets are listed in the figure below.

After downloading the datasets, my data directory initially looked like this.

Step 1: Create the Market-1501 folder structure

For a detailed introduction to the Market-1501 dataset, see http://blog.fangchengjin.cn/reid-market-1501.html
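
As a quick reminder of the target format, every Market-1501 file name starts with the person ID followed by the camera ID (a name like 0002_c1s1_000451_03.jpg is person 0002 seen by camera 1), and all of the scripts below rely on exactly that naming convention. A minimal parsing sketch, using the same regex as the extraction code further down:

import re

# Market-1501 style name: <pid>_c<camid>s<seq>_<frame>_<bbox>.jpg
pattern = re.compile(r'([-\d]+)_c(\d)')   # same pattern used by extract_market below
pid, camid = map(int, pattern.search('0002_c1s1_000451_03.jpg').groups())
print(pid, camid)   # -> 2 1

The conversion scripts keep exactly this prefix and only append a short dataset tag to the rest of the file name, so the merged folder still looks like a regular Market-1501 training set.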

import os

def make_market_dir(dst_dir='./'):
    market_root = os.path.join(dst_dir, 'market1501')
    train_path = os.path.join(market_root, 'bounding_box_train')
    query_path = os.path.join(market_root, 'query')
    test_path = os.path.join(market_root, 'bounding_box_test')

    if not os.path.exists(train_path):
        os.makedirs(train_path)
    if not os.path.exists(query_path):
        os.makedirs(query_path)
    if not os.path.exists(test_path):
        os.makedirs(test_path)

if __name__ == '__main__':
    make_market_dir(dst_dir='E:/reID')

This creates the folders we need.

Step 2: Extract the Market-1501 dataset

import re
import os
import shutil

def extract_market(src_path, dst_dir):
    img_names = os.listdir(src_path)
    pattern = re.compile(r'([-\d]+)_c(\d)')
    pid_container = set()
    for img_name in img_names:
        if '.jpg' not in img_name:
            continue
        print(img_name)
        # pid: person ID label          (group 1)
        # _  : camera ID, not used here (group 2)
        pid, _ = map(int, pattern.search(img_name).groups())
        # skip junk/distractor images (IDs 0 and -1)
        if pid == 0 or pid == -1:
            continue
        shutil.copy(os.path.join(src_path, img_name), os.path.join(dst_dir, img_name))

if __name__ == '__main__':
    src_train_path = r'D:\data\market1501\bounding_box_train'
    src_query_path = r'D:\data\market1501\query'
    src_test_path = r'D:\data\market1501\bounding_box_test'
    # use the whole Market-1501 dataset as training data
    dst_dir = r'E:\reID\market1501\bounding_box_train'

    extract_market(src_train_path, dst_dir)
    extract_market(src_query_path, dst_dir)
    extract_market(src_test_path, dst_dir)

The extraction result is shown in the figure: there are now 29,419 images in total, with IDs from 0001 to 1501, i.e. 1501 distinct identities.

Step 3: Extract the CUHK03 dataset

For a detailed introduction, see http://blog.fangchengjin.cn/reid-cuhk03.html

import re
import os
import shutil

def extract_cuhk03(src_path, dst_dir):
    img_names = os.listdir(src_path)
    pattern = re.compile(r'([-\d]+)_c(\d)_([\d]+)')
    pid_container = set()
    for img_name in img_names:
        if '.png' not in img_name and '.jpg' not in img_name:
            continue
        print(img_name)
        # pid  : person ID label (group 1)
        # camid: camera ID       (group 2)
        pid, camid, fname = map(int, pattern.search(img_name).groups())
        # offset by the last ID of the Market-1501 data (1501),
        # so the new IDs continue where the previous dataset left off
        pid += 1501
        dst_img_name = str(pid).zfill(6) + '_c' + str(camid) + '_CUHK' + str(fname) + '.jpg'
        shutil.copy(os.path.join(src_path, img_name), os.path.join(dst_dir, dst_img_name))

if __name__ == '__main__':
    src_train_path = r'D:\data\cuhk03-np\detected\bounding_box_train'
    src_query_path = r'D:\data\cuhk03-np\detected\query'
    src_test_path = r'D:\data\cuhk03-np\detected\bounding_box_test'
    dst_dir = r'E:\reID\market1501\bounding_box_train'

    extract_cuhk03(src_train_path, dst_dir)
    extract_cuhk03(src_query_path, dst_dir)
    extract_cuhk03(src_test_path, dst_dir)

As shown in the figure, CUHK03 contributes 14,097 images in total, with IDs from 001502 to 002968, i.e. 1467 distinct identities.

Step 4: Extract the MSMT17 dataset

import os.path as osp
import shutil

def msmt2market(dir_path, list_path, dst_dir, prev_pid):
    with open(list_path, 'r') as txt:
        lines = txt.readlines()
    pid_container = set()
    for img_idx, img_info in enumerate(lines):
        # each line looks like: <relative image path> <person ID>
        img_path, pid = img_info.split(' ')
        # continue numbering after the previous dataset's last ID
        pid = int(pid) + prev_pid + 1
        camid = int(img_path.split('_')[2])
        img_path = osp.join(dir_path, img_path)
        name = img_path.split('/')[-1]  # keep only the file name
        # zfill left-pads the ID with zeros to the required width
        Newdir = osp.join(dst_dir, str(pid).zfill(6) + '_c' + str(camid) + '_' + name)
        shutil.copy(img_path, Newdir)  # copy the image into the destination directory

if __name__ == '__main__':
    dataset_dir = r'D:\data\MSMT17_V2'
    train_dir = osp.join(dataset_dir, 'mask_train_v2')
    test_dir = osp.join(dataset_dir, 'mask_test_v2')
    list_train_path = osp.join(dataset_dir, 'list_train.txt')
    list_val_path = osp.join(dataset_dir, 'list_val.txt')
    list_query_path = osp.join(dataset_dir, 'list_query.txt')
    list_gallery_path = osp.join(dataset_dir, 'list_gallery.txt')

    dst_dir = r'E:\reID\market1501\bounding_box_train'
    # the train and val lists share one ID space, so both continue from CUHK03's last ID (2968)
    msmt2market(train_dir, list_train_path, dst_dir, 2968)
    msmt2market(train_dir, list_val_path, dst_dir, 2968)
    # the query and gallery lists start again from ID 0, so they continue from 4009
    # (2968 plus the 1041 MSMT17 training identities)
    msmt2market(test_dir, list_query_path, dst_dir, 4009)
    msmt2market(test_dir, list_gallery_path, dst_dir, 4009)

As shown in the figure, MSMT17 contributes 126,441 images in total, with IDs from 002969 to 007069, i.e. 4101 distinct identities.

By now all of the major large-scale datasets except DukeMTMC have been converted; I intentionally leave Duke out as a test set, to show the model's cross-dataset generalization ability.

The current statistics are shown in the figure below: the training set already contains nearly 170,000 images and 7069 identities in total.

Step 5: Extract the VIPeR dataset

import re
import os
import shutil

def extract_viper(src_path, dst_dir, camid=1):
    img_names = os.listdir(src_path)
    pattern = re.compile(r'([\d]+)_([\d]+)')
    pid_container = set()
    for img_name in img_names:
        if '.bmp' not in img_name:
            continue
        print(img_name)
        pid, fname = map(int, pattern.search(img_name).groups())
        # offset by the last ID used so far (7069);
        # VIPeR IDs start from 0, hence the extra +1
        pid += 7069 + 1
        dst_img_name = str(pid).zfill(6) + '_c' + str(camid) + '_viper' + str(fname) + '.jpg'
        shutil.copy(os.path.join(src_path, img_name), os.path.join(dst_dir, dst_img_name))

if __name__ == '__main__':
    src_cam_a = r'D:\data\viper\cam_a'
    src_cam_b = r'D:\data\viper\cam_b'
    dst_dir = r'E:\reID\market1501\bounding_box_train'

    extract_viper(src_cam_a, dst_dir, camid=1)
    extract_viper(src_cam_b, dst_dir, camid=2)

After conversion, the VIPeR dataset contributes 1264 images, with IDs from 007070 to 007943 (632 distinct identities). Note that the IDs are not contiguous here; that is fine, as long as they do not overlap with any IDs used before.
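
Since the only hard requirement is that IDs never collide across datasets, the hard-coded offsets used above (1501, 2968, 7069, ...) can also be double-checked automatically. A minimal sketch that scans the merged training folder and reports the largest ID already in use, so the base for the next dataset can be verified before copying (the path and regex follow the scripts above):

import os
import re

def current_max_pid(train_dir):
    # return the largest person ID found in the merged training folder
    pattern = re.compile(r'([-\d]+)_c(\d)')
    max_pid = 0
    for name in os.listdir(train_dir):
        m = pattern.search(name)
        if m is not None:
            max_pid = max(max_pid, int(m.group(1)))
    return max_pid

if __name__ == '__main__':
    # after this step it should print 7943, which is exactly the base used for SenseReID below
    print(current_max_pid(r'E:\reID\market1501\bounding_box_train'))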

Step 6: Extract the SenseReID dataset

import re
import os
import shutil

def extract_SenseReID(src_path, dst_dir, fname):
    img_names = os.listdir(src_path)
    pattern = re.compile(r'([\d]+)_([\d]+)')
    pid_container = set()
    for img_name in img_names:
        if '.jpg' not in img_name:
            continue
        print(img_name)
        pid, camid = map(int, pattern.search(img_name).groups())
        # continue numbering after the last VIPeR ID (7943)
        pid += 7943 + 1
        dst_img_name = str(pid).zfill(6) + '_c' + str(camid + 1) + '_SenseReID_' + fname + '.jpg'
        shutil.copy(os.path.join(src_path, img_name), os.path.join(dst_dir, dst_img_name))

if __name__ == '__main__':
    src_cam_a = r'D:\data\SenseReID\test_gallery'
    src_cam_b = r'D:\data\SenseReID\test_probe'
    dst_dir = r'E:\reID\market1501\bounding_box_train'

    extract_SenseReID(src_cam_a, dst_dir, 'gallery')
    extract_SenseReID(src_cam_b, dst_dir, 'probe')

After conversion, the SenseReID dataset contributes 4428 images, with IDs from 007944 to 009661.

Step 7: Extract the PRID2011 dataset

import re
import os
import shutil

def extract_prid(src_path, dst_dir, prevID, camid=1):
    pattern = re.compile(r'person_([\d]+)')
    pid_container = set()

    sub_dir_names = os.listdir(src_path)  # ['person_0001', 'person_0002', ...]

    for sub_dir_name in sub_dir_names:  # e.g. 'person_0001'
        img_names_all = os.listdir(os.path.join(src_path, sub_dir_name))
        # take only the first and the last frame of each sequence,
        # to avoid too many near-duplicate images
        img_names = [img_names_all[0], img_names_all[-1]]
        for img_name in img_names:  # e.g. '0001.png'
            if '.png' not in img_name:
                continue
            print(img_name)
            # the person ID is encoded in the sub-directory name
            pid = int(pattern.search(sub_dir_name).group(1))
            pid += prevID
            dst_img_name = str(pid).zfill(6) + '_c' + str(camid) + '_prid' + img_name.replace('.png', '.jpg')
            shutil.copy(os.path.join(src_path, sub_dir_name, img_name), os.path.join(dst_dir, dst_img_name))

if __name__ == '__main__':
    src_cam_a = r'D:\data\prid2011\multi_shot\cam_a'
    src_cam_b = r'D:\data\prid2011\multi_shot\cam_b'
    dst_dir = r'E:\reID\market1501\bounding_box_train'

    # cam_b IDs continue after cam_a's last ID (9661 + 385 = 10046), so a person
    # seen by both cameras gets two different labels here
    extract_prid(src_cam_a, dst_dir, 9661)
    extract_prid(src_cam_b, dst_dir, 10046)

After conversion, the PRID2011 dataset contributes 2268 images, with IDs from 009662 to 010795.

Step 8: Extract the i-LIDS-VID dataset

import re
import os
import shutil

def extract_ilids(src_path, dst_dir, prevID, camid):
    pattern = re.compile(r'person([\d]+)')
    pid_container = set()

    sub_dir_names = os.listdir(src_path)

    for sub_dir_name in sub_dir_names:
        img_names = os.listdir(os.path.join(src_path, sub_dir_name))
        for img_name in img_names:
            if '.png' not in img_name:
                continue
            print(img_name)
            pid = int(pattern.search(sub_dir_name).group(1))
            pid += prevID
            # the destination name contains no frame index, so later frames overwrite
            # earlier ones and only one image per person per camera is kept
            dst_img_name = str(pid).zfill(6) + '_c' + str(camid) + '_ilids' + '.jpg'
            shutil.copy(os.path.join(src_path, sub_dir_name, img_name), os.path.join(dst_dir, dst_img_name))

if __name__ == '__main__':
    src_cam_a = r'D:\data\ilids\i-LIDS-VID\images\cam1'
    src_cam_b = r'D:\data\ilids\i-LIDS-VID\images\cam2'
    dst_dir = r'E:\reID\market1501\bounding_box_train'

    extract_ilids(src_cam_a, dst_dir, 10795, 1)
    extract_ilids(src_cam_b, dst_dir, 10795, 2)

After conversion, the i-LIDS-VID dataset contributes 600 images, with IDs from 010796 to 011114.

Step 9: Extract the GRID dataset

import re
import os
import shutil

def extract_grid(src_path, dst_dir, camid=1):
    img_names = os.listdir(src_path)
    pattern = re.compile(r'([\d]+)_')
    pid_container = set()
    for img_name in img_names:
        if '.jpeg' not in img_name:
            continue
        print(img_name)
        pid = int(pattern.search(img_name).group(1))
        # ID 0000 marks the distractor images in the gallery, skip them
        if pid == 0:
            continue
        pid += 11114
        dst_img_name = str(pid).zfill(6) + '_c' + str(camid) + '_grid' + '.jpg'
        shutil.copy(os.path.join(src_path, img_name), os.path.join(dst_dir, dst_img_name))

if __name__ == '__main__':
    src_cam_a = r'D:\data\grid\probe'
    src_cam_b = r'D:\data\grid\gallery'
    dst_dir = r'E:\reID\market1501\bounding_box_train'

    extract_grid(src_cam_a, dst_dir, camid=1)
    extract_grid(src_cam_b, dst_dir, camid=2)

After conversion, the GRID dataset contributes 500 images, with IDs from 011115 to 011364.

The final statistics of the merged dataset are shown in the figure below: nearly 180,000 images and 11,103 distinct identities in total.
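
These counts can be reproduced with a short script over the merged training folder; a minimal sketch, using the same naming convention as the conversion scripts above:

import os
import re

train_dir = r'E:\reID\market1501\bounding_box_train'
pattern = re.compile(r'([-\d]+)_c(\d)')

pids = set()
num_images = 0
for name in os.listdir(train_dir):
    m = pattern.search(name)
    if m is None:
        continue
    num_images += 1
    pids.add(int(m.group(1)))

# after all the steps above this should report roughly 180,000 images and 11,103 identities
print('images:', num_images, 'identities:', len(pids))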

Reprinted from blog.csdn.net/songwsx/article/details/102987787