Background: before modifying and running the full pipeline, it is easier to first get the code running on a small dataset.
Purpose: run the code on a small dataset.
1. Code for generating the training set
dataset_tool_tf.py converts the raw image dataset into a tfrecords-format file for the network to train on.
1.1 Command line and input arguments
The corresponding commands:
# This should run through roughly 50K images and output a file called `datasets/imagenet_val_raw.tfrecords`.
python dataset_tool_tf.py \
    --input-dir "<path_to_imagenet>/ILSVRC2012_img_val" \
    --out=datasets/imagenet_val_raw.tfrecords
or
python dataset_tool_tf.py \
    --input-dir datasets/BSDS300-images/BSDS300/images/train \
    --out=datasets/bsd300.tfrecords
The command takes two arguments: --input-dir, the location of the input dataset, and --out, the path of the output tfrecords file.
def main():
    parser = argparse.ArgumentParser(
        description='Convert a set of image files into a TensorFlow tfrecords training set.',
        epilog=examples,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument("--input-dir", help="Directory containing ImageNet images")
    parser.add_argument("--out", help="Filename of the output tfrecords file")
    args = parser.parse_args()

    if args.input_dir is None:
        print('Must specify input file directory with --input-dir')
        sys.exit(1)
    if args.out is None:
        print('Must specify output filename with --out')
        sys.exit(1)
1.2 Loading the input images
    print('Loading image list from %s' % args.input_dir)
    images = sorted(glob.glob(os.path.join(args.input_dir, '*.JPEG')))
    images += sorted(glob.glob(os.path.join(args.input_dir, '*.jpg')))
    images += sorted(glob.glob(os.path.join(args.input_dir, '*.png')))
    np.random.RandomState(0x1234f00d).shuffle(images)
All images in JPEG, jpg, and png formats are collected into images, then shuffled with a fixed random seed.
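Because the shuffle uses a fixed-seed RandomState, the ordering written into the tfrecords file is reproducible across runs. A minimal standalone sketch using the same numpy call as the script:

```python
import numpy as np

# Shuffling a sorted list with a fixed-seed RandomState, as the script does,
# produces the identical order on every run.
def seeded_order(names):
    names = sorted(names)
    np.random.RandomState(0x1234f00d).shuffle(names)
    return names

first = seeded_order(['c.png', 'a.jpg', 'b.jpg'])
second = seeded_order(['c.png', 'a.jpg', 'b.jpg'])
assert first == second  # same seed, same order
```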
1.3 Conversion and writing
    # ----- Convert the data into tfrecords -----
    outdir = os.path.dirname(args.out)
    os.makedirs(outdir, exist_ok=True)
    writer = tf.python_io.TFRecordWriter(args.out)
    for (idx, imgname) in enumerate(images):
        print(idx, imgname)
        image = load_image(imgname)
        feature = {
            'shape': shape_feature(image.shape),
            'data': bytes_feature(tf.compat.as_bytes(image.tostring()))
        }
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())
1.4 Printing dataset statistics
    print('Dataset statistics:')
    print('  Formats:')
    for key in format_stats:
        print('    %s: %d images' % (key, format_stats[key]))
    print('  width,height buckets:')
    for key in size_stats:
        print('    %s: %d images' % (key, size_stats[key]))
    writer.close()
From this we can see that as long as the images sit in the expected folder, their exact number does not matter, so we are free to shrink the dataset.
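Rather than deleting files by hand, the pruning can be scripted. A hypothetical helper (make_small_subset is our own name, not part of the repo) that copies the first n images into a smaller folder:

```python
import glob
import os
import shutil

# Hypothetical helper: copy the first n .jpg images from src_dir into
# dst_dir, mimicking the manual "keep only a few images" pruning above.
def make_small_subset(src_dir, dst_dir, n):
    os.makedirs(dst_dir, exist_ok=True)
    kept = sorted(glob.glob(os.path.join(src_dir, '*.jpg')))[:n]
    for path in kept:
        shutil.copy(path, dst_dir)
    return kept
```

For example, `make_small_subset('datasets/BSDS300/images/train', 'datasets/part_BSDS300/images/train', 20)` would reproduce the 20-image training subset used below.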
二、一個版本原因的錯誤(已調通可不看)
BSD300文件夾拷出來,然後train文件夾刪除部分照片,留20張,test文件夾留10張。
部分圖片生成tfrecord的命令行:
python dataset_tool_tf.py --input-dir datasets/part_BSDS300/images/train --out=datasets/part_bsd300.tfrecords
Command for generating from all images:
python dataset_tool_tf.py --input-dir datasets/BSDS300/images/train --out=datasets/bsd300.tfrecords
2.1 Error caused by the Python version
jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python dataset_tool_tf.py --input-dir datasets/part_BSDS300/images/train --out=datasets/part_bsd300.tfrecords
Loading image list from datasets/part_BSDS300/images/train
Traceback (most recent call last):
File "dataset_tool_tf.py", line 94, in <module>
main()
File "dataset_tool_tf.py", line 70, in main
os.makedirs(outdir, exist_ok=True)
TypeError: makedirs() got an unexpected keyword argument 'exist_ok'
This error occurs because the author requires Python 3.6, while the interpreter that produced it is Python 2.7.6 (os.makedirs only gained the exist_ok keyword in Python 3.2).
jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
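A guard at the top of the script would turn this cryptic TypeError into a clear message. A sketch of such a check (not present in the original code):

```python
import sys

# os.makedirs only accepts exist_ok from Python 3.2 onwards, and the repo
# itself asks for Python 3.6, so fail fast on older interpreters.
if sys.version_info < (3, 6):
    sys.exit('This script requires Python 3.6+, found %d.%d' % sys.version_info[:2])
```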
Following the requirements, Anaconda needs to be installed on the server and the environment configured.
When the command is run with python3 instead, the error reports that TensorFlow is not available for that interpreter.
jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python3 dataset_tool_tf.py --input-dir datasets/part_BSDS300/images/train --out=datasets/part_bsd300.tfrecords
Traceback (most recent call last):
File "dataset_tool_tf.py", line 12, in <module>
import tensorflow as tf
ImportError: No module named 'tensorflow'
jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python3
Python 3.5.2 (default, May 23 2017, 10:15:40)
[GCC 5.4.1 20160904] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
3. Generating the training set
dataset_tool_tf.py turns the images into a training set for the network training that follows.
3.1 Shrinking the training and test sets
Copy the BSDS300 folder out, then delete some of the photos: keep 10 in the train folder and 5 in the test folder.
Note, however, that the file lists iids_test.txt and iids_train.txt under the corresponding folder /datasets/part_BSDS300/ have not been updated to match yet.
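The code shown above only globs the image folders, so the stale iids lists do not affect tfrecords generation; if consistency matters, they can be regenerated from the pruned folder. A hypothetical helper (write_iids is our own name; BSDS300 iids files hold one image id per line):

```python
import glob
import os

# Hypothetical helper: rewrite a BSDS300 iids file so it lists exactly
# the ids of the .jpg files that remain in the pruned folder.
def write_iids(image_dir, out_txt):
    ids = sorted(os.path.splitext(os.path.basename(p))[0]
                 for p in glob.glob(os.path.join(image_dir, '*.jpg')))
    with open(out_txt, 'w') as f:
        f.write('\n'.join(ids) + '\n')
    return ids
```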
3.2 Command line
Command for generating a tfrecords file from the image subset:
python dataset_tool_tf.py --input-dir datasets/part_BSDS300/images/train --out=datasets/part_bsd300.tfrecords
Command for all images (not needed for now):
python dataset_tool_tf.py --input-dir datasets/BSDS300/images/train --out=datasets/bsd300.tfrecords
Output after generating the dataset from just the subset:
...
17 datasets/part_BSDS300/images/train/43083.jpg
18 datasets/part_BSDS300/images/train/60079.jpg
19 datasets/part_BSDS300/images/train/16052.jpg
Dataset statistics:
  Formats:
    RGB: 20 images
  width,height buckets:
    >= 256x256: 20 images
3.3 Downloading the validation set
Command:
python download_kodak.py --output-dir=datasets/kodak
Code:
import os
import sys
import argparse
from urllib.request import urlretrieve

examples='''examples:
  python %(prog)s --output-dir=./tmp
'''

def main():
    parser = argparse.ArgumentParser(
        description='Download the Kodak dataset .PNG image files.',
        epilog=examples,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument("--output-dir", help="Directory where to save the Kodak dataset .PNGs")
    args = parser.parse_args()

    if args.output_dir is None:
        print('Must specify output directory where to store tfrecords with --output-dir')
        sys.exit(1)

    os.makedirs(args.output_dir, exist_ok=True)

    for i in range(1, 25):
        imgname = 'kodim%02d.png' % i
        url = "http://r0k.us/graphics/kodak/kodak/" + imgname
        print('Downloading', url)
        urlretrieve(url, os.path.join(args.output_dir, imgname))
    print('Kodak validation set successfully downloaded.')

if __name__ == "__main__":
    main()
Run result: only 24 images.
Downloading http://r0k.us/graphics/kodak/kodak/kodim22.png
Downloading http://r0k.us/graphics/kodak/kodak/kodim23.png
Downloading http://r0k.us/graphics/kodak/kodak/kodim24.png
Kodak validation set successfully downloaded.
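24 is exactly right: range(1, 25) runs i from 1 through 24, so the loop fetches kodim01.png through kodim24.png, which is the full Kodak set. A standalone check of the filename loop:

```python
# The download loop above generates exactly these 24 filenames.
names = ['kodim%02d.png' % i for i in range(1, 25)]
assert len(names) == 24
assert names[0] == 'kodim01.png'
assert names[-1] == 'kodim24.png'
```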
4. Training
4.1 Network training
Note: be sure to configure the matching CUDA driver version, CUDA runtime version, TF version, and so on, as required.
Matching the CUDA driver and runtime versions with Anaconda in a virtual environment: https://blog.csdn.net/weixin_36474809/article/details/87820314
Training is driven by the config.py script; its argument help is:
(n2n) jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python config.py train --help
usage: config.py train [-h] [--noise2noise [NOISE2NOISE]] [--noise NOISE]
[--long-train LONG_TRAIN]
[--train-tfrecords TRAIN_TFRECORDS]
optional arguments:
-h, --help show this help message and exit
--noise2noise [NOISE2NOISE]
Noise2noise (--noise2noise=true) or noise2clean
(--noise2noise=false). Default is noise2noise=true.
--noise NOISE Type of noise corruption (one of: gaussian, poisson)
--long-train LONG_TRAIN
Train for a very long time (500k iterations or
500k*minibatch image)
--train-tfrecords TRAIN_TFRECORDS
Filename of the training set tfrecords file
The commands
For ImageNet, the command is:
python config.py --desc='-test' train --train-tfrecords=datasets/imagenet_val_raw.tfrecords --long-train=true --noise=gaussian
What --desc='-test' means will be checked in the source code later; the value after --train-tfrecords specifies the training set file. For the small BSD300 subset:
python config.py --desc='-test' train --train-tfrecords=datasets/part_bsd300.tfrecords --long-train=false --noise=gaussian
python config.py train --train-tfrecords=datasets/part_bsd300.tfrecords --noise=gaussian
Building TensorFlow graph...
Training...
Average PSNR: 6.56
iter 0 time 4s sec/eval 0.0 sec/iter 0.00 maintenance 4.4
Average PSNR: 28.24
iter 1000 time 2m 30s sec/eval 117.5 sec/iter 0.12 maintenance 28.4
Average PSNR: 29.85
iter 2000 time 4m 33s sec/eval 114.2 sec/iter 0.11 maintenance 8.2
Average PSNR: 29.96
iter 3000 time 6m 39s sec/eval 115.8 sec/iter 0.12 maintenance 10.3
...
Training may take quite a while; for example, with ImageNet as the training set it took 7.5 hours on an NVIDIA Titan V GPU.
When training completes, a network_final.pickle is written under a results/* directory.
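Since each run gets its own numbered subdirectory under results/, locating the finished snapshot can be scripted. A sketch (find_final_snapshot is a hypothetical helper name, not part of the repo):

```python
import glob
import os

# Hypothetical helper: return the most recently written network_final.pickle
# under the results/ tree, or None if no run has finished yet.
def find_final_snapshot(results_root='results'):
    hits = glob.glob(os.path.join(results_root, '*', 'network_final.pickle'))
    return max(hits, key=os.path.getmtime) if hits else None
```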
4.2 Validation
Suppose the trained network lives in the following directory:
results/00001-autoencoder-1gpu-L-n2n
Here's how to run a set of images through this network:
python config.py validate --dataset-dir=datasets/kodak --network-snapshot=results/00001-autoencoder-1gpu-L-n2n/network_final.