Reposted from: https://blog.csdn.net/weixin_42052460/article/details/80714539

The TensorFlow coordinator tf.train.Coordinator and the enqueue-thread starter tf.train.start_queue_runners

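A TensorFlow Session supports multithreading: several threads can be created in the same Session and run in parallel. All threads in a Session must be able to stop together, exceptions must be caught and reported correctly, and queues must be closed properly when the Session ends.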

TensorFlow provides two classes for managing threads in a Session, tf.train.Coordinator and tf.train.QueueRunner, and the two are usually used together.

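The Coordinator class manages the threads running in a Session. It can stop several worker threads at once and report an exception to the program that is waiting for all worker threads to finish; once that program catches the exception, all threads are stopped. Call tf.train.Coordinator() to create a thread manager (coordinator) object.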
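To make the Coordinator's role concrete, here is a minimal sketch built only on Python's threading module. MiniCoordinator, worker, and run_demo are hypothetical names for this illustration; this is not TensorFlow's implementation, only a stand-in that mirrors the three calls used in this article (should_stop, request_stop, join).

```python
import threading
import time


class MiniCoordinator:
    """Hypothetical stand-in that mimics the role of tf.train.Coordinator."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._lock = threading.Lock()
        self._exc = None  # first exception reported by any worker

    def should_stop(self):
        # Workers poll this flag inside their loops.
        return self._stop_event.is_set()

    def request_stop(self, exc=None):
        # Remember the first reported exception, then signal every worker.
        with self._lock:
            if exc is not None and self._exc is None:
                self._exc = exc
        self._stop_event.set()

    def join(self, threads):
        # Wait for all workers, then re-raise the reported exception, if any.
        for t in threads:
            t.join()
        if self._exc is not None:
            raise self._exc


def worker(coord, worker_id, fail_after=None):
    """Loop until a stop is requested; optionally fail after some steps."""
    step = 0
    try:
        while not coord.should_stop():
            step += 1
            if fail_after is not None and step >= fail_after:
                raise RuntimeError("worker %d failed" % worker_id)
            time.sleep(0.001)
    except Exception as exc:
        coord.request_stop(exc)


def run_demo():
    coord = MiniCoordinator()
    threads = [
        threading.Thread(target=worker, args=(coord, 0)),
        threading.Thread(target=worker, args=(coord, 1, 5)),
    ]
    for t in threads:
        t.start()
    try:
        coord.join(threads)
    except RuntimeError as exc:
        return str(exc), all(not t.is_alive() for t in threads)
    return None, all(not t.is_alive() for t in threads)
```

In this sketch, worker 1 fails on its fifth step, which stops worker 0 as well, and the waiting caller sees the exception when it joins the threads.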

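The QueueRunner class starts the enqueue threads for tensors. It can run several worker threads that push tensors (training data) into a queue at the same time. The function that actually starts them is tf.train.start_queue_runners: only after tf.train.start_queue_runners has been called are tensors pushed into the in-memory queue for the compute ops to consume. Otherwise the queue stays empty and the dataflow graph waits indefinitely.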
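The blocking behaviour described here can be imitated with the standard library alone. The sketch below is an analogy, not TensorFlow code: enqueue_worker, start_enqueue_threads, and run_demo are hypothetical names, and a queue.Queue stands in for the in-memory queue. It shows a consumer finding the queue empty before the enqueue threads start and receiving data once they are running.

```python
import queue
import threading


def enqueue_worker(items, out_queue, stop_event):
    """Push each item into the bounded queue, giving up if a stop is requested."""
    for item in items:
        placed = False
        while not placed and not stop_event.is_set():
            try:
                out_queue.put(item, timeout=0.05)
                placed = True
            except queue.Full:
                pass  # queue is full; retry until there is room
        if not placed:
            return


def start_enqueue_threads(data, out_queue, stop_event, num_threads=2):
    """Start the enqueue threads and return them (each handles a shard of data)."""
    threads = []
    for i in range(num_threads):
        shard = data[i::num_threads]
        t = threading.Thread(
            target=enqueue_worker,
            args=(shard, out_queue, stop_event),
            daemon=True,
        )
        t.start()
        threads.append(t)
    return threads


def run_demo():
    buffer = queue.Queue(maxsize=4)
    stop_event = threading.Event()
    data = list(range(10))

    # No enqueue thread is running yet, so the consumer finds nothing to read.
    try:
        buffer.get(timeout=0.1)
        blocked_before_start = False
    except queue.Empty:
        blocked_before_start = True

    # Once the enqueue threads run, the consumer can drain the queue.
    threads = start_enqueue_threads(data, buffer, stop_event)
    received = [buffer.get(timeout=5) for _ in data]
    stop_event.set()
    for t in threads:
        t.join()
    return blocked_before_start, sorted(received)
```

With more than one enqueue thread the arrival order is not guaranteed, which is why the demo sorts the received items before returning them.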

The data-reading mechanism in TensorFlow works as follows:

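Call tf.train.slice_input_producer to take tensors, one slice at a time, from the input tensor list and prepare them for the Filename Queue;
Call tf.train.batch to take tensors from the Filename Queue, using one or more threads, and prepare them for the file queue;
Call tf.train.Coordinator() to create a thread coordinator that manages all threads subsequently started in the Session;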
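Call tf.train.start_queue_runners to start the enqueue threads, one or more of which fill the queues according to the configured rules. The function returns a list of the threads it started. The number of enqueue threads does not depend on the number of CPU cores; for the batch queue it is set by num_threads in tf.train.batch;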
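Moving data from the Filename Queue into the in-memory queue does not have to be done by hand; TensorFlow does it automatically;
Call sess.run to start dequeuing data and running the computation;
Use coord.should_stop() to check whether all threads should stop. Once every item in the queue has been dequeued, a tf.errors.OutOfRangeError exception is raised, and at that point all threads in the Session should be stopped;
Use coord.request_stop() to ask all threads to stop, then use coord.join(threads) so that the main thread waits until they have finished.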

An example of the queue and Coordinator operations described above:

# -*- coding:utf-8 -*-
# Uses the TensorFlow 1.x queue-based input API (tf.Session, tf.train.batch).
import tensorflow as tf
import numpy as np

# Number of samples
sample_num = 5
# Number of epochs
epoch_num = 2
# Number of samples per batch
batch_size = 3
# Number of batches in each epoch
batch_total = int(sample_num / batch_size) + 1

# Generate sample_num random images and their labels
def generate_data(sample_num=sample_num):
    labels = np.asarray(range(0, sample_num))
    images = np.random.random([sample_num, 224, 224, 3])
    print('image size {},label size :{}'.format(images.shape, labels.shape))
    return images, labels

def get_batch_data(batch_size=batch_size):
    images, label = generate_data()
    # Cast images to tf.float32 and labels to tf.int32
    images = tf.cast(images, tf.float32)
    label = tf.cast(label, tf.int32)
    # Take one slice at a time from the tensor list, in order or shuffled,
    # and prepare it for the filename queue
    input_queue = tf.train.slice_input_producer([images, label], num_epochs=epoch_num, shuffle=False)
    # Read from the filename queue and prepare batches for the file queue
    image_batch, label_batch = tf.train.batch(input_queue, batch_size=batch_size, num_threads=2, capacity=64, allow_smaller_final_batch=False)
    return image_batch, label_batch

image_batch, label_batch = get_batch_data(batch_size=batch_size)

with tf.Session() as sess:
    # Run the initializers first (num_epochs is tracked in a local variable)
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    # Create a coordinator
    coord = tf.train.Coordinator()
    # Start filling the queues with start_queue_runners
    threads = tf.train.start_queue_runners(sess, coord)
    try:
        while not coord.should_stop():
            print('************')
            # Fetch batch_size samples and labels for each batch
            image_batch_v, label_batch_v = sess.run([image_batch, label_batch])
            print(image_batch_v.shape, label_batch_v)
    except tf.errors.OutOfRangeError:  # raised when the end of the queue is reached
        print("done! now lets kill all the threads……")
    finally:
        # The coordinator signals all threads to stop
        coord.request_stop()
        print('all threads are asked to stop!')
    coord.join(threads)  # wait in the main thread for the started threads to finish
    print('all threads are stopped!')

Output (captured under Python 2, where print(a, b) prints a tuple):

************
((3, 224, 224, 3), array([0, 1, 2], dtype=int32))
************
((3, 224, 224, 3), array([3, 4, 0], dtype=int32))
************
((3, 224, 224, 3), array([1, 2, 3], dtype=int32))
************
done! now lets kill all the threads……
all threads are asked to stop!
all threads are stopped!

The program above sets num_epochs in tf.train.slice_input_producer, so the queue has an end marker. When that marker is reached, an OutOfRangeError exception is raised and the threads can be stopped.
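The three batches in the output follow from simple arithmetic: 5 samples over 2 epochs give 10 items, which fill three batches of 3 and leave one item over. The sketch below reproduces that count in plain Python. simulate_batches is a hypothetical helper, and it assumes in-order, single-threaded delivery, which real enqueue threads do not guarantee.

```python
def simulate_batches(sample_num, epoch_num, batch_size,
                     allow_smaller_final_batch=False):
    """Model the label stream of an unshuffled producer limited to epoch_num epochs."""
    # Labels 0..sample_num-1 repeated once per epoch.
    stream = [i for _ in range(epoch_num) for i in range(sample_num)]
    # Cut the stream into consecutive batches.
    batches = [stream[s:s + batch_size]
               for s in range(0, len(stream), batch_size)]
    # A short final batch is dropped unless explicitly allowed.
    if batches and len(batches[-1]) < batch_size and not allow_smaller_final_batch:
        batches.pop()
    return batches


batches = simulate_batches(sample_num=5, epoch_num=2, batch_size=3)
print(batches)  # [[0, 1, 2], [3, 4, 0], [1, 2, 3]]
```

The leftover sample (label 4 from the second epoch) never appears because allow_smaller_final_batch is False in the example, so the fourth sess.run call is the one that ends the loop.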

If num_epochs is not set, the queue cycles indefinitely with no end marker, and the program keeps running.
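In the endless case the loop needs its own stopping rule. One common choice, assumed here rather than taken from the original post, is a fixed number of steps, after which coord.request_stop() would be called. The sketch below models this in plain Python: endless_samples and take_batches are hypothetical helpers, and itertools.cycle stands in for a producer with no num_epochs.

```python
import itertools


def endless_samples(sample_num):
    """Yield labels 0..sample_num-1 forever, like a producer with no epoch limit."""
    return itertools.cycle(range(sample_num))


def take_batches(sample_iter, batch_size, max_steps):
    """Read exactly max_steps batches, then stop regardless of remaining data."""
    batches = []
    for _ in range(max_steps):
        batches.append([next(sample_iter) for _ in range(batch_size)])
    return batches


batches = take_batches(endless_samples(5), batch_size=3, max_steps=4)
print(batches)  # [[0, 1, 2], [3, 4, 0], [1, 2, 3], [4, 0, 1]]
```

Because the stream never ends, every batch is full and the step limit alone decides when reading stops.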
