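TensorFlow 2.x: tf.py_function and tf.numpy_function, prompted by a dataset.map problem

Preface: TensorFlow is a huge system. It has a great many functions and implements most common operations, but it can never cover every possible operation. Sometimes we need to define our own logic to implement a specific feature, and that turns out to be less simple than it sounds. This article grew out of a problem I hit while writing a tf.data.Dataset pipeline, and it collects what I learned in one place. It touches on the following concepts:

EagerTensor vs. Tensor, tf.py_function and tf.numpy_function, dataset.map, and more.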

1. Problem description

The problem to solve: there are three text files in a folder named files, called file1.txt, file2.txt and file3.txt. Their contents are as follows:

file1.txt

1,2,3,4,5

file2.txt

11,22,33,44,55

file3.txt

111,222,333,444,555

The labels of the three files are 1, 2 and 3. Assuming they have already been one-hot encoded, the classes are

[ [1,0,0],[0,1,0],[0,0,1] ]

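Now I want to read these three samples through a standard tf.data pipeline so they can be fed to a neural network for training. Clearly each text file has to be read, which calls for dataset.map(). My code is as follows: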

import tensorflow as tf

X=["file1.txt","file2.txt","file3.txt"]
Y=[[1,0,0],[0,1,0],[0,0,1]]

# Build the Dataset object
dataset = tf.data.Dataset.from_tensor_slices((X,Y))  # Step 1: construct the Dataset

# Parse each element of the dataset, i.e. each example
dataset = dataset.map(read_file)

for features,label in dataset:
    print(features)
    print(label)
    print("===========================================================")

The parsing function read_file is:

def read_file(filename,label):
    tf.print(type(filename))
    tf.print(type(label))
    
    # filename_ = filename.numpy()
    # label_ = label.numpy()
    
    filename = "./files/" + filename
    tf.print("/////////////////////////////////////////////////////////")
    
    f =  open(filename,mode="r")
    s =f.readline()
    x_ =s.split(',')
    result =[]
    for i in x_:
        result.append(int(i))
    
    return result,label

The code looks fine, but running it produces the following error:

TypeError: expected str, bytes or os.PathLike object, not Tensor

The error occurs at

f =  open(filename,mode="r")

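The meaning is simple: the filename passed to open() must be a str, bytes, or path-like object, not a Tensor object.

Note: this problem cost me a full two days. It took a long search on Google to find the solution, and searching in Chinese turned up almost nothing useful.

So what can we do?

It looks simple. The error says filename and label are Tensors, so all we need is the value inside each Tensor, which gives us the string. That is indeed the idea, and TensorFlow 2.x tells us that t.numpy() returns a tensor's value. But after calling numpy() on both arguments, it still fails, this time with the following error: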

filename_ = filename.numpy()
label_ = label.numpy()

AttributeError: 'Tensor' object has no attribute 'numpy'

How can a Tensor have no numpy attribute? Don't we always get a tensor's value with t.numpy()? This leads to the next question.

2. Distinguishing EagerTensor from tf.Tensor

2.1 A simple example

Let's start with a few simple examples:


In [59]: a = tf.constant([1,2,3,4,5])

In [60]: a
Out[60]: <tf.Tensor: shape=(5,), dtype=int32, numpy=array([1, 2, 3, 4, 5])>

In [61]: type(a)
Out[61]: tensorflow.python.framework.ops.EagerTensor

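Two things stand out:

(1) a is indeed a Tensor, and it has a numpy method, so we can get its value with a.numpy();

(2) its actual type is EagerTensor.

The Tensor in the earlier error has no numpy attribute because it looks like this: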

<class 'tensorflow.python.framework.ops.Tensor'>   # type

tf.Tensor([102 102 102], shape=(3,), dtype=int64)  # Tensor

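As you can see, it has no numpy attribute, hence the error.

So in TensorFlow 2.x, every tensor whose value can be fetched with numpy() is an EagerTensor, even though it is still printed in this form: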

 <tf.Tensor: ... ...>

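What, then, is a Tensor? It is the kind of tensor that lives in a static graph. Although TensorFlow 2.x runs eagerly by default, a graph is still built behind the scenes in some places (for example inside tf.function or dataset.map), and the value of such a Tensor is not available right away. It only exists after data has been fed in and the graph has run. In the TensorFlow 1.x static-graph world, a Tensor's value was obtained like this: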

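# TensorFlow 1.x (in 2.x this API lives under tf.compat.v1 and needs graph mode)
with tf.Session() as sess:
    result = sess.run([t])  # fetch the value of Tensor t
    print(result)

    # or, equivalently, through the default session
    result = t.eval()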
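For comparison, TensorFlow 2.x has no Session in its main API. A rough equivalent is to build the graph with tf.function and then simply call it. Inside the function, t is a symbolic Tensor, while the value returned to the caller is an EagerTensor. The sketch below assumes default eager mode, and the name add_one is only illustrative:

```python
import tensorflow as tf

@tf.function
def add_one(t):
    # Inside the traced graph, t is a symbolic tf.Tensor with no concrete value yet.
    return t + 1

# Calling the tf.function runs the graph; the result comes back as an EagerTensor.
result = add_one(tf.constant([1, 2, 3]))
print(result.numpy())
```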

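2.2 Things to watch out for in TensorFlow 2.x

Some notes on using EagerTensor and Tensor:

(1) To look at a result, use tf.print(tensor) rather than print(tensor.numpy())

tf.print(tensor) prints a tensor's contents in both eager and graph mode. print(tensor.numpy()), by contrast, only works in eager mode and only on an EagerTensor.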
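The following small sketch shows the difference. It assumes default TF 2.x eager mode, and scaled_sum is a hypothetical name. Inside tf.function, only tf.print reports the runtime value:

```python
import tensorflow as tf

@tf.function
def scaled_sum(x):
    y = x * 2
    # tf.print becomes a print op in the graph and outputs y's runtime value.
    tf.print("y =", y)
    # print(y.numpy()) here would fail: y is a symbolic tf.Tensor during tracing.
    # A plain print(y) would run only once, at trace time, and show no values.
    return tf.reduce_sum(y)

total = scaled_sum(tf.constant([1, 2, 3]))
```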

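(2) Use tf.device rather than tensor.gpu() / tensor.cpu()

In the new version, a newly created tensor is automatically placed on the highest-priority device. For example, when a GPU is present, the tensor goes straight onto the GPU: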

# Requires a GPU build (and a GPU) to see the tensor placed on the GPU directly; a CPU-only build will not show this
import tensorflow as tf
print(len(tf.config.list_physical_devices('GPU')) > 0)
# True
r = tf.random.normal((3, 4))
print(r.device)
# /job:localhost/replica:0/task:0/device:GPU:0

As for device placement in the new version, an EagerTensor can be moved to a device directly with .cpu() / .gpu(), but tf.Tensor has no such methods. The approach that works for both is to run the operations inside a tf.device scope.
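Here is a minimal sketch of the tf.device approach. It falls back to the CPU when no GPU is visible; the fallback is my assumption, so the snippet also runs on CPU-only machines. Instead of calling r.cpu(), the sin operation is simply placed under a CPU scope:

```python
import tensorflow as tf

# Use the first GPU if one is visible, otherwise stay on the CPU (assumption).
has_gpu = bool(tf.config.list_physical_devices("GPU"))
src_device = "/GPU:0" if has_gpu else "/CPU:0"

with tf.device(src_device):
    r = tf.random.normal((3, 4))   # created on the GPU when one is available

# Instead of r.cpu(), run the op under a CPU device scope; this works for an
# EagerTensor and, inside tf.function, for a symbolic tf.Tensor as well.
with tf.device("/CPU:0"):
    s = tf.sin(r)

print(r.device)
print(s.device)
```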

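(3) Don't iterate over tensors; prefer vectorized operations

An EagerTensor can be iterated over, but a tf.Tensor cannot. So avoid iterating over tensors, and think about how to express the operation in vectorized form. That keeps the code compatible with both eager and graph mode, and vectorized code is also much faster.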
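The following sketch shows the two styles on a toy tensor; x and vectorized are illustrative names. The list comprehension only works because x is an EagerTensor. The vectorized version is a single op and behaves the same inside tf.function:

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0, 4.0])

# Eager-only style: iterate over the EagerTensor element by element.
looped = tf.stack([v * v + 1.0 for v in x])

# Vectorized style: one op over the whole tensor, graph-compatible.
@tf.function
def vectorized(t):
    return t * t + 1.0

print(looped.numpy())
print(vectorized(x).numpy())
```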

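2.3 Analysis and understanding

We can think of it this way:

An EagerTensor is live: its value can be obtained at any time via numpy().

A Tensor is deferred: it is a component of a static graph, and its value is only available once data has been fed in and the computation has run.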

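So why are the arguments that dataset.map(function) passes to the parsing function Tensors? Here, that means filename and label in read_file(filename, label) above.

Because dataset.map does not run the mapped function on each sample ahead of time. It only tells the dataset that every sample it yields must first go through function. More precisely, map calls the Python function once to trace it into a graph, and that graph is then run for every element. During tracing, filename and label are only placeholders ("slots") in the graph, and real data fills these slots when the dataset is iterated. So even though data is eventually filled in, filename and label inside the function body are still Tensors rather than EagerTensors. That is why the errors above occur.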
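This trace-once behaviour can be observed directly. In the sketch below, double and eager_flags are hypothetical names. The Python body records tf.executing_eagerly() each time it actually runs. It runs during tracing rather than once per element, and never in eager mode:

```python
import tensorflow as tf

eager_flags = []  # hypothetical log: one entry per time the Python body runs

def double(x):
    # This Python code runs while dataset.map traces the function into a graph,
    # not once per element; x is a symbolic tf.Tensor here.
    eager_flags.append(tf.executing_eagerly())
    return x * 2

ds = tf.data.Dataset.range(5).map(double)
values = [int(v) for v in ds]

print(values)
print(eager_flags)  # only False entries, and far fewer than 5
```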

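Note two points:

(1) A Tensor cannot be converted directly into an EagerTensor.

(2) A Tensor cannot be used directly in plain Python code, because the Python function has no access to the Tensor's value.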

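3. How TensorFlow interacts with Python code: tf.py_function

We need to implement our own function, but a function written in plain Python cannot interact directly with a Tensor. What can we do?

TensorFlow 2.x provides tf.py_function for implementing such custom functionality.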

3.1 Function signature

tf.py_function(func, inp, Tout, name=None)

Purpose: wraps a Python function as a TensorFlow op so that Python code can interact with TensorFlow.

Parameters:

func: the Python function you defined

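inp: the arguments for your Python function, given as a list such as [tensor1, tensor2, tensor3], where each element is a Tensor object. Make sure the list matches the parameters of the function you defined.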

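Tout: describes the return value(s) of your Python function.

When Tout is a list, such as [tf.string, tf.int64, tf.float32], the function has three return values, i.e. it returns three tensors, and each tensor's element type is the corresponding dtype.

When Tout is a single dtype, such as tf.int64, the function returns a single value, for example a list or tensor of integers.

When Tout is an empty list, the function returns nothing (None).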
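Here is a minimal sketch of Tout given as a list of two dtypes. It assumes default TF 2.x, and py_stats and stats are hypothetical names. The explicit NumPy casts make each returned value match the dtype declared in Tout:

```python
import numpy as np
import tensorflow as tf

def py_stats(x):
    # Runs eagerly inside tf.py_function, so x is an EagerTensor and .numpy() works.
    arr = x.numpy()
    # Cast explicitly so each value matches the dtype declared in Tout.
    return np.int32(arr.sum()), np.float32(arr.mean())

@tf.function
def stats(x):
    # x is a symbolic tf.Tensor here; tf.py_function hands it to Python eagerly.
    total, mean = tf.py_function(py_stats, inp=[x], Tout=[tf.int32, tf.float32])
    return total, mean

total, mean = stats(tf.constant([1, 2, 3, 4], dtype=tf.int32))
print(int(total), float(mean))
```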

3.2 Solving the problem above

(1) Define your own Python function

# dataset.map does not call this function directly; it is first wrapped with tf.py_function
def read_file(filename,label):
    tf.print(type(filename))     # after wrapping, the type is EagerTensor rather than Tensor
    tf.print(type(label))

    filename_ = filename.numpy() # an EagerTensor, so numpy() works; TensorFlow stores strings as bytes, so the value looks like b'xxxxx'
    label_ = label.numpy()

    new_filename = filename_.decode()  # decode the bytes into a str
    new_filename = "./files/" + new_filename

    # new_filename is now a plain Python str and can be opened directly
    with open(new_filename, mode="r") as f:
        s = f.readline()
    x_ = s.split(',')
    result = []
    for i in x_:
        result.append(int(i))

    return result,label  # result is a Python list

(2) Define a function that wraps your Python function with tf.py_function

# Note: inp must match read_file's parameters, and Tout must match its return types
def wrap_function(x,y):
    x, y = tf.py_function(read_file, inp=[x, y], Tout=[tf.int32, tf.int32])
    return x,y

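Of course, we can also skip the wrapper function and do it in one step with a lambda expression.

If read_file is not wrapped with tf.py_function, both of its parameters are Tensors.

Once read_file is wrapped with tf.py_function, its parameters become EagerTensors.

The reason is that tf.py_function inserts an op into the graph, and when that op runs it calls the Python function eagerly with the actual input values, so the arguments the function receives are EagerTensors.

The one-step lambda version is: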

dataset = dataset.map(lambda x, y: tf.py_function(read_file, inp=[x, y], Tout=[tf.int32, tf.int32]))

(3) Write the dataset pipeline

X=["file1.txt","file2.txt","file3.txt"]
Y=[[1,0,0],[0,1,0],[0,0,1]]

dataset = tf.data.Dataset.from_tensor_slices((X,Y))  # Step 1: construct the Dataset

dataset = dataset.map(wrap_function)

dataset=dataset.repeat(3)       # repeat the data three times
dataset=dataset.batch(3)        # 3 samples per batch


for features,label in dataset:
    print(features)
    print(label)
    print("=================================================================")

The output is:

tf.Tensor(
[[  1   2   3   4   5]
 [ 11  22  33  44  55]
 [111 222 333 444 555]], shape=(3, 5), dtype=int32)
tf.Tensor(
[[1 0 0]
 [0 1 0]
 [0 0 1]], shape=(3, 3), dtype=int32)
=======================================================================================================
tf.Tensor(
[[  1   2   3   4   5]
 [ 11  22  33  44  55]
 [111 222 333 444 555]], shape=(3, 5), dtype=int32)
tf.Tensor(
[[1 0 0]
 [0 1 0]
 [0 0 1]], shape=(3, 3), dtype=int32)
=======================================================================================================
tf.Tensor(
[[  1   2   3   4   5]
 [ 11  22  33  44  55]
 [111 222 333 444 555]], shape=(3, 5), dtype=int32)
tf.Tensor(
[[1 0 0]
 [0 1 0]
 [0 0 1]], shape=(3, 3), dtype=int32)
=======================================================================================================

As you can see, the results now match exactly!
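The following is a self-contained sketch of the same pipeline, for anyone who wants to reproduce it without preparing the files by hand. It differs from the code above in three ways. The three files are recreated in a temporary directory instead of ./files. The file is opened with a with statement. set_shape is an optional extra of mine, added because tf.py_function outputs carry no static shape; it assumes every file holds exactly 5 numbers:

```python
import os
import tempfile
import tensorflow as tf

# Assumption: recreate the sample files in a temp dir instead of ./files.
data_dir = tempfile.mkdtemp()
contents = {"file1.txt": "1,2,3,4,5", "file2.txt": "11,22,33,44,55",
            "file3.txt": "111,222,333,444,555"}
for name, text in contents.items():
    with open(os.path.join(data_dir, name), "w") as f:
        f.write(text + "\n")

def read_file(filename, label):
    # Called eagerly by tf.py_function: filename is an EagerTensor holding bytes.
    path = os.path.join(data_dir, filename.numpy().decode())
    with open(path) as f:
        return [int(v) for v in f.readline().split(",")], label

def wrap_function(x, y):
    features, label = tf.py_function(read_file, inp=[x, y], Tout=[tf.int32, tf.int32])
    # py_function outputs have unknown static shape; restore it (assumed 5 and 3).
    features.set_shape([5])
    label.set_shape([3])
    return features, label

X = ["file1.txt", "file2.txt", "file3.txt"]
Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).map(wrap_function).batch(3)

features, labels = next(iter(dataset))
print(features.numpy())
print(labels.numpy())
```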

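3.3 More on Tensor and EagerTensor

Note: an EagerTensor can interact directly with Python code, including being iterated over. It is the Tensor that does not support direct interaction with Python, and this deserves special attention. The following examples show the difference:

(1) An EagerTensor interacting with a Python function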

import tensorflow as tf

def iterate_tensor(tensor):
    tf.print(type(tensor))  # EagerTensor
    (x1, x2, x3), (x4, x5, x6) = tensor
    return tf.stack([x2, x4, x6])


const = tf.constant(range(6), shape=(2, 3)) # EagerTensor
o = iterate_tensor(const)
print(o)
'''Output:
<class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor([1 3 5], shape=(3,), dtype=int32)
'''

(2) A Tensor interacting with a Python function

Decorate the function with tf.function, as follows:

@tf.function
def iterate_tensor(tensor):
    tf.print(type(tensor))  # Tensor
    (x1, x2, x3), (x4, x5, x6) = tensor
    return tf.stack([x2, x4, x6])


const = tf.constant(range(6), shape=(2, 3)) # EagerTensor
o = iterate_tensor(const)
print(o)

Because tf.function compiles the decorated Python function into static-graph operations, tensor becomes a Tensor, so the code above fails with:

OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function.

So tensor has become a Tensor, which cannot be iterated over, and an error is raised.
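One way to make the graph version work is to replace Python unpacking with tensor ops. The sketch below shows two options: plain indexing, or tf.unstack, which returns a Python list when the dimension is statically known. Here the shape (2, 3) is known at trace time. pick_tensor and pick_unstacked are illustrative names:

```python
import tensorflow as tf

@tf.function
def pick_tensor(tensor):
    # Index with tensor ops instead of Python unpacking; works in graph mode.
    return tf.stack([tensor[0, 1], tensor[1, 0], tensor[1, 2]])

@tf.function
def pick_unstacked(tensor):
    # tf.unstack yields Python lists because the (2, 3) shape is static,
    # so the original unpacking style keeps working inside the graph.
    (x1, x2, x3), (x4, x5, x6) = [tf.unstack(row) for row in tf.unstack(tensor)]
    return tf.stack([x2, x4, x6])

const = tf.constant(range(6), shape=(2, 3))
print(pick_tensor(const))
print(pick_unstacked(const))
```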

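Summary: always distinguish between EagerTensor and tf.Tensor.

A tensor created in eager mode is an EagerTensor (importable via from tensorflow.python.framework.ops import EagerTensor), while a tensor created in graph mode is a tf.Tensor. EagerTensor and tf.Tensor are very similar but not identical. Code that relies on methods only EagerTensor has will fail once it is converted to a graph, because tf.Tensor lacks those methods.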

Often we do not know whether a given tensor is an EagerTensor or a Tensor. The simplest way to check is

tf.print(type(tensor_name))
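Besides printing type(), tf.executing_eagerly() answers the related question of whether the current code is running eagerly. In graph mode the class name itself may differ between TensorFlow versions, so the sketch only checks the mode there. tensor_kind and seen are hypothetical names:

```python
import tensorflow as tf

def tensor_kind(t):
    # Combine the class name with the execution mode reported by TensorFlow.
    mode = "eager" if tf.executing_eagerly() else "graph"
    return f"{type(t).__name__} ({mode})"

print(tensor_kind(tf.constant(1)))

seen = []  # filled at trace time by the tf.function below

@tf.function
def traced(t):
    seen.append(tensor_kind(t))  # runs once, while the graph is being traced
    return t

traced(tf.constant(1))
print(seen)
```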

4. Supplement: tf.py_function and tf.numpy_function

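Admittedly, even TensorFlow's large set of ops still cannot cover everything NumPy does. There are len(dir(tf.raw_ops)) of them, roughly 1227. So when no suitable op (or combination of ops) can express the computation, being able to fall back on NumPy functions is handy. Some people may think of converting the tensor to a NumPy array, computing with NumPy and converting the result back into a Tensor, but tf.function does not allow that; the proper route is tf.numpy_function. (You can of course write your own op kernel and compile it. If I find the time, I may write up custom ops later, but first I want to finish the TF 2.0 write-ups I promised long ago > <)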
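Here is a minimal tf.numpy_function sketch. np_median is a hypothetical helper, and the median is only an example of a NumPy routine. Unlike with tf.py_function, the wrapped function receives NumPy arrays, not EagerTensors, and the returned dtype should match Tout:

```python
import numpy as np
import tensorflow as tf

def np_median(x):
    # tf.numpy_function passes a NumPy array in, so plain NumPy code works.
    return np.float32(np.median(x))  # cast so the dtype matches Tout

@tf.function
def median_in_graph(x):
    return tf.numpy_function(np_median, [x], tf.float32)

m = median_in_graph(tf.constant([3.0, 1.0, 2.0, 10.0]))
print(m)
```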

For more on using tf.py_function and tf.numpy_function, see their entries in the TensorFlow API reference.
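For comparison with section 3.2, the same file-reading step can be written with tf.numpy_function. The sketch makes the same assumptions as before: the files are recreated in a temporary directory, and each file holds 5 numbers. read_file_np and wrap_np are hypothetical names. Since numpy_function passes NumPy values, no .numpy() call is needed. np.asarray(...).item() gets the file name as bytes whichever form it arrives in:

```python
import os
import tempfile
import numpy as np
import tensorflow as tf

data_dir = tempfile.mkdtemp()  # assumption: sample files recreated here, not ./files
for name, text in {"file1.txt": "1,2,3,4,5", "file2.txt": "11,22,33,44,55",
                   "file3.txt": "111,222,333,444,555"}.items():
    with open(os.path.join(data_dir, name), "w") as f:
        f.write(text + "\n")

def read_file_np(filename, label):
    # filename arrives as NumPy bytes data; .item() turns it into a bytes object.
    path = os.path.join(data_dir, np.asarray(filename).item().decode())
    with open(path) as f:
        values = np.array([int(v) for v in f.readline().strip().split(",")], dtype=np.int32)
    return values, label  # label is already an int32 ndarray

def wrap_np(x, y):
    features, label = tf.numpy_function(read_file_np, [x, y], [tf.int32, tf.int32])
    features.set_shape([5])  # assumption: 5 numbers per file
    label.set_shape([3])
    return features, label

X = ["file1.txt", "file2.txt", "file3.txt"]
Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).map(wrap_np).batch(3)

features, labels = next(iter(dataset))
print(features.numpy())
print(labels.numpy())
```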
