Masking and padding in TensorFlow

Original

hustqb

2020-02-24 05:36

Note:

Readers are assumed to have some familiarity with TensorFlow and deep learning.

tf.boolean_mask: NumPy-style masking

A NumPy array can be indexed with an array of booleans, which returns the elements at the positions where the boolean value is True. For example:

# boolean mask on a NumPy array
import numpy as np

target_arr = np.arange(5)
print("numpy array before being masked:")
print(target_arr)

# one boolean per element; True keeps the element
mask_arr = np.array([True, False, True, False, False])
masked_arr = target_arr[mask_arr]
print("numpy array after being masked:")
print(masked_arr)

Output:

numpy array before being masked:
[0 1 2 3 4]
numpy array after being masked:
[0 2]

tf.boolean_mask applies the same kind of mask to a target tensor. Its parameters are simple:

tf.boolean_mask(
    tensor,  # target tensor
    mask,  # mask tensor
    axis=None,
    name='boolean_mask'
)

Let's try tf.boolean_mask:

import tensorflow as tf

# boolean mask in TensorFlow (TF 1.x session API)
target_tensor = tf.constant([[1, 2], [3, 4], [5, 6]])
mask_tensor = tf.constant([True, False, True])
masked_tensor = tf.boolean_mask(target_tensor, mask_tensor, axis=0)

# an InteractiveSession installs itself as the default session, so .eval() works
sess = tf.InteractiveSession()

print(masked_tensor.eval())

Elements 0 and 2 of the mask tensor are True and the mask axis is dimension 0, so only rows 0 and 2 of the target tensor are selected:

[[1 2]
 [5 6]]
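
The axis argument decides which dimension the mask is applied to. The following is a minimal sketch of that, assuming TensorFlow 2.x with eager execution (the examples in this post use the 1.x session API); the variable names are illustrative only.

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

target = tf.constant([[1, 2], [3, 4], [5, 6]])

# axis=0: one mask entry per row, so whole rows are kept or dropped.
rows = tf.boolean_mask(target, [True, False, True], axis=0)

# axis=1: one mask entry per column, so whole columns are kept or dropped.
cols = tf.boolean_mask(target, [False, True], axis=1)

print(rows.numpy().tolist())  # [[1, 2], [5, 6]]
print(cols.numpy().tolist())  # [[2], [4], [6]]
```

The mask length has to match the size of the dimension selected by axis: 3 for the rows here, 2 for the columns.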

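What happens if the mask tensor is also 2-D?

mask_tensor2 = tf.constant([[True, False], [False, False], [True, False]])
# pass mask_tensor2 here (not mask_tensor); a 2-D mask covers both dimensions of the target
masked_tensor2 = tf.boolean_mask(target_tensor, mask_tensor2, axis=0)

print(masked_tensor2.eval())

[1 5]

The result is not [[1], [5]]. When the mask has the same rank as the target, tf.boolean_mask does select individual elements, but it flattens the masked dimensions into one, so the row structure is lost. To mask element-wise while keeping the rows, TensorFlow provides tf.ragged.boolean_mask.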

tf.ragged.boolean_mask

tf.ragged.boolean_mask(
    data,
    mask,
    name=None
)
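
As a rough illustration of element-wise masking that keeps the rows, here is a small sketch assuming TensorFlow 2.x with eager execution. The inputs are made up for the example; the result is a RaggedTensor, whose rows may have different lengths or be empty.

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

# Dense data with a mask of the same shape.
data = tf.constant([[1, 2], [3, 4], [5, 6]])
mask = tf.constant([[True, False], [False, False], [True, False]])
kept = tf.ragged.boolean_mask(data, mask)
print(kept.to_list())  # [[1], [], [5]]

# Ragged data with a ragged mask of the same row lengths.
ragged_data = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
ragged_mask = tf.ragged.constant([[True, False, True], [False], [True, True]])
kept_ragged = tf.ragged.boolean_mask(ragged_data, ragged_mask)
print(kept_ragged.to_list())  # [[1, 3], [], [5, 6]]
```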

Sparse tensors and sparse masks in TensorFlow

A sparse tensor in TensorFlow has three components: indices, values, and dense_shape. The sparse tensor SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4]) corresponds to the following dense tensor:

[[1, 0, 0, 0]
 [0, 0, 2, 0]
 [0, 0, 0, 0]]
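
The conversion above can be checked with a short sketch, assuming TensorFlow 2.x with eager execution; tf.sparse.to_dense fills every position not listed in indices with zero.

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

# indices: positions of the non-zero entries; values: the entries themselves.
st = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4]
)

dense = tf.sparse.to_dense(st)
print(dense.numpy().tolist())
# [[1, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0]]
```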

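tf.sparse.mask performs a mask operation on sparse data. Note that, according to the API documentation, it operates on a tf.IndexedSlices (slices along the first dimension) rather than on a tf.SparseTensor, and mask_indices is a 1-D list of the slice indices to remove.

tf.sparse.mask(
    a,  # an IndexedSlices instance
    mask_indices,  # indices of the slices to mask out
    name=None
)

The tensor defined above has non-zero values in rows 0 and 1. Stored as an IndexedSlices with indices [0, 1], values [[1, 0, 0, 0], [0, 0, 2, 0]] and dense_shape [3, 4], calling tf.sparse.mask(a, [1]) removes the slice at row 1, and the dense form of the result is:

[[1, 0, 0, 0]
 [0, 0, 0, 0]
 [0, 0, 0, 0]]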

Because most of the functions in tf.sparse are only available in TensorFlow 2.0, no runnable example is given here.
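
For readers on TensorFlow 2.x, the masking step might look like the sketch below. It assumes eager execution and assumes the input is a tf.IndexedSlices, which is what the tf.sparse.mask documentation describes; the scatter back to a dense tensor is only there to make the result easy to inspect and is not part of the mask operation itself.

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

# Rows 0 and 1 of a 3x4 tensor, stored as slices along the first dimension.
a = tf.IndexedSlices(
    values=tf.constant([[1, 0, 0, 0], [0, 0, 2, 0]]),
    indices=tf.constant([0, 1], dtype=tf.int64),
    dense_shape=tf.constant([3, 4], dtype=tf.int64),
)

# Remove the slice stored at row index 1; slices not listed are kept.
b = tf.sparse.mask(a, tf.constant([1], dtype=tf.int64))

# Scatter the remaining slices into a dense tensor for inspection.
dense = tf.scatter_nd(tf.expand_dims(b.indices, 1), b.values, b.dense_shape)
print(dense.numpy().tolist())
# [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```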

padded_batch

The padded_batch method of tf.data.Dataset automatically pads the sequences in a batch, up to the length of the longest sequence in that batch.

padded_batch(
    batch_size,
    padded_shapes,
    padding_values=None,
    drop_remainder=False
)

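This method is the counterpart of the batch method of tf.data.Dataset. Both build batches from a dataset, but batch requires every sample in the dataset to have the same shape, whereas padded_batch pads samples of different shapes to a common shape when it builds the batch.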

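# reuses tf and the InteractiveSession created earlier (TF 1.x API)
elements = [[1, 2],
            [3, 4, 5],
            [6, 7],
            [8]]

A = tf.data.Dataset.from_generator(lambda: iter(elements), tf.int32)
# padded_shapes=[None]: pad to the longest sequence in each batch
B = A.padded_batch(2, padded_shapes=[None])
B_iter = B.make_one_shot_iterator()

# evaluates the first batch only
print(B_iter.get_next().eval())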

[[1 2 0]
 [3 4 5]]
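
The same idea in TensorFlow 2.x could be sketched as follows. This assumes TensorFlow 2.4 or later (for the output_signature argument) and eager execution. The fill value -1 and the mask derived from it are illustrative choices, not something the original example uses.

```python
import tensorflow as tf  # assumes TensorFlow 2.4+ (eager execution)

elements = [[1, 2], [3, 4, 5], [6, 7], [8]]

ds = tf.data.Dataset.from_generator(
    lambda: iter(elements),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32),
)

# Pad every batch to its own longest sequence, using -1 as the fill value.
batched = ds.padded_batch(
    2, padded_shapes=[None], padding_values=tf.constant(-1, dtype=tf.int32)
)

batches = [b.numpy().tolist() for b in batched]
print(batches)  # [[[1, 2, -1], [3, 4, 5]], [[6, 7], [8, -1]]]

# A boolean mask that is True at real positions and False at padding.
mask = tf.not_equal(tf.constant(batches[0]), -1)
print(mask.numpy().tolist())  # [[True, True, False], [True, True, True]]
```

Picking a fill value that cannot occur in the data (here -1) makes it possible to recover the padding mask afterwards; with the default fill of 0 that only works if 0 is never a real value.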
