基於文本的深度學習方法的TensorFlow實現(2)——使用RNN進行文本分類

原創

2020-06-22 19:36

使用RNN進行文本分類

開始輸入管道

encoder = info.features['text'].encoder

該文本編碼器以可逆的方式編碼字符串，解碼時返回其字節編碼。

sample_string = 'Hello TensorFlow.'
encoded_string = encoder.encode(sample_string)
print ('Encoded string is {}'.format(encoded_string))
original_string = encoder.decode(encoded_string)
print ('The original string: "{}"'.format(original_string))

準備訓練數據

首先，創建編碼字符串的批處理。

BUFFER_SIZE = 10000
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.padded_batch(BATCH_SIZE, train_dataset.output_shapes)
test_dataset = test_dataset.padded_batch(BATCH_SIZE, test_dataset.output_shapes)

創建模型

首先，建立順序模型，從嵌入層開始。
tf.keras.layers.Bidirectional()與RNN一起使用，通過向前和向後傳播輸入，然後連接輸出。
RNN通過迭代元素處理序列輸入。

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

編譯

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

訓練模型

history = model.fit(train_dataset, epochs=10,
                    validation_data=test_dataset, 
                    validation_steps=30)
test_loss, test_acc = model.evaluate(test_dataset)

在填充序列上訓練，在未填充序列上訓練，可以導致偏斜。使用掩碼避免這種情況。

def pad_to_size(vec, size):
  zeros = [0] * (size - len(vec))
  vec.extend(zeros)
  return vec

def sample_predict(sentence, pad):
  encoded_sample_pred_text = encoder.encode(sample_pred_text)

  if pad:
    encoded_sample_pred_text = pad_to_size(encoded_sample_pred_text, 64)
  encoded_sample_pred_text = tf.cast(encoded_sample_pred_text, tf.float32)
  predictions = model.predict(tf.expand_dims(encoded_sample_pred_text, 0))

  return (predictions)

# predict on a sample text without padding.
sample_pred_text = ('The movie was cool. The animation and the graphics '
                    'were out of this world. I would recommend this movie.')
predictions = sample_predict(sample_pred_text, pad=False)
print (predictions)

# predict on a sample text with padding
sample_pred_text = ('The movie was cool. The animation and the graphics '
                    'were out of this world. I would recommend this movie.')
predictions = sample_predict(sample_pred_text, pad=True)
print (predictions)

堆疊兩個或更多的LSTM層

Keras遞歸層有兩種模式，通過return_sequences構造參數控制。

對於每個時間步長返回連續輸出的完整序列
(batch_size, timesteps, output_features)
僅返回每個輸入序列的最後一個輸出
(batch_size, output_features)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,  return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
history = model.fit(train_dataset, epochs=10,
                    validation_data=test_dataset,
                    validation_steps=30)
test_loss, test_acc = model.evaluate(test_dataset)

# predict on a sample text with padding
sample_pred_text = ('The movie was not good. The animation and the graphics '
                    'were terrible. I would not recommend this movie.')
predictions = sample_predict(sample_pred_text, pad=True)
print (predictions)

參考鏈接

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

基於文本的深度學習方法的TensorFlow實現(2)——使用RNN進行文本分類

使用RNN進行文本分類

開始輸入管道

準備訓練數據

創建模型

訓練模型

堆疊兩個或更多的LSTM層

ollama使用

Window 安裝 Python 失敗 0x80070643，發生嚴重錯誤

TiDB Vector 太香啦：以圖搜圖初體驗！

《最新出爐》系列入門篇-Python+Playwright自動化測試-41-錄製視頻

BFS廣度優先搜索（模板）

[轉載]學生如何提高專業英文閱讀能力

基於文本的深度學習方法的TensorFlow實現(1)——詞嵌入

2019_ITM_Continuous Gesture Segmentation and Recognition using 3DCNN and Convolutional LSTM

基於文本的深度學習方法的TensorFlow實現(2)——使用RNN進行文本分類

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結