Dialogue Systems: Core Research Notes
Table of Contents
I. Introduction to the TensorFlow Seq2Seq API and Source Code Analysis
Reprinted from: https://blog.csdn.net/liuchonge/article/details/78856692
After the TensorFlow version upgrade, the old tf.nn.seq2seq code was moved to tf.contrib.legacy_seq2seq. This part of the API will likely be abandoned as well, since a new and more flexible API has been developed under tf.contrib.seq2seq. For now, however, the code and reference implementations found online still mostly use legacy_seq2seq, so we first analyze the functionality and source code of that module. This post covers the functions below; all of their definitions can be found in the file python/ops/seq2seq.py.
First, a look at what this file consists of. It mainly contains the following functions:
By call relationship and functionality, they can be grouped into the following structure:
model_with_buckets
    seq2seq functions:
        basic_rnn_seq2seq
            rnn_decoder
        tied_rnn_seq2seq
        embedding_tied_rnn_seq2seq
        embedding_rnn_seq2seq
            embedding_rnn_decoder
        embedding_attention_seq2seq
            embedding_attention_decoder
                attention_decoder
                    attention
        one2many_rnn_seq2seq
    loss functions:
        sequence_loss_by_example
        sequence_loss
Following the call hierarchy, the functions are introduced as follows:
1. The model_with_buckets() function
The top-level function, model_with_buckets(), is defined as:
def model_with_buckets(encoder_inputs,
                       decoder_inputs,
                       targets,
                       weights,
                       buckets,
                       seq2seq,
                       softmax_loss_function=None,
                       per_example_loss=False,
                       name=None):
  if len(encoder_inputs) < buckets[-1][0]:
    raise ValueError("Length of encoder_inputs (%d) must be at least that of la"
                     "st bucket (%d)." % (len(encoder_inputs), buckets[-1][0]))
  if len(targets) < buckets[-1][1]:
    raise ValueError("Length of targets (%d) must be at least that of last "
                     "bucket (%d)." % (len(targets), buckets[-1][1]))
  if len(weights) < buckets[-1][1]:
    raise ValueError("Length of weights (%d) must be at least that of last "
                     "bucket (%d)." % (len(weights), buckets[-1][1]))

  all_inputs = encoder_inputs + decoder_inputs + targets + weights
  # Store the loss and outputs of each bucket.
  losses = []
  outputs = []
  with ops.name_scope(name, "model_with_buckets", all_inputs):
    # Build one model per bucket, each on its own slice of the data.
    for j, bucket in enumerate(buckets):
      # All buckets share (reuse) the same parameters.
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=True if j > 0 else None):
        # Call seq2seq to decode and obtain the outputs. Note that
        # encoder_inputs and decoder_inputs are pre-defined placeholder lists
        # whose length equals the maximum sequence length (i.e. the largest
        # bucket). With the example below, they are lists of length 20 and 30.
        # When building the model for a bucket, only take the placeholders
        # that bucket needs; e.g. for bucket (5, 10), take the first 5
        # encoder and first 10 decoder placeholders.
        bucket_outputs, _ = seq2seq(encoder_inputs[:bucket[0]],
                                    decoder_inputs[:bucket[1]])
        outputs.append(bucket_outputs)
        # If per_example_loss is set, call sequence_loss_by_example: the entry
        # appended to losses is a list of batch_size per-sample loss values.
        if per_example_loss:
          losses.append(
              sequence_loss_by_example(
                  outputs[-1],
                  targets[:bucket[1]],
                  weights[:bucket[1]],
                  softmax_loss_function=softmax_loss_function))
        # Otherwise call sequence_loss, which sums the above; the entry
        # appended to losses is a single scalar.
        else:
          losses.append(
              sequence_loss(
                  outputs[-1],
                  targets[:bucket[1]],
                  weights[:bucket[1]],
                  softmax_loss_function=softmax_loss_function))

  return outputs, losses
1.1 Parameters
encoder_inputs: the encoder inputs, a list of tensors; each item in the list is one token fed to the encoder.
decoder_inputs: the decoder inputs, a list of tensors; each item in the list is one token fed to the decoder.
targets: the target values, int32; they differ from decoder_inputs only by one <EOS> symbol.
weights: mask flags over the target sequence; weight = 0 on padding positions and weight = 1 otherwise.
buckets: the defined bucket values, a list such as [(5, 10), (10, 20), (20, 30), ...].
seq2seq: the seq2seq model to build; any of embedding_attention_seq2seq, embedding_rnn_seq2seq, basic_rnn_seq2seq, etc. introduced later can be used (see the wiring sketch after this list).
softmax_loss_function: the loss function, with signature (labels, logits); defaults to sparse_softmax_cross_entropy_with_logits.
per_example_loss: if True, sequence_loss_by_example is called and a list is returned whose elements are the loss values of individual samples; if False, sequence_loss is called and a single summed loss value is returned for the whole batch. See the analysis below.
name: optional name for this operation, defaults to "model_with_buckets".
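To see how these parameters fit together, here is a hedged usage sketch in TF 1.x style (not from the original post; the bucket list, vocabulary sizes, and cell size are illustrative assumptions):

import tensorflow as tf
from tensorflow.contrib.legacy_seq2seq import (
    model_with_buckets, embedding_attention_seq2seq)

buckets = [(5, 10), (10, 20), (20, 30)]
# Placeholder lists are as long as the largest bucket.
encoder_inputs = [tf.placeholder(tf.int32, [None], name="enc%d" % i)
                  for i in range(20)]
decoder_inputs = [tf.placeholder(tf.int32, [None], name="dec%d" % i)
                  for i in range(30)]
targets = [tf.placeholder(tf.int32, [None], name="tgt%d" % i)
           for i in range(30)]   # decoder inputs shifted by one token
weights = [tf.placeholder(tf.float32, [None], name="w%d" % i)
           for i in range(30)]   # 0.0 on padding, 1.0 elsewhere
cell = tf.nn.rnn_cell.GRUCell(128)

def seq2seq_f(enc, dec):
    # Any of the seq2seq constructors analyzed below fits here.
    return embedding_attention_seq2seq(
        enc, dec, cell, num_encoder_symbols=10000,
        num_decoder_symbols=10000, embedding_size=128)

outputs, losses = model_with_buckets(
    encoder_inputs, decoder_inputs, targets, weights, buckets, seq2seq_f)
# len(outputs) == len(losses) == len(buckets); each losses[j] is a scalar here.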
1.2 Internal implementation
The purpose of bucketing is to reduce computation and speed up the model.
This code is fairly old; some places still use functions like static_rnn(). Once dynamic_rnn was introduced in newer versions of TF, none of this is really necessary anymore.
Analysis: the idea is simple. The input lengths are divided into intervals so that each sample only needs to be padded to the length of its bucket, rather than to the global maximum length.
Example: take buckets = [(5, 10), (10, 20), (20, 30), ...], where the first number of each bucket is the padded source length and the second number is the padded target length.
E.g. for '我愛你' -> 'I love you', the pair would be assigned to the first bucket; '我愛你' is then padded to a length-5 sequence and 'I love you' to a length-10 sequence.
In effect, each bucket is one parameter configuration of the model: one model is built per bucket, training picks the model whose lengths match the sequence, and all these models share their parameters. This part is easiest to understand by comparison with today's dynamic_rnn: dynamic_rnn pads each batch to the longest sample within that batch, whereas bucketing clusters the data by length in a preprocessing step. A minimal sketch of the bucketing idea follows.
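A plain-Python sketch of the bucket assignment and padding described above (PAD_ID and the token ids are illustrative):

buckets = [(5, 10), (10, 20), (20, 30)]
PAD_ID = 0

def assign_and_pad(source_ids, target_ids):
    # Pick the smallest bucket that fits, then pad to that bucket's lengths
    # instead of the global maximum.
    for source_len, target_len in buckets:
        if len(source_ids) <= source_len and len(target_ids) <= target_len:
            src = source_ids + [PAD_ID] * (source_len - len(source_ids))
            tgt = target_ids + [PAD_ID] * (target_len - len(target_ids))
            return src, tgt
    raise ValueError("sequence longer than the largest bucket")

# '我愛你' -> 'I love you': 3 source tokens and 3 target tokens,
# so the pair falls into the first bucket (5, 10).
src, tgt = assign_and_pad([4, 8, 15], [16, 23, 42])
print(len(src), len(tgt))  # 5 10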
2. The embedding_attention_seq2seq() function
tf.nn.seq2seq.embedding_attention_seq2seq
This function performs the decoding by calling a decoder function internally. As the name suggests, it implements both embedding and attention, where the attention follows the definition in the paper "Neural Machine Translation by Jointly Learning to Align and Translate":
# T stands for time_steps, the sequence length.
def embedding_attention_seq2seq(encoder_inputs,  # [T, batch_size]
                                decoder_inputs,  # [T, batch_size]
                                cell,
                                num_encoder_symbols,
                                num_decoder_symbols,
                                embedding_size,
                                num_heads=1,             # number of attention heads
                                output_projection=None,  # decoder output projection
                                feed_previous=False,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False):
  with variable_scope.variable_scope(
      scope or "embedding_attention_seq2seq", dtype=dtype) as scope:
    dtype = scope.dtype
    # Encoder. First deepcopy the cell: the seq2seq model consists of two
    # identical networks whose parameters are NOT shared, so the encoder and
    # the decoder need two distinct RNN cells.
    encoder_cell = copy.deepcopy(cell)
    # Embed the encoder inputs by simply wrapping the RNN cell in an
    # EmbeddingWrapper.
    encoder_cell = core_rnn_cell.EmbeddingWrapper(
        encoder_cell,
        embedding_classes=num_encoder_symbols,
        embedding_size=embedding_size)
    # The RNN model is still built with the static_rnn function here.
    encoder_outputs, encoder_state = rnn.static_rnn(
        encoder_cell, encoder_inputs, dtype=dtype)

    # First calculate a concatenation of encoder outputs to put attention on.
    # Convert the encoder outputs from a list into a tensor of shape
    # [batch_size, encoder_input_length, output_size]; the converted tensor
    # then serves as the input to the attention mechanism.
    top_states = [
        array_ops.reshape(e, [-1, 1, cell.output_size]) for e in encoder_outputs
    ]
    attention_states = array_ops.concat(top_states, 1)

    # Decoder.
    output_size = None
    # Map the decoder output to num_decoder_symbols dimensions by simply
    # wrapping the RNN cell in an OutputProjectionWrapper.
    if output_projection is None:
      cell = core_rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)
      output_size = num_decoder_symbols

    # If feed_previous is a Python bool, call embedding_attention_decoder
    # directly to decode.
    if isinstance(feed_previous, bool):
      return embedding_attention_decoder(
          decoder_inputs,
          encoder_state,
          attention_states,
          cell,
          num_decoder_symbols,
          embedding_size,
          num_heads=num_heads,
          output_size=output_size,
          output_projection=output_projection,
          feed_previous=feed_previous,
          initial_state_attention=initial_state_attention)

    # If feed_previous is a Tensor, we construct 2 graphs and use cond.
    def decoder(feed_previous_bool):
      # This function is called twice: the first call does not use reuse,
      # the second one does, i.e. decoder(True), then decoder(False).
      reuse = None if feed_previous_bool else True
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=reuse):
        outputs, state = embedding_attention_decoder(
            decoder_inputs,
            encoder_state,
            attention_states,
            cell,
            num_decoder_symbols,
            embedding_size,
            num_heads=num_heads,
            output_size=output_size,
            output_projection=output_projection,
            feed_previous=feed_previous_bool,
            update_embedding_for_previous=False,
            initial_state_attention=initial_state_attention)
        state_list = [state]
        if nest.is_sequence(state):
          state_list = nest.flatten(state)
        return outputs + state_list

    outputs_and_state = control_flow_ops.cond(feed_previous,
                                              lambda: decoder(True),
                                              lambda: decoder(False))
    outputs_len = len(decoder_inputs)  # Outputs length same as decoder inputs.
    state_list = outputs_and_state[outputs_len:]
    state = state_list[0]
    if nest.is_sequence(encoder_state):
      state = nest.pack_sequence_as(
          structure=encoder_state, flat_sequence=state_list)
    return outputs_and_state[:outputs_len], state
2.1 Parameters
encoder_inputs: the encoder inputs, a list of int32 id tensors.
decoder_inputs: the decoder inputs, a list of int32 id tensors.
cell: an RNNCell; any of the common RNNCell definitions can be used.
num_encoder_symbols: the source vocab_size, used to define the encoder embedding matrix.
num_decoder_symbols: the target vocab_size, used to define the decoder embedding matrix.
embedding_size: the dimensionality of the embedding vectors.
num_heads: the number of attention heads, i.e. how many different attention weightings are used; more parameters are spent to obtain several attention vectors.
output_projection: the output projection layer. To map outputs back to num_decoder_symbols tokens, an extra projection layer with parameters W and b is needed, where W: [output_size, num_decoder_symbols] and b: [num_decoder_symbols]. If output_projection is the default None (training mode), the cell is wrapped in an OutputProjectionWrapper, which turns [batch_size, output_size] into [batch_size, num_decoder_symbols]. If output_projection is not None, the cell output stays [batch_size, output_size]. (The two cells differ, which directly affects the subsequent embedding_rnn_decoder decoding step and the definition of loop_function.) A sketch of constructing output_projection follows this list.
feed_previous: whether to feed the previous step's output as the next step's input. Usually set to True at test time; then only the first decoder input (the "GO" symbol) matters, and every later decoder input depends on the previous output.
initial_state_attention: defaults to False, meaning the initial attention is zero; if True, attention starts from the initial state and the attention states.
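As promised in the output_projection item above, a minimal sketch of building output_projection by hand (the variable names are illustrative, not from the source); this is the typical setup when one wants the raw [batch_size, output_size] decoder outputs, e.g. for sampled softmax:

import tensorflow as tf

output_size, num_decoder_symbols = 128, 10000   # assumed sizes
w = tf.get_variable("proj_w", [output_size, num_decoder_symbols])
b = tf.get_variable("proj_b", [num_decoder_symbols])
output_projection = (w, b)
# The decoder then emits [batch_size, output_size] tensors; full logits can
# be recovered outside the cell as tf.matmul(output, w) + b when needed.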
2.2 Internal implementation
The code above performs the embedded encoder stage and ends up with the hidden-layer vector encoder_outputs for every time step; the per-step outputs are then reshaped and concatenated into a [batch_size, encoder_input_length, output_size] tensor, which makes it convenient to compute the context vector c_i at each decoding step.
In the decoder stage, the RNNCell is first wrapped in an OutputProjectionWrapper for the output-layer mapping (projecting the output to the desired dimensionality), and then embedding_attention_decoder is called directly to decode. But when feed_previous is not a bool variable but a tensor, the inner function decoder is executed instead, as sketched below.
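A toy sketch (not from the source) of that two-graph pattern: both branches are built at graph-construction time, share their variables through reuse, and the boolean tensor selects a branch at run time:

import tensorflow as tf

def decoder_branch(feed_previous_bool):
    # The first call builds the variables (reuse=None); the second reuses them,
    # mirroring: reuse = None if feed_previous_bool else True.
    reuse = None if feed_previous_bool else True
    with tf.variable_scope(tf.get_variable_scope(), reuse=reuse):
        w = tf.get_variable("w", [])   # shared by both branches
        return w * (2.0 if feed_previous_bool else 1.0)

feed_previous = tf.placeholder(tf.bool, [])
out = tf.cond(feed_previous,
              lambda: decoder_branch(True),
              lambda: decoder_branch(False))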
2.3 Outputs
A (outputs, state) tuple pair:
- outputs is a list of 2D tensors, each of shape [batch_size, output_size] (i.e. [batch_size, num_decoder_symbols] when output_projection is None);
- state is the decoder cell's state at the last time step, of shape [batch_size, cell.state_size].
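For contrast with the training wiring sketched in section 1.1, an inference-time call (reusing the illustrative placeholder and cell names from that sketch, in a fresh graph) flips feed_previous:

# Inference mode: the decoder feeds its own previous output back in, so only
# decoder_inputs[0] (the 'GO' symbol) is actually consumed.
outputs, state = embedding_attention_seq2seq(
    encoder_inputs[:5], decoder_inputs[:10], cell,
    num_encoder_symbols=10000, num_decoder_symbols=10000,
    embedding_size=128, feed_previous=True)
# outputs: 10 tensors of shape [batch_size, 10000]; state: final decoder state.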
3. The embedding_attention_decoder() function
The embedding_attention_seq2seq function above calls this function directly when decoding.
Its definition:
def embedding_attention_decoder(decoder_inputs,
                                initial_state,
                                attention_states,
                                cell,
                                num_symbols,
                                embedding_size,
                                num_heads=1,
                                output_size=None,
                                output_projection=None,
                                feed_previous=False,
                                update_embedding_for_previous=True,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False):
  if output_size is None:
    output_size = cell.output_size
  if output_projection is not None:
    proj_biases = ops.convert_to_tensor(output_projection[1], dtype=dtype)
    proj_biases.get_shape().assert_is_compatible_with([num_symbols])

  with variable_scope.variable_scope(
      scope or "embedding_attention_decoder", dtype=dtype) as scope:
    # The decoder-stage embedding.
    embedding = variable_scope.get_variable("embedding",
                                            [num_symbols, embedding_size])
    # Apply output_projection to the previous cell's output, then embed the
    # result as the current cell's input; only used when feed_previous is set.
    loop_function = _extract_argmax_and_embed(
        embedding, output_projection,
        update_embedding_for_previous) if feed_previous else None
    # Embed decoder_inputs to obtain the input word vectors (used as the
    # inputs when feed_previous is not set).
    emb_inp = [
        embedding_ops.embedding_lookup(embedding, i) for i in decoder_inputs
    ]
    return attention_decoder(
        emb_inp,
        initial_state,
        attention_states,
        cell,
        output_size=output_size,
        num_heads=num_heads,
        loop_function=loop_function,
        initial_state_attention=initial_state_attention)
3.1 Parameters
decoder_inputs: here each input is a token id, with shape a list of [batch_size]. In other words, no embedding is required on the caller's side: just feed the tokens' indices (ids) in the vocab, and the id-to-embedding conversion happens internally.
num_symbols: the decoder-stage vocab_size.
embedding_size: the dimensionality each token is embedded into.
output_projection: if output_projection is the default None (training mode), the cell is wrapped in an OutputProjectionWrapper, turning the output [batch_size, output_size] into [batch_size, num_symbols]. If output_projection is not None, the cell output stays [batch_size, output_size].
update_embedding_for_previous: has no effect when the previous output is not fed as the current input (feed_previous=False); it only matters when feed_previous is True. If False, backprop only updates the embedding vector of the 'GO' token and leaves the other embeddings unchanged.
initial_state: 2D tensor [batch_size x cell.state_size], the RNN's initial state.
attention_states: 3D tensor [batch_size x attn_length x attn_size], the encoder-stage hidden vectors computed above.
3.2 Implementation
The first step creates the embedding used for decoding.
The second step creates a loop function, loop_function, which maps the previous step's output into the vocabulary space and emits a word embedding as the next step's input. A simplified sketch of this loop function follows.
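A simplified sketch of the loop_function produced by _extract_argmax_and_embed (paraphrased from the source, not verbatim):

import tensorflow as tf

def make_loop_function(embedding, output_projection=None,
                       update_embedding=True):
    def loop_function(prev, _):
        if output_projection is not None:
            # Project [batch_size, output_size] back to vocabulary logits.
            prev = tf.matmul(prev, output_projection[0]) + output_projection[1]
        prev_symbol = tf.argmax(prev, 1)   # greedy choice of the next token id
        emb_prev = tf.nn.embedding_lookup(embedding, prev_symbol)
        if not update_embedding:
            # Block gradients so only the 'GO' embedding gets updated.
            emb_prev = tf.stop_gradient(emb_prev)
        return emb_prev
    return loop_function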
4. The attention_decoder() function
tf.nn.attention_decoder
The paper involves three formulas:

$$u_i^t = v^T \tanh(W_1 h_i + W_2 d_t)$$
$$a_i^t = \mathrm{softmax}(u_i^t)$$
$$d_t' = \sum_{i=1}^{T_A} a_i^t h_i$$

The encoder outputs the hidden states $(h_1, \dots, h_{T_A})$, and the decoder has hidden states $(d_1, \dots, d_{T_B})$; $v$, $W_1$ and $W_2$ are parameters the model has to learn. "Attention" means that at every decoding time step a weighted sum of the encoder hidden states is computed, paying different degrees of attention to different pieces of information; the crux is therefore obtaining the weight of each hidden state. The attention mechanism in the source code is the most common variant and can be split into three steps: (1) compute a score between the current hidden state ($d_t$) and each attended hidden state ($h_i$); (2) normalize the scores into probabilities with softmax; (3) use them as weighting coefficients to sum the hidden states into a single information vector $d_t'$. How this vector is used afterwards varies by task.
In the formulas above, $a_i^t$ is the weighting coefficient applied to $h_i$ at decoding time step $t$.
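To make the three steps concrete, a minimal NumPy sketch for a single head and a single decoding step (all shapes are illustrative assumptions):

import numpy as np

def attention_step(h, d_t, v, W1, W2):
    # h: [T, attn_size] encoder hidden states; d_t: [state_size] decoder state.
    u = np.tanh(h @ W1 + d_t @ W2) @ v   # (1) relevance scores, shape [T]
    a = np.exp(u - u.max())
    a /= a.sum()                         # (2) softmax normalization
    return a @ h                         # (3) weighted sum: the context vector

T, attn_size, state_size, vec_size = 7, 16, 16, 16
rng = np.random.default_rng(0)
h, d_t = rng.normal(size=(T, attn_size)), rng.normal(size=state_size)
W1 = rng.normal(size=(attn_size, vec_size))
W2 = rng.normal(size=(state_size, vec_size))
v = rng.normal(size=vec_size)
print(attention_step(h, d_t, v, W1, W2).shape)  # (16,)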
4.1 Code
def attention_decoder(decoder_inputs,    # T * [batch_size, input_size]
                      initial_state,     # [batch_size, cell.state_size]
                      attention_states,  # [batch_size, attn_length, attn_size]
                      cell,
                      output_size=None,
                      num_heads=1,
                      loop_function=None,
                      dtype=None,
                      scope=None,
                      initial_state_attention=False):
  if not decoder_inputs:
    raise ValueError("Must provide at least 1 input to attention decoder.")
  if num_heads < 1:
    raise ValueError("With less than 1 heads, use a non-attention decoder.")
  if attention_states.get_shape()[2].value is None:
    raise ValueError("Shape[2] of attention_states must be known: %s" %
                     attention_states.get_shape())
  if output_size is None:
    output_size = cell.output_size

  with variable_scope.variable_scope(
      scope or "attention_decoder", dtype=dtype) as scope:
    dtype = scope.dtype

    batch_size = array_ops.shape(decoder_inputs[0])[0]  # Needed for reshaping.
    attn_length = attention_states.get_shape()[1].value
    if attn_length is None:
      attn_length = array_ops.shape(attention_states)[1]
    attn_size = attention_states.get_shape()[2].value

    # To calculate W1 * h_t we use a 1-by-1 convolution, need to reshape before.
    # For the 1x1 convolution, reshape attention_states into a 4-D tensor
    # [batch_size, num_steps, 1, attention_size]; the fourth dimension is
    # attention_size.
    hidden = array_ops.reshape(attention_states,
                               [-1, attn_length, 1, attn_size])
    # Hold the per-head variables: hidden_features stores W * h_j and v stores
    # v; each attention head has its own parameters.
    hidden_features = []
    v = []
    # ---- Next, compute v * tanh(W * h_j + U * z_i) as the relevance score.
    attention_vec_size = attn_size  # Size of query vectors for attention.
    # Compute W * h_j for every element of the hidden states.
    for a in xrange(num_heads):
      # The kernel size is 1x1, the input channel count is attn_size, and
      # there are attention_vec_size filters.
      k = variable_scope.get_variable("AttnW_%d" % a,
                                      [1, 1, attn_size, attention_vec_size])
      # The convolution result has shape
      # [batch_size, num_steps, 1, attention_vec_size].
      hidden_features.append(nn_ops.conv2d(hidden, k, [1, 1, 1, 1], "SAME"))
      v.append(
          variable_scope.get_variable("AttnV_%d" % a, [attention_vec_size]))

    state = initial_state  # The hidden state of the decoding RNN.

    def attention(query):
      """Put attention masks on hidden using hidden_features and query."""
      ds = []  # Results of attention reads will be stored here.
      # If the query is a tuple, flatten it and concatenate into a 2-D tensor.
      if nest.is_sequence(query):  # If the query is a tuple, flatten it.
        query_list = nest.flatten(query)
        for q in query_list:  # Check that ndims == 2 if specified.
          ndims = q.get_shape().ndims
          if ndims:
            assert ndims == 2
        query = array_ops.concat(query_list, 1)
      for a in xrange(num_heads):
        with variable_scope.variable_scope("Attention_%d" % a):
          # Compute U * z_i and reshape it to
          # [batch_size, 1, 1, attention_vec_size].
          y = Linear(query, attention_vec_size, True)(query)
          y = array_ops.reshape(y, [-1, 1, 1, attention_vec_size])
          # Attention mask is a softmax of v^T * tanh(...).
          # Compute v * tanh(W * h_j + U * z_i).
          # hidden_features[a] + y has shape
          # [batch_size, num_steps, 1, attention_vec_size]; multiplying by the
          # vector v ([attention_vec_size]) keeps that shape. Then reduce_sum
          # over dimensions 2 and 3 yields a [batch_size, num_steps] tensor:
          # the score corresponding to each hidden vector.
          s = math_ops.reduce_sum(v[a] * math_ops.tanh(hidden_features[a] + y),
                                  [2, 3])
          # Normalize with the softmax function.
          a = nn_ops.softmax(s)
          # Now calculate the attention-weighted vector d:
          # the weighted sum over all hidden vectors.
          d = math_ops.reduce_sum(
              array_ops.reshape(a, [-1, attn_length, 1, 1]) * hidden, [1, 2])
          ds.append(array_ops.reshape(d, [-1, attn_size]))
      return ds

    outputs = []
    prev = None
    batch_attn_size = array_ops.stack([batch_size, attn_size])
    attns = [
        array_ops.zeros(
            batch_attn_size, dtype=dtype) for _ in xrange(num_heads)
    ]
    for a in attns:  # Ensure the second shape of attention vectors is set.
      a.set_shape([None, attn_size])
    # If initial_state_attention is set, call attention directly on the
    # (all-zero) initial state.
    if initial_state_attention:
      attns = attention(initial_state)
    # Then iterate over all decoder_inputs and decode them one by one.
    for i, inp in enumerate(decoder_inputs):
      if i > 0:
        # For i > 0, reuse the decoding RNN's parameters.
        variable_scope.get_variable_scope().reuse_variables()
      # If loop_function is set, we use it instead of decoder_inputs.
      # To feed the previous step's output as this step's input, call
      # loop_function to overwrite inp.
      if loop_function is not None and prev is not None:
        with variable_scope.variable_scope("loop_function", reuse=True):
          inp = loop_function(prev, i)
      # Merge input and previous attentions into one vector of the right size.
      input_size = inp.get_shape().with_rank(2)[1]
      if input_size.value is None:
        raise ValueError("Could not infer input size from input: %s" % inp.name)
      # The input is inp concatenated with attns, fed to the RNN cell.
      inputs = [inp] + attns
      x = Linear(inputs, input_size, True)(inputs)
      # Run the RNN.
      cell_output, state = cell(x, state)
      # Run the attention mechanism: compute the next attention vector.
      if i == 0 and initial_state_attention:
        with variable_scope.variable_scope(
            variable_scope.get_variable_scope(), reuse=True):
          attns = attention(state)
      else:
        attns = attention(state)

      with variable_scope.variable_scope("AttnOutputProjection"):
        inputs = [cell_output] + attns
        output = Linear(inputs, output_size, True)(inputs)
      if loop_function is not None:
        prev = output
      outputs.append(output)

  return outputs, state
Regarding the num_heads parameter: as we know, attention is a weighted sum of information, and one attention head corresponds to one way of weighting. This parameter sets how many attention heads perform the weighted sum, so formula (3) extends to one context vector per head, $d_t'^{(k)} = \sum_i a_i^{t,k} h_i$ for $k = 1, \dots, \text{num\_heads}$ (the per-head results are collected in the list ds in the code).
The term $W_1 h_j$ is implemented with a convolution; the returned tensor has shape [batch_size, attn_length, 1, attention_vec_size]:
# To calculate W1 * h_t we use a 1-by-1 convolution.
hidden = array_ops.reshape(
    attention_states, [-1, attn_length, 1, attn_size])
hidden_features = []
v = []
attention_vec_size = attn_size  # Size of query vectors for attention.
for a in xrange(num_heads):
  k = variable_scope.get_variable("AttnW_%d" % a,
                                  [1, 1, attn_size, attention_vec_size])
  hidden_features.append(nn_ops.conv2d(hidden, k, [1, 1, 1, 1], "SAME"))
  v.append(
      variable_scope.get_variable("AttnV_%d" % a, [attention_vec_size]))
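Why a 1-by-1 convolution works here: with a kernel of shape [1, 1, attn_size, attention_vec_size], conv2d applies the same [attn_size, attention_vec_size] matrix at every time step, i.e. it computes $W_1 h_j$ for all $j$ in one op. A NumPy equivalence sketch (illustrative shapes):

import numpy as np

batch, T, attn_size, vec_size = 2, 7, 16, 16
hidden = np.random.randn(batch, T, 1, attn_size)
W1 = np.random.randn(attn_size, vec_size)
# The 1x1 convolution is exactly a per-time-step matrix multiply:
out = (hidden.reshape(batch, T, attn_size) @ W1).reshape(batch, T, 1, vec_size)
print(out.shape)  # (2, 7, 1, 16)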
The term $W_2 d_t$ is implemented with the linear mapping function linear shown below:
for a in xrange(num_heads):
  with variable_scope.variable_scope("Attention_%d" % a):
    # query corresponds to the current hidden state d_t.
    y = linear(query, attention_vec_size, True)
    y = array_ops.reshape(y, [-1, 1, 1, attention_vec_size])
    # Compute u_t.
    s = math_ops.reduce_sum(
        v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])
    a = nn_ops.softmax(s)
    # Compute the attention-weighted vector d.
    d = math_ops.reduce_sum(
        array_ops.reshape(a, [-1, attn_length, 1, 1]) * hidden,
        [1, 2])
    ds.append(array_ops.reshape(d, [-1, attn_size]))