Keras Beginner Notes (Extra 1): Reading the Source of K.batch_dot and How It Differs from dot

Motivation

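Matrix and vector multiplication goes by many names that are often mixed together, and the naming differs between frameworks, so I get caught by this magic every time. For example, the same dot behaves differently on a vector of shape (n,) and on a 2-D tensor of shape (n, 1). No matter how many times I write it down, those of us who switch between torch, TensorFlow, and MATLAB still trip over it without noticing.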

Fortunately, Keras provides a high-level interface that behaves consistently at least on TensorFlow and Theano, and possibly on MXNet in the future.

Naming of the various vector products

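I personally find the pile of terms such as "dot product, outer product, inner product, dot multiplication, cross multiplication, direct product, vector product, tensor product" very annoying and hopelessly messy. I think the terms should be unified: switching between two names for the same operation and not distinguishing the 1-D case from the 2-D case makes mistakes very easy. Because Chinese textbooks translate these terms in many different ways, I mainly follow the naming on Wikipedia.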

One dimension (vectors)

Note that only shape=(n,) is a true 1-D vector; shape=(n, 1) is already a 2-D tensor (a matrix with a single column).

Dot product

import numpy as np
a = np.array([1,2,3,4,5])  # 1-D vector; NumPy does not distinguish row from column here, treat it as a column vector
b = a.reshape((5,1))  # 2-D tensor with a single column
print(a.shape, b.shape, a.T.shape)  # (5,) (5, 1) (5,)
print((a+b).shape)  # (5, 5)
print(np.dot(a,a), a*a)  # 55 [1 4 9 16 25]
print(np.dot(b.T,b))  # [[55]]
# Also, a*a = np.multiply(a, a), b*b = np.multiply(b, b)
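
To make the difference between shape (n,) and shape (n, 1) concrete, here is a minimal NumPy sketch. The variable names are only illustrative; it shows that np.dot accepts two 1-D vectors, accepts a column times a row, and rejects two columns.

```python
import numpy as np

a = np.arange(1, 6)        # shape (5,): a true 1-D vector
b = a.reshape((5, 1))      # shape (5, 1): a 2-D array with one column

inner_1d = np.dot(a, a)    # 1-D with 1-D gives a scalar
outer_2d = np.dot(b, b.T)  # (5, 1) with (1, 5) gives a (5, 5) matrix

# (5, 1) with (5, 1) is rejected: the inner dimensions 1 and 5 do not match
try:
    np.dot(b, b)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(inner_1d, outer_2d.shape, mismatch_raised)
```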

Cross product

Rarely needed when building neural networks; it shows up mainly in engineering and geometric computations. The API is usually cross(a, b).
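
For completeness, a small NumPy sketch of the cross product of two 3-D vectors. The vectors are arbitrary illustrative values; the result is perpendicular to both inputs.

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])  # unit vector along x
b = np.array([0.0, 1.0, 0.0])  # unit vector along y

c = np.cross(a, b)             # perpendicular to both a and b
print(c)                       # [0. 0. 1.]
print(np.dot(c, a), np.dot(c, b))  # 0.0 0.0
```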

Element-wise product

Element-wise multiplication, that is, the dot product without the final summation: $c_i = a_i b_i$. The API is usually multiply(a, b).
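
The relation between the two can be checked in a couple of lines. This is a small NumPy sketch with made-up values: summing the element-wise product recovers the dot product.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

c = np.multiply(a, b)        # c_i = a_i * b_i
print(c)                     # [ 4 10 18]
print(c.sum(), np.dot(a, b)) # 32 32
```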

Two dimensions (matrices)

Hadamard product

The familiar multiplication of corresponding entries. It is one kind of element-wise product. The API is usually multiply(a, b).

Matrix multiplication

This is the matrix multiplication of linear algebra. It is usually implemented by dot(a, b) or matmul(a, b).
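
A short NumPy sketch contrasting the two matrix products on the same pair of 2x2 matrices (the values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

hadamard = np.multiply(A, B)  # entry-by-entry product
product = np.matmul(A, B)     # rows of A times columns of B

print(hadamard)  # [[ 5 12] [21 32]]
print(product)   # [[19 22] [43 50]]
```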

Three or more dimensions (higher-order tensors)

Tensor product

Because it involves groups and multilinear algebra, the tensor product is hard to describe briefly; see Wikipedia.
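
As a purely numerical illustration (not a substitute for the algebraic definition), NumPy's tensordot with axes=0 forms the tensor product of two arrays: every entry of the first is multiplied by every entry of the second, and the shapes are concatenated. The arrays below are illustrative only.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4, 5])
outer = np.tensordot(a, b, axes=0)   # outer[i, j] = a[i] * b[j]
print(outer.shape)                   # (2, 3)

M = np.ones((2, 3))
N = np.ones((4, 5))
T = np.tensordot(M, N, axes=0)       # T[i, j, k, l] = M[i, j] * N[k, l]
print(T.shape)                       # (2, 3, 4, 5)
```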

A quick look at dot versus matmul

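keras.dot does not really compute anything itself. It is a thin wrapper around matmul that adapts it to the 1-D dot product, adds an optimized path for sparse inputs, and supplements the rule for multiplying nD tensors. Reading the source directly: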

if ndim(x) is not None and (ndim(x) > 2 or ndim(y) > 2):
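    # Supplements the rule for multiplying `nD` tensors
    # and also handles x and y having different numbers of dimensions.
    # That is: the last dimension of x must equal the second-to-last dimension of y,
    # and the dot product is taken over these two dimensions.
    # For example a.shape = (5, 6), b.shape = (8, 6, 3) => dot(a, b).shape = (5, 8, 3)
    # On the last two dimensions of x and y it behaves like 2-D matrix multiplication.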
    x_shape = []
    for i, s in zip(int_shape(x), tf.unstack(tf.shape(x))):
        if i is not None:
            x_shape.append(i)
        else:
            x_shape.append(s)
    x_shape = tuple(x_shape)
    y_shape = []
    for i, s in zip(int_shape(y), tf.unstack(tf.shape(y))):
        if i is not None:
            y_shape.append(i)
        else:
            y_shape.append(s)
    y_shape = tuple(y_shape)
    y_permute_dim = list(range(ndim(y)))
    y_permute_dim = [y_permute_dim.pop(-2)] + y_permute_dim
    xt = tf.reshape(x, [-1, x_shape[-1]])
    yt = tf.reshape(tf.transpose(y, perm=y_permute_dim), [y_shape[-2], -1])
    return tf.reshape(tf.matmul(xt, yt),
                      x_shape[:-1] + y_shape[:-2] + y_shape[-1:])
# 2-D and lower-dimensional case
if is_sparse(x):
    out = tf.sparse_tensor_dense_matmul(x, y)
else:
    out = tf.matmul(x, y)
return out
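
The reshape/transpose/matmul path above can be replayed in NumPy to check the shape rule. This is a sketch under the assumption that NumPy's np.dot follows the same convention for nD inputs (sum over the last axis of the first argument and the second-to-last axis of the second); nd_dot_sketch is a hypothetical helper name, not part of Keras.

```python
import numpy as np

def nd_dot_sketch(x, y):
    """NumPy re-creation of the reshape/transpose/matmul path shown above."""
    # move y's second-to-last axis to the front, keeping the others in order
    perm = list(range(y.ndim))
    perm = [perm.pop(-2)] + perm
    xt = x.reshape(-1, x.shape[-1])                      # (rows of x, k)
    yt = np.transpose(y, perm).reshape(y.shape[-2], -1)  # (k, all other axes of y)
    out = xt @ yt
    return out.reshape(x.shape[:-1] + y.shape[:-2] + y.shape[-1:])

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 6))
b = rng.normal(size=(8, 6, 3))
c = nd_dot_sketch(a, b)
print(c.shape)                       # (5, 8, 3)
print(np.allclose(c, np.dot(a, b)))  # True
```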

Source analysis of keras.batch_dot

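Although this function has dot in its name, it has little to do with dot. It is better understood as an element-wise multiply-and-sum over user-selected axes, tailored to the dimension convention of deep learning in which the first dimension is usually the batch size. It is well suited to computing expressions such as $\sum_i a_{ij} b_{j|i}$.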
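
To see what a sum of this kind looks like numerically, here is a NumPy sketch. The shapes are assumptions made only for illustration: a holds one scalar weight per pair (i, j), b holds one vector of length dim per pair (i, j), and the leading axis is the batch.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out, dim = 2, 3, 4, 5              # hypothetical sizes
a = rng.normal(size=(batch, n_in, n_out))         # a_{ij}, one weight per (i, j)
b = rng.normal(size=(batch, n_in, n_out, dim))    # b_{j|i}, one vector per (i, j)

# multiply element-wise (a is broadcast over the last axis), then sum over i
s = np.sum(a[..., None] * b, axis=1)              # shape (batch, n_out, dim)

# the same contraction written as an einsum
s_einsum = np.einsum('nij,nijd->njd', a, b)
print(s.shape, np.allclose(s, s_einsum))
```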

The source splits into two parts. The first part:

    # axes gives the dimensions of x and y, respectively, over which the dot product is taken
    if isinstance(axes, int):
        axes = (axes, axes)
    x_ndim = ndim(x)
    y_ndim = ndim(y)
    if axes is None:
        # behaves like tf.batch_matmul as default
        axes = [x_ndim - 1, y_ndim - 2]
    if py_any([isinstance(a, (list, tuple)) for a in axes]):
        raise ValueError('Multiple target dimensions are not supported. ' +
                         'Expected: None, int, (int, int), ' +
                         'Provided: ' + str(axes))
    # pad the lower-rank tensor with trailing size-1 dimensions so both have the same rank
    if x_ndim > y_ndim:
        diff = x_ndim - y_ndim
        y = tf.reshape(y, tf.concat([tf.shape(y), [1] * (diff)], axis=0))
    elif y_ndim > x_ndim:
        diff = y_ndim - x_ndim
        x = tf.reshape(x, tf.concat([tf.shape(x), [1] * (diff)], axis=0))
    else:
        diff = 0
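
The padding step can be illustrated in NumPy. This is a sketch with made-up shapes; it only mirrors the reshape above and shows why a later squeeze is needed.

```python
import numpy as np

x = np.ones((32, 20, 16))   # rank 3
y = np.ones((32, 16))       # rank 2

diff = x.ndim - y.ndim
# append `diff` trailing size-1 dimensions to the lower-rank tensor
y_padded = y.reshape(y.shape + (1,) * diff)
print(y_padded.shape)                  # (32, 16, 1)

# with equal ranks, a batched matmul over the last two axes is now possible
print(np.matmul(x, y_padded).shape)    # (32, 20, 1)
```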

The second part contains the actual computation logic:

    if ndim(x) == 2 and ndim(y) == 2:
        # If both are 2-D matrices, the result equals the diagonal of their matrix product
        # (in practice: Hadamard product of x and y, then a sum over axes[0] / axes[1])
        if axes[0] == axes[1]:
            out = tf.reduce_sum(tf.multiply(x, y), axes[0])
        else:
            out = tf.reduce_sum(tf.multiply(tf.transpose(x, [1, 0]), y), axes[1])
    else:
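        # If they are not both 2-D matrices, use matrix multiplication
        if axes is not None:
            # Decide whether to take the conjugate transpose
            # Note that the value of axes[0] is never passed on, it is only tested
            # This is a rather magic quirk: axes=[1, 1] can give the same result as axes=[1000, 1000]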
            adj_x = None if axes[0] == ndim(x) - 1 else True
            adj_y = True if axes[1] == ndim(y) - 1 else None
        else:
            adj_x = None
            adj_y = None
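        # This is the core step. The net effect is a Hadamard product along the given axes followed by a sum
        # For equal-rank inputs it is a matrix multiplication over the last two dimensions;
        # axes only influence it through the two adjoint flags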
        out = tf.matmul(x, y, adjoint_a=adj_x, adjoint_b=adj_y)
    if diff:
        # When x and y did not have the same rank:
        if x_ndim > y_ndim:
            idx = x_ndim + y_ndim - 3  # i.e. (x_ndim - 1) + (y_ndim - 1) - 1
        else:
            idx = x_ndim - 1
        # Squeeze out the `diff` size-1 dimensions introduced by the padding,
        # so the output rank is the smaller of x_ndim and y_ndim
        out = tf.squeeze(out, list(range(idx, idx + diff)))
    if ndim(out) == 1:
        # expand so that the output is never 1-D
        out = expand_dims(out, 1)
    return out
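
The 2-D branch is the easiest one to verify by hand. The following NumPy sketch (illustrative data, axes[0] == axes[1] == 1 assumed) replays it and checks the claim that the result is the diagonal of the full matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # 4 samples, 3 features each
y = rng.normal(size=(4, 3))

# axes[0] == axes[1] == 1: Hadamard product, then sum over axis 1
out = np.sum(np.multiply(x, y), axis=1)

# the same numbers sit on the diagonal of the full matrix product x @ y.T
same_as_diag = np.allclose(out, np.diag(x @ y.T))

# the final expand_dims step turns the 1-D result into a column
out = np.expand_dims(out, 1)
print(out.shape, same_as_diag)  # (4, 1) True
```

Computing only the diagonal this way avoids forming the full 4x4 product, which is presumably why the source special-cases it.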

Pitfall: analyzing the magic

Here is an example of the magic mentioned above:

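import tensorflow as tf
from keras import backend as K

a = K.ones([100, 10, 16, 5])
b = K.ones([100, 10, 16, 9])
with tf.Session() as sess:
    print(K.batch_dot(a, b, axes=[2, 2]).shape) # (100, 10, 5, 9)
    print(K.batch_dot(a, b, axes=[100, 1000]).shape) # (100, 10, 5, 9)
    # Why both calls agree:
    #    1. axes[0] is not the last dimension of x in either call, so x is conjugate-transposed;
    #    2. axes[1] is not the last dimension of y in either call, so y is left as is;
    #    3. the shared dimension of size 16 is contracted away.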

Another snippet:

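a = K.ones([100, 10, 16, 5])
b = K.ones([100, 10, 9, 16])
with tf.Session() as sess:
    print(K.batch_dot(a, b, axes=[2, 3]).shape) # (100, 10, 5, 9), following the source above
    # Why:
    #    1. axes[0] is not the last dimension of x, so x is conjugate-transposed;
    #    2. axes[1] is the last dimension of y, so y is conjugate-transposed;
    #    3. the two marked dimensions are contracted away.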
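
Both examples can be replayed without TensorFlow. The NumPy sketch below assumes real-valued inputs (so the conjugate is a no-op) and uses smaller batch dimensions than the snippets above; swapaxes on the last two axes stands in for the adjoint flags, and einsum spells out which index is actually summed.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 2, 16, 5))    # small stand-in for (100, 10, 16, 5)
b = rng.normal(size=(3, 2, 16, 9))    # stand-in for (100, 10, 16, 9)
b2 = rng.normal(size=(3, 2, 9, 16))   # stand-in for (100, 10, 9, 16)

# first example: x is transposed on its last two axes, y is left as is
out1 = np.matmul(np.swapaxes(a, -1, -2), b)
ref1 = np.einsum('bhki,bhkj->bhij', a, b)

# second example: both x and y are transposed on their last two axes
out2 = np.matmul(np.swapaxes(a, -1, -2), np.swapaxes(b2, -1, -2))
ref2 = np.einsum('bhki,bhjk->bhij', a, b2)

print(out1.shape, np.allclose(out1, ref1))
print(out2.shape, np.allclose(out2, ref2))
```

In both cases the summed index k is the size-16 dimension, regardless of the exact numbers passed in axes, as long as they trigger the same adjoint flags.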
