Originally published at: https://zhuanlan.zhihu.com/p/138731311
We all learned 2-D matrix multiplication in linear algebra, but tf.matmul can also handle higher-dimensional tensors, for example:
import tensorflow as tf
import numpy as np
a = tf.random.uniform([2, 1, 2, 3])
b = tf.random.uniform([1, 3, 3, 2])
c = tf.matmul(a, b)
What is c?
The conclusion first: no matter how many dimensions the tensors have, tf.matmul always performs the matrix multiplication on the last two axes, then repeats it across the remaining axes.
For multi-dimensional tf.matmul(a, b), the shapes must satisfy two requirements:
1. The size of a on axis=-1 must equal the size of b on axis=-2. For example, for shapes [3, 2, 3] and [3, 3, 2], the trailing 3 of the first matches the middle 3 of the second.
2. On every axis other than axis=-1 and axis=-2, the sizes of a and b must either be equal or one of them must be 1.
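These two requirements can be sanity-checked without TensorFlow: NumPy's np.matmul follows the same batched-multiplication and broadcasting semantics as tf.matmul. A minimal sketch, using the same shapes as the example above:

```python
import numpy as np

# Same shapes as in the tf.random.uniform example above; np.matmul
# follows the same batched-matmul broadcasting semantics as tf.matmul.
a = np.random.rand(2, 1, 2, 3)
b = np.random.rand(1, 3, 3, 2)

# Rule 1: a's axis=-1 size (3) equals b's axis=-2 size (3).
# Rule 2: the remaining axes (2, 1) and (1, 3) broadcast to (2, 3).
c = np.matmul(a, b)
print(c.shape)  # (2, 3, 2, 2)
```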
For example, multiplying a tensor of shape [3, 2, 3] by a tensor of shape [3, 3, 2] with tf.matmul:
In [84]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([3, 2, 3])
...: b = tf.random.uniform([3, 3, 2])
...: c = tf.matmul(a, b)
...: c.shape
...:
...:
Out[84]: TensorShape([3, 2, 2])
In [87]: tf.matmul(a[0],b[0])
Out[87]:
<tf.Tensor: id=374, shape=(2, 2), dtype=float32, numpy=
array([[1.4506222 , 1.323427 ],
[0.28268352, 0.2917934 ]], dtype=float32)>
In [88]: tf.matmul(a[1],b[1])
Out[88]:
<tf.Tensor: id=383, shape=(2, 2), dtype=float32, numpy=
array([[1.0278544 , 0.4219831 ],
[0.865297 , 0.87740964]], dtype=float32)>
In [89]: c
Out[89]:
<tf.Tensor: id=365, shape=(3, 2, 2), dtype=float32, numpy=
array([[[1.4506222 , 1.323427 ],
[0.28268352, 0.2917934 ]],
[[1.0278544 , 0.4219831 ],
[0.865297 , 0.8774096 ]],
[[0.5752927 , 0.13066964],
[0.5343988 , 0.2741483 ]]], dtype=float32)>
As the output shows, tf.matmul of a [3, 2, 3] tensor with a [3, 3, 2] tensor can be understood as:
Step 1: on axes 1 and 2, perform an ordinary 2-D matrix multiplication of a [2, 3] matrix with a [3, 2] matrix, producing a [2, 2] result;
Step 2: along axis 0, take the i-th slice of a and the i-th slice of b and apply step 1 to each pair, giving a final output of shape [3, 2, 2].
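The two steps can be verified directly: stacking the per-slice 2-D products reproduces the batched result. A sketch in NumPy, whose np.matmul matches tf.matmul here:

```python
import numpy as np

a = np.random.rand(3, 2, 3)
b = np.random.rand(3, 3, 2)

# The batched product equals stacking the three per-slice 2-D products.
batched = np.matmul(a, b)
looped = np.stack([np.matmul(a[i], b[i]) for i in range(3)])
print(batched.shape)  # (3, 2, 2)
```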
If the axis=0 sizes of a and b do not match, tf.matmul raises an error:
In [95]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([2, 2, 3])
...: b = tf.random.uniform([3, 3, 2])
...: c = tf.matmul(a, b)
...: c.shape
...:
...:
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-95-462c4976a35a> in <module>
3 a = tf.random.uniform([2, 2, 3])
4 b = tf.random.uniform([3, 3, 2])
----> 5 c = tf.matmul(a, b)
6 c.shape
7
D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\util\dispatch.py in wrapper(*args, **kwargs)
178 """Call target, and fall back on dispatchers if there is a TypeError."""
179 try:
--> 180 return target(*args, **kwargs)
181 except (TypeError, ValueError):
182 # Note: convert_to_eager_tensor currently raises a ValueError, not a
D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\ops\math_ops.py in matmul(a, b, transpose_a, transpose_b, adjoint_a, adjoint_b, a_is_sparse, b_is_sparse, name)
2725 b = conj(b)
2726 adjoint_b = True
-> 2727 return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
2728
2729 # Neither matmul nor sparse_matmul support adjoint, so we conjugate
D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py in batch_mat_mul_v2(x, y, adj_x, adj_y, name)
1700 else:
1701 message = e.message
-> 1702 _six.raise_from(_core._status_to_exception(e.code, message), None)
1703 # Add nodes to the TensorFlow graph.
1704 if adj_x is None:
D:\S\Anaconda3_v3\lib\site-packages\six.py in raise_from(value, from_value)
InvalidArgumentError: In[0] and In[1] must have compatible batch dimensions: [2,2,3] vs. [3,3,2] [Op:BatchMatMulV2] name: MatMul/
But when one of a and b has size 1 on axis 0, there is no error:
In [90]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([1, 2, 3])
...: b = tf.random.uniform([3, 3, 2])
...: c = tf.matmul(a, b)
...: c.shape
...:
...:
Out[90]: TensorShape([3, 2, 2])
In [91]: c
Out[91]:
<tf.Tensor: id=398, shape=(3, 2, 2), dtype=float32, numpy=
array([[[0.59542704, 0.60751694],
[0.19115494, 0.36344892]],
[[1.0542538 , 0.75257593],
[0.26940605, 0.24408351]],
[[1.1716111 , 0.4058628 ],
[0.09086016, 0.28043625]]], dtype=float32)>
In [92]: tf.matmul(a[0],b[0])
Out[92]:
<tf.Tensor: id=407, shape=(2, 2), dtype=float32, numpy=
array([[0.59542704, 0.60751694],
[0.19115494, 0.36344892]], dtype=float32)>
In [93]: tf.matmul(a[0],b[1])
Out[93]:
<tf.Tensor: id=416, shape=(2, 2), dtype=float32, numpy=
array([[1.0542538 , 0.7525759 ],
[0.26940605, 0.2440835 ]], dtype=float32)>
In [94]: tf.matmul(a[0],b[2])
Out[94]:
<tf.Tensor: id=425, shape=(2, 2), dtype=float32, numpy=
array([[1.1716112 , 0.4058628 ],
[0.09086016, 0.28043625]], dtype=float32)>
This still follows the rule above: first multiply on the last two axes, then assemble the results. The only difference is that, because a's axis-0 size is 1, a's single axis-0 slice is paired with every axis-0 slice of b. (The code and output make this clearer than words.)
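This pairing of a's single slice against every slice of b is exactly broadcast behaviour, and can be sketched in NumPy (whose np.matmul broadcasts the same way):

```python
import numpy as np

a = np.random.rand(1, 2, 3)   # axis 0 has size 1
b = np.random.rand(3, 3, 2)

# a's single axis-0 slice is paired with every axis-0 slice of b.
c = np.matmul(a, b)
print(c.shape)  # (3, 2, 2)
```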
So in three dimensions the conclusion is:
First do the matrix multiplication on the last two axes, then repeat it across the remaining axis.
For 3-D tf.matmul(a, b), the shapes must satisfy two requirements:
1. The size of a on axis=2 must equal the size of b on axis=1.
2. The sizes of a and b on axis=0 must either be equal or one of them must be 1.
Now consider higher dimensions, for example the four-dimensional case.
In [96]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([2, 1, 2, 3])
...: b = tf.random.uniform([2, 3, 3, 2])
...: c = tf.matmul(a, b)
...: c.shape
...:
...:
Out[96]: TensorShape([2, 3, 2, 2])
In [97]: c
Out[97]:
<tf.Tensor: id=454, shape=(2, 3, 2, 2), dtype=float32, numpy=
array([[[[1.0685383 , 1.9015994 ],
[1.1457413 , 1.5246255 ]],
[[0.953201 , 1.5544493 ],
[0.7639411 , 1.4360913 ]],
[[0.67427766, 0.49847895],
[0.499685 , 0.39281937]]],
[[[0.42752475, 0.7453967 ],
[0.3735991 , 0.74812794]],
[[0.54442215, 0.6510606 ],
[0.6632798 , 0.38497943]],
[[0.3459217 , 0.96300673],
[0.45035997, 0.90772474]]]], dtype=float32)>
In [98]: tf.matmul(a[0],b[0])
Out[98]:
<tf.Tensor: id=463, shape=(3, 2, 2), dtype=float32, numpy=
array([[[1.0685383 , 1.9015994 ],
[1.1457413 , 1.5246255 ]],
[[0.953201 , 1.5544493 ],
[0.7639411 , 1.4360913 ]],
[[0.67427766, 0.49847895],
[0.499685 , 0.39281937]]], dtype=float32)>
In [99]: tf.matmul(a[1],b[1])
Out[99]:
<tf.Tensor: id=472, shape=(3, 2, 2), dtype=float32, numpy=
array([[[0.42752475, 0.7453967 ],
[0.3735991 , 0.74812794]],
[[0.54442215, 0.6510606 ],
[0.6632798 , 0.38497943]],
[[0.3459217 , 0.96300673],
[0.45035997, 0.90772474]]], dtype=float32)>
This is consistent with the 3-D case: tf.matmul is applied level by level, and everything reduces to 2-D matrix multiplication on the last two axes.
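The level-by-level reduction can be checked in NumPy as well: each top-level slice of the 4-D product is itself a batched 3-D product. A sketch with the same shapes:

```python
import numpy as np

a = np.random.rand(2, 1, 2, 3)
b = np.random.rand(2, 3, 3, 2)

c = np.matmul(a, b)
# Each top-level slice of c is itself a batched 3-D matmul,
# which in turn reduces to 2-D matmuls on the last two axes.
print(c.shape)  # (2, 3, 2, 2)
```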
Likewise, it also works when one of the sizes at the axis=0 position is 1:
In [100]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([2, 1, 2, 3])
...: b = tf.random.uniform([1, 3, 3, 2])
...: c = tf.matmul(a, b)
...: c.shape
...:
...:
Out[100]: TensorShape([2, 3, 2, 2])
The details are the same as before.
Final conclusion: no matter how many dimensions the tensors have, tf.matmul always performs the matrix multiplication on the last two axes, then repeats it across the remaining axes.
For multi-dimensional tf.matmul(a, b), the shapes must satisfy two requirements:
1. The size of a on axis=-1 must equal the size of b on axis=-2.
2. On every axis other than axis=-1 and axis=-2, the sizes of a and b must either be equal or one of them must be 1.
Here are a few examples where a and b have different numbers of dimensions, to build intuition:
In [105]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([2, 1, 2, 3])
...: b = tf.random.uniform([1, 3, 2])
...: c = tf.matmul(a, b)
...: c.shape
Out[105]: TensorShape([2, 1, 2, 2])
In [106]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([2, 1, 2, 3])
...: b = tf.random.uniform([7, 3, 2])
...: c = tf.matmul(a, b)
...: c.shape
Out[106]: TensorShape([2, 7, 2, 2])
In [107]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([2, 1, 2, 3])
...: b = tf.random.uniform([7, 9, 3, 2])
...: c = tf.matmul(a, b)
...: c.shape
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-107-ff6e40117cf7> in <module>
3 a = tf.random.uniform([2, 1, 2, 3])
4 b = tf.random.uniform([7, 9, 3, 2])
----> 5 c = tf.matmul(a, b)
6 c.shape
D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\util\dispatch.py in wrapper(*args, **kwargs)
178 """Call target, and fall back on dispatchers if there is a TypeError."""
179 try:
--> 180 return target(*args, **kwargs)
181 except (TypeError, ValueError):
182 # Note: convert_to_eager_tensor currently raises a ValueError, not a
D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\ops\math_ops.py in matmul(a, b, transpose_a, transpose_b, adjoint_a, adjoint_b, a_is_sparse, b_is_sparse, name)
2725 b = conj(b)
2726 adjoint_b = True
-> 2727 return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
2728
2729 # Neither matmul nor sparse_matmul support adjoint, so we conjugate
D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py in batch_mat_mul_v2(x, y, adj_x, adj_y, name)
1700 else:
1701 message = e.message
-> 1702 _six.raise_from(_core._status_to_exception(e.code, message), None)
1703 # Add nodes to the TensorFlow graph.
1704 if adj_x is None:
D:\S\Anaconda3_v3\lib\site-packages\six.py in raise_from(value, from_value)
InvalidArgumentError: In[0] and In[1] must have compatible batch dimensions: [2,1,2,3] vs. [7,9,3,2] [Op:BatchMatMulV2] name: MatMul/
a and b may even have different numbers of dimensions; the rule is to align the shapes from the right, as in broadcasting, padding the shorter shape with 1s on the left.
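The "align from the right" rule is ordinary broadcasting on the non-matrix axes. A sketch in NumPy (np.matmul raises ValueError where tf.matmul raises InvalidArgumentError, but the shape rule is the same):

```python
import numpy as np

a = np.random.rand(2, 1, 2, 3)   # non-matrix axes: (2, 1)
b3 = np.random.rand(7, 3, 2)     # non-matrix axes: (7,), right-aligned as (1, 7)
c = np.matmul(a, b3)
print(c.shape)                   # (2, 7, 2, 2)

b4 = np.random.rand(7, 9, 3, 2)  # non-matrix axes (7, 9) vs (2, 1): 7 != 2, neither is 1
try:
    np.matmul(a, b4)
except ValueError:
    print("incompatible batch dimensions")
```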
Finally, consider multi-dimensional tf.matmul(a, b, transpose_b=True):
In [111]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([2, 1, 2, 3])
...: b = tf.random.uniform([2, 1, 2, 3])
...: c = tf.matmul(a, b, transpose_b=True)
...: c.shape
Out[111]: TensorShape([2, 1, 2, 2])
In [112]: import tensorflow as tf
...: import numpy as np
...: a = tf.random.uniform([2, 1, 2, 3])
...: b = tf.random.uniform([1, 5, 2, 3])
...: c = tf.matmul(a, b, transpose_b=True)
...: c.shape
Out[112]: TensorShape([2, 5, 2, 2])
transpose_b only transposes the last two axes of b, so that the inner dimensions of the 2-D matrix multiplication line up.
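Equivalently, transpose_b=True multiplies a by b with its last two axes swapped, which can be sketched with NumPy's swapaxes:

```python
import numpy as np

a = np.random.rand(2, 1, 2, 3)
b = np.random.rand(1, 5, 2, 3)

# transpose_b=True multiplies a by b with its last two axes swapped:
# (..., 2, 3) x (..., 3, 2) -> (..., 2, 2).
c = np.matmul(a, b.swapaxes(-1, -2))
print(c.shape)  # (2, 5, 2, 2)
```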
If you found this useful, please upvote ~~~ thanks!