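Image affine transformations are commonly grouped into five kinds: rotation, translation, shear, scaling, and flipping. This post walks through them using the keras-retinanet implementation, which is fairly typical and easy to follow.
Official keras-retinanet repository: https://github.com/fizyr/keras-retinanet.git
All five transformations live in utils/transform.py, where they are used for data augmentation in the object detection task. (PS: in my view only translation, scaling, and flipping are really well suited to object detection, because after rotation or shear the axis-aligned bounding box of an object may grow, and I suspect this can make bounding-box regression less accurate.)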
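To make the remark about growing bounding boxes concrete, here is a minimal sketch. It uses the standard homogeneous 2D rotation matrix, and `transform_box` is a hypothetical helper written for this illustration (it is not claimed to be the library's own function). It transforms the four corners of a box and takes the axis-aligned box that encloses them.

```python
import numpy as np


def rotation(angle):
    """Homogeneous 2D rotation matrix (angle in radians)."""
    return np.array([
        [np.cos(angle), -np.sin(angle), 0],
        [np.sin(angle), np.cos(angle), 0],
        [0, 0, 1],
    ])


def transform_box(matrix, box):
    """Hypothetical helper: axis-aligned box enclosing the transformed corners."""
    x1, y1, x2, y2 = box
    # Each column is one corner in homogeneous coordinates (x, y, 1).
    corners = np.array([
        [x1, x2, x1, x2],
        [y1, y2, y2, y1],
        [1, 1, 1, 1],
    ])
    moved = matrix @ corners
    xs, ys = moved[0], moved[1]
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])


def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


box = np.array([0.0, 0.0, 4.0, 2.0])            # a 4 x 2 box
rotated_box = transform_box(rotation(np.pi / 4), box)

area_before = box_area(box)
area_after = box_area(rotated_box)
print(area_before, area_after)
```

In this example a 45 degree rotation turns a 4 x 2 box into an enclosing box more than twice as large, even though the object itself has not changed size. The extra area is background, which is the concern raised above.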
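1. Rotation
Compared with image rotation in OpenCV, the rotation that ships with retinanet is simpler, presumably for efficiency. Because it only builds a numpy matrix, the same matrix can be reused across multiple images. OpenCV's rotation is more elaborate: besides rotating around the image center, it can rotate around any point of the image and apply a scale factor at the same time.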
import numpy as np

def rotation(angle):
    """ Construct a homogeneous 2D rotation matrix.
    Args
        angle: the angle in radians
    Returns
        the rotation matrix as 3 by 3 numpy array
    """
    return np.array([
        [np.cos(angle), -np.sin(angle), 0],
        [np.sin(angle), np.cos(angle), 0],
        [0, 0, 1]
    ])
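Strictly speaking, a 2*2 matrix is enough for rotation alone. The 3*3 form expresses the rotation in homogeneous coordinates, so that it can be composed with translation through plain matrix multiplication.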
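The benefit of the homogeneous form shows up when rotating around a point other than the origin. The sketch below assumes the `rotation` and `translation` matrices as defined in this post. `rotation_about` is a hypothetical helper name introduced only for illustration: it moves the chosen center to the origin, rotates, and moves it back.

```python
import numpy as np


def rotation(angle):
    return np.array([
        [np.cos(angle), -np.sin(angle), 0],
        [np.sin(angle), np.cos(angle), 0],
        [0, 0, 1],
    ])


def translation(t):
    return np.array([
        [1, 0, t[0]],
        [0, 1, t[1]],
        [0, 0, 1],
    ])


def rotation_about(angle, center):
    """Hypothetical helper: rotate around `center` instead of the origin."""
    center = np.asarray(center, dtype=float)
    # Read right to left: shift center to origin, rotate, shift back.
    return translation(center) @ rotation(angle) @ translation(-center)


m = rotation_about(np.pi / 2, (2.0, 1.0))

center_h = np.array([2.0, 1.0, 1.0])   # the center itself, as (x, y, 1)
point_h = np.array([3.0, 1.0, 1.0])    # one unit to the right of the center

center_moved = m @ center_h            # stays at (2, 1)
point_moved = m @ point_h              # ends up at (2, 2)
```

With 2*2 matrices the translation steps could not be folded into the same product, so the three steps would have to be applied separately.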
2. Translation
def translation(translation):
    """ Construct a homogeneous 2D translation matrix.
    # Arguments
        translation: the translation 2D vector
    # Returns
        the translation matrix as 3 by 3 numpy array
    """
    return np.array([
        [1, 0, translation[0]],
        [0, 1, translation[1]],
        [0, 0, 1]
    ])
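As a small usage sketch (the point values are made up for illustration), the translation matrix can be applied to several points at once by stacking them as columns of the form (x, y, 1). Two translations also compose into a single one whose offset is the sum of the two.

```python
import numpy as np


def translation(t):
    return np.array([
        [1, 0, t[0]],
        [0, 1, t[1]],
        [0, 0, 1],
    ])


# Three example points as columns: (0, 0), (10, 0), (10, 5).
points = np.array([
    [0.0, 10.0, 10.0],
    [0.0, 0.0, 5.0],
    [1.0, 1.0, 1.0],
])

shifted = translation((3, -2)) @ points

# Composing two translations adds their offsets.
combined = translation((3, -2)) @ translation((1, 1))
```

The last row of ones is what lets a matrix product add a constant offset to every point.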
3. Shear
def shear(angle):
    """ Construct a homogeneous 2D shear matrix.
    Args
        angle: the shear angle in radians
    Returns
        the shear matrix as 3 by 3 numpy array
    """
    return np.array([
        [1, -np.sin(angle), 0],
        [0, np.cos(angle), 0],
        [0, 0, 1]
    ])
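One way to read this matrix is to look at what it does to the two unit vectors. The sketch below uses an arbitrary example angle of 30 degrees and suggests that the X unit vector is left alone while the Y unit vector is rotated by the shear angle, so the unit square becomes a parallelogram whose area is cos(angle).

```python
import numpy as np


def shear(angle):
    return np.array([
        [1, -np.sin(angle), 0],
        [0, np.cos(angle), 0],
        [0, 0, 1],
    ])


angle = np.pi / 6                      # 30 degrees, chosen only as an example
m = shear(angle)

x_unit = np.array([1.0, 0.0, 1.0])     # point (1, 0) in homogeneous form
y_unit = np.array([0.0, 1.0, 1.0])     # point (0, 1) in homogeneous form

x_moved = m @ x_unit                   # unchanged: (1, 0)
y_moved = m @ y_unit                   # (-sin(angle), cos(angle))

# The determinant is the factor by which areas are scaled.
area_factor = np.linalg.det(m)
```

Note that this differs from the textbook shear matrix with a single off-diagonal term: here the sheared axis keeps its length, at the cost of shrinking areas slightly.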
4. Scaling
def scaling(factor):
    """ Construct a homogeneous 2D scaling matrix.
    Args
        factor: a 2D vector for X and Y scaling
    Returns
        the zoom matrix as 3 by 3 numpy array
    """
    return np.array([
        [factor[0], 0, 0],
        [0, factor[1], 0],
        [0, 0, 1]
    ])
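A brief sketch of the scaling matrix applied to the two corners of a box (the box coordinates are invented for the example). Since the scaling is about the origin, the box changes position as well as size.

```python
import numpy as np


def scaling(factor):
    return np.array([
        [factor[0], 0, 0],
        [0, factor[1], 0],
        [0, 0, 1],
    ])


# Columns: top-left corner (1, 2) and bottom-right corner (3, 4).
corners = np.array([
    [1.0, 3.0],
    [2.0, 4.0],
    [1.0, 1.0],
])

# Stretch X by 2 and squash Y by half.
scaled = scaling((2, 0.5)) @ corners

center_before = corners[:2].mean(axis=1)   # (2, 3)
center_after = scaled[:2].mean(axis=1)     # (4, 1.5)
```

To scale a box in place, the scaling would need to be wrapped in a pair of translations so that the box center sits at the origin while the scaling is applied.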
5. Flipping
Flipping is also implemented with scaling: a factor of +1 leaves an axis unchanged, and a factor of -1 mirrors it.
# DEFAULT_PRNG is a module-level default defined in transform.py.
def random_flip(flip_x_chance, flip_y_chance, prng=DEFAULT_PRNG):
    """ Construct a transformation randomly containing X/Y flips (or not).
    Args
        flip_x_chance: The chance that the result will contain a flip along the X axis.
        flip_y_chance: The chance that the result will contain a flip along the Y axis.
        prng: The pseudo-random number generator to use.
    Returns
        a homogeneous 3 by 3 transformation matrix
    """
    flip_x = prng.uniform(0, 1) < flip_x_chance
    flip_y = prng.uniform(0, 1) < flip_y_chance
    # 1 - 2 * bool gives 1 for False and -1 for True.
    return scaling((1 - 2 * flip_x, 1 - 2 * flip_y))
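The following sketch exercises `random_flip` deterministically and shows one possible way to keep a flipped point inside the image. It assumes `np.random` as the default generator and an image width of 100. Both are choices made for this illustration, and the extra translation is not claimed to be how the library itself re-centers the flip.

```python
import numpy as np


def scaling(factor):
    return np.array([
        [factor[0], 0, 0],
        [0, factor[1], 0],
        [0, 0, 1],
    ])


def translation(t):
    return np.array([
        [1, 0, t[0]],
        [0, 1, t[1]],
        [0, 0, 1],
    ])


def random_flip(flip_x_chance, flip_y_chance, prng=np.random):
    flip_x = prng.uniform(0, 1) < flip_x_chance
    flip_y = prng.uniform(0, 1) < flip_y_chance
    # 1 - 2 * bool gives 1 for False and -1 for True.
    return scaling((1 - 2 * flip_x, 1 - 2 * flip_y))


prng = np.random.RandomState(0)        # seeded generator for repeatability

# uniform(0, 1) draws from [0, 1), so a chance of 1.0 always flips
# and a chance of 0.0 never does.
flip = random_flip(1.0, 0.0, prng)

width = 100                            # assumed image width
point = np.array([10.0, 20.0, 1.0])

mirrored = flip @ point                # x becomes -10, outside the image

# Shift back by the width so that x maps to width - x.
flip_in_frame = translation((width, 0)) @ flip
mirrored_in_frame = flip_in_frame @ point
```

On its own, the flip matrix mirrors about the origin, so every x coordinate turns negative. Composing it with a translation, as in the last step, is what brings the result back into the visible image area.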