Introduction
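This article analyzes the code in base_layer.py. That file contains the most important class, Layer: every Keras layer is a subclass of Layer. class Sequential(Model) inherits from the Model class in keras/engine/training.py, Model in turn inherits from the Container class in keras/engine/topology.py in the same directory, and Container inherits from the Layer class in that same file. (This chain follows the older file layout; as far as I know, in releases that ship base_layer.py, Container was renamed Network and lives in keras/engine/network.py.) In short, Layer is the foundation of Keras and carries the whole framework.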
Prerequisites
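Before diving into the source, a few prerequisites are worth covering. Many people working in AI do not know Python in depth, and without this background the source code can be hard to follow.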
Decorators
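A decorator modifies the behavior of a function and helps keep code concise. The Layer class uses @property and one custom decorator. @property plays a role similar to the getter and setter methods of a field in a Java class, as in the source below.
For the getter, adding @property is enough:
    @property
    def built(self):
        return self._built
For the setter, add @<property name>.setter:
    @built.setter
    def built(self, value):
        self._built = value
The other, custom decorator is @interfaces.legacy_add_weight_support. It mainly keeps Keras 2 code compatible with Keras 1, so it is not covered in detail here.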
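To make the getter/setter pattern concrete, here is a minimal standalone sketch. TinyLayer is a hypothetical class made up for illustration, not part of Keras; it only mirrors the built property shown above.

```python
class TinyLayer:
    """Hypothetical stand-in for Layer, used only to show the property pattern."""

    def __init__(self):
        self._built = False

    @property
    def built(self):
        # Reading `layer.built` runs this getter.
        return self._built

    @built.setter
    def built(self, value):
        # Assigning `layer.built = ...` runs this setter.
        self._built = value


layer = TinyLayer()
before = layer.built   # goes through the getter
layer.built = True     # goes through the setter
after = layer.built
```

Callers read and write layer.built like a plain attribute, while the class keeps the freedom to add validation or bookkeeping inside the two methods later.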
Magic methods
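A Python class can define __call__, one of the so-called magic methods. An instance of a class that implements it can be called directly, as if the instance were a function. This is why layers can be chained one after another:
    inputs = Input(shape=(100,))
    x = Dense(64)(inputs)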
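The same chaining style can be reproduced without Keras. The sketch below uses a hypothetical Scale class (not a Keras layer) whose instances become callable through __call__.

```python
class Scale:
    """Hypothetical callable object; it multiplies every value by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, values):
        # `instance(values)` is translated by Python into
        # `instance.__call__(values)`.
        return [v * self.factor for v in values]


inputs = [1, 2, 3]
x = Scale(2)(inputs)   # construct, then call, as in Dense(64)(inputs)
y = Scale(3)(x)        # chain a second "layer" onto the first
```

Here Scale(2) builds the object and the second pair of parentheses invokes __call__, which is the same two-step shape as Dense(64)(inputs).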
Source code analysis
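Now let's look at the Layer source itself. Only the parts I consider most important are excerpted here; for more detail, read the source directly. We start with the constructor.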
def __init__(self, **kwargs):
    self.input_spec = None
    self.supports_masking = False
    self.stateful = False
    # These properties will be set upon call of self.build()
    self._trainable_weights = []
    self._non_trainable_weights = []
    self._losses = []
    self._updates = []
    self._per_input_losses = {}
    self._per_input_updates = {}
    self._built = False
    # These lists will be filled via successive calls
    # to self._add_inbound_node().
    self._inbound_nodes = []
    self._outbound_nodes = []
    # These properties should be set by the user via keyword arguments.
    # note that 'dtype', 'input_shape' and 'batch_input_shape'
    # are only applicable to input layers: do not pass these keywords
    # to non-input layers.
    allowed_kwargs = {'input_shape',
                      'batch_input_shape',
                      'batch_size',
                      'dtype',
                      'name',
                      'trainable',
                      'weights',
                      'input_dtype',  # legacy
                      }
    for kwarg in kwargs:
        if kwarg not in allowed_kwargs:
            raise TypeError('Keyword argument not understood:', kwarg)
    name = kwargs.get('name')
    if not name:
        prefix = self.__class__.__name__
        name = _to_snake_case(prefix) + '_' + str(K.get_uid(prefix))
    self.name = name
    self.trainable = kwargs.get('trainable', True)
    if 'input_shape' in kwargs or 'batch_input_shape' in kwargs:
        # In this case we will later create an input layer
        # to insert before the current layer
        if 'batch_input_shape' in kwargs:
            batch_input_shape = tuple(kwargs['batch_input_shape'])
        elif 'input_shape' in kwargs:
            batch_size = kwargs.get('batch_size')
            batch_input_shape = (
                batch_size,) + tuple(kwargs['input_shape'])
        self.batch_input_shape = batch_input_shape
    # Set dtype.
    dtype = kwargs.get('dtype')
    if dtype is None:
        dtype = kwargs.get('input_dtype')
    if dtype is None:
        dtype = K.floatx()
    self.dtype = dtype
    self._initial_weights = kwargs.get('weights')
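The constructor mostly initializes parameters and assigns attributes. The keyword arguments it accepts are listed in allowed_kwargs:
- input_shape: the input shape, without the batch dimension
- batch_input_shape: the input shape including batch_size
- batch_size: the batch size
- dtype: the data type
- name: the layer name
- trainable: whether the layer is trained
- weights: the initial weights
- input_dtype: the input data type (marked legacy in the source)
As the source comment says, dtype, input_shape and batch_input_shape only apply to input layers; do not pass them to other layers.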
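The keyword handling described above can be sketched in plain Python. Everything here is an assumption made for illustration: resolve_layer_kwargs, to_snake_case and the _UIDS counter are hypothetical helpers that stand in for the constructor body, _to_snake_case and K.get_uid, and they do not reproduce the Keras implementations.

```python
import re
from collections import defaultdict

# Hypothetical per-prefix counter, standing in for K.get_uid.
_UIDS = defaultdict(int)

ALLOWED_KWARGS = {'input_shape', 'batch_input_shape', 'batch_size',
                  'dtype', 'name', 'trainable', 'weights'}


def to_snake_case(name):
    # Insert "_" before a capital letter that follows a lowercase letter.
    return re.sub(r'(?<=[a-z])([A-Z])', r'_\1', name).lower()


def resolve_layer_kwargs(class_name, **kwargs):
    """Validate kwargs, pick a name, and derive batch_input_shape."""
    for key in kwargs:
        if key not in ALLOWED_KWARGS:
            raise TypeError('Keyword argument not understood: ' + key)
    name = kwargs.get('name')
    if not name:
        # No explicit name: build one from the class name plus a counter.
        _UIDS[class_name] += 1
        name = to_snake_case(class_name) + '_' + str(_UIDS[class_name])
    config = {'name': name, 'trainable': kwargs.get('trainable', True)}
    if 'batch_input_shape' in kwargs:
        config['batch_input_shape'] = tuple(kwargs['batch_input_shape'])
    elif 'input_shape' in kwargs:
        # Prepend the batch size (None when it was not given).
        config['batch_input_shape'] = (
            (kwargs.get('batch_size'),) + tuple(kwargs['input_shape']))
    return config


first = resolve_layer_kwargs('Dense', input_shape=(28, 28, 1))
second = resolve_layer_kwargs('Dense', batch_size=32, input_shape=(28, 28, 1))
named = resolve_layer_kwargs('BatchNormalization', name='bn')
```

The point to notice is that input_shape is always turned into a batch_input_shape by prepending the batch size, so the rest of the code only has to deal with one shape convention.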
Next is the add_weight method, which adds a weight variable to the current layer (trainable by default).
def add_weight(self,
               name,
               shape,
               dtype=None,
               initializer=None,
               regularizer=None,
               trainable=True,
               constraint=None):
    initializer = initializers.get(initializer)
    if dtype is None:
        dtype = self.dtype
    weight = K.variable(initializer(shape, dtype=dtype),
                        dtype=dtype,
                        name=name,
                        constraint=constraint)
    if regularizer is not None:
        with K.name_scope('weight_regularizer'):
            self.add_loss(regularizer(weight))
    if trainable:
        self._trainable_weights.append(weight)
    else:
        self._non_trainable_weights.append(weight)
    return weight
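With the TensorFlow backend, K.variable() essentially wraps tf.Variable(). After creating the variable, the method checks whether a regularizer was given and, if so, calls add_loss to record the regularization loss for this layer. Finally it uses the trainable argument to decide whether the weight should be trained and appends it to either the trainable or the non-trainable list.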
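The bookkeeping in add_weight can be imitated with plain lists. This is a simplified sketch under stated assumptions: WeightStore and l2_penalty are hypothetical names, weights are nested Python lists instead of backend variables, and the initializer is just a zero-argument callable.

```python
class WeightStore:
    """Hypothetical, backend-free imitation of the add_weight bookkeeping."""

    def __init__(self):
        self.trainable_weights = []
        self.non_trainable_weights = []
        self.losses = []

    def add_weight(self, name, shape, initializer,
                   regularizer=None, trainable=True):
        rows, cols = shape
        weight = {'name': name,
                  'value': [[initializer() for _ in range(cols)]
                            for _ in range(rows)]}
        if regularizer is not None:
            # Record the penalty so it can be added to the total loss later.
            self.losses.append(regularizer(weight['value']))
        if trainable:
            self.trainable_weights.append(weight)
        else:
            self.non_trainable_weights.append(weight)
        return weight


def l2_penalty(matrix, factor=0.01):
    # Sum of squared entries, scaled by a small factor.
    return factor * sum(v * v for row in matrix for v in row)


store = WeightStore()
kernel = store.add_weight('kernel', (2, 3), lambda: 0.5,
                          regularizer=l2_penalty)
moving_mean = store.add_weight('moving_mean', (1, 3), lambda: 0.0,
                               trainable=False)
```

A weight such as a moving average is still owned by the layer, but because it lands in the non-trainable list an optimizer that only looks at trainable_weights will leave it alone.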
Next comes the most important magic method, __call__. It is fairly long, so we will go through it in several parts.
def __call__(self, inputs, **kwargs):
    if isinstance(inputs, list):
        inputs = inputs[:]
    with K.name_scope(self.name):
        # Handle laying building (weight creating, input spec locking).
        if not self.built:
            # Raise exceptions in case the input is not compatible
            # with the input_spec specified in the layer constructor.
            self.assert_input_compatibility(inputs)
            # Collect input shapes to build layer.
            input_shapes = []
            for x_elem in to_list(inputs):
                if hasattr(x_elem, '_keras_shape'):
                    input_shapes.append(x_elem._keras_shape)
                elif hasattr(K, 'int_shape'):
                    input_shapes.append(K.int_shape(x_elem))
                else:
                    raise ValueError('You tried to call layer "' +
                                     self.name +
                                     '". This layer has no information'
                                     ' about its expected input shape, '
                                     'and thus cannot be built. '
                                     'You can build it manually via: '
                                     '`layer.build(batch_input_shape)`')
            self.build(unpack_singleton(input_shapes))
            self.built = True
            # Load weights that were specified at layer instantiation.
            if self._initial_weights is not None:
                self.set_weights(self._initial_weights)
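First, a name scope is created from the name set in the constructor. Then the built attribute is checked. If it is False, build has not run yet, so the method first validates the inputs, then collects their shapes and calls build, which constructs the layer's structure (its weights). After build finishes, built is set to True and any initial weights passed to the constructor are loaded. When a layer is reused, built is already True, so build does not run a second time; this is also the reason the scope is created.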
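The build-once behaviour can be seen in a small sketch. LazyLayer is a hypothetical class invented for this example; it uses nested lists as inputs and only records shapes, with an arbitrary output width of 4.

```python
class LazyLayer:
    """Hypothetical layer that defers build() until the first call."""

    def __init__(self):
        self.built = False
        self.build_calls = 0
        self.kernel_shape = None

    def build(self, input_shape):
        # The input shape is only known at call time, so weights whose
        # size depends on it have to be created here.
        self.build_calls += 1
        self.kernel_shape = (input_shape[-1], 4)

    def __call__(self, batch):
        if not self.built:
            input_shape = (len(batch), len(batch[0]))
            self.build(input_shape)
            self.built = True
        # A real layer would transform the data; this sketch copies it.
        return [row[:] for row in batch]


shared = LazyLayer()
out_a = shared([[1.0, 2.0, 3.0]])
out_b = shared([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```

Both calls go through the same object, so the second call reuses the kernel shape decided during the first one. That is what sharing a layer between two inputs relies on.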
        # Raise exceptions in case the input is not compatible
        # with the input_spec set at build time.
        self.assert_input_compatibility(inputs)
        # Handle mask propagation.
        previous_mask = _collect_previous_mask(inputs)
        user_kwargs = kwargs.copy()
        if not is_all_none(previous_mask):
            # The previous layer generated a mask.
            if has_arg(self.call, 'mask'):
                if 'mask' not in kwargs:
                    # If mask is explicitly passed to __call__,
                    # we should override the default mask.
                    kwargs['mask'] = previous_mask
        # Handle automatic shape inference (only useful for Theano).
        input_shape = _collect_input_shape(inputs)
        # Actually call the layer,
        # collecting output(s), mask(s), and shape(s).
        output = self.call(inputs, **kwargs)
        output_mask = self.compute_mask(inputs, previous_mask)
        # If the layer returns tensors from its inputs, unmodified,
        # we copy them to avoid loss of tensor metadata.
        output_ls = to_list(output)
        inputs_ls = to_list(inputs)
        output_ls_copy = []
        for x in output_ls:
            if x in inputs_ls:
                x = K.identity(x)
            output_ls_copy.append(x)
        output = unpack_singleton(output_ls_copy)
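This part runs the inputs through the layer to produce the outputs. It validates the inputs once more, then checks whether the inputs carry a mask from a previous layer (masks will be covered in a later post). If they do, call accepts a mask argument, and the caller did not pass one explicitly, the previous mask is passed along. It then calls call to get the outputs and compute_mask to update the mask. Finally it compares outputs with inputs: any output tensor that is one of the input tensors, returned unmodified, is copied with K.identity so that tensor metadata is not lost. The call method itself is where the layer's transformation of the tensors is defined.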
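The mask hand-off can be sketched with the standard inspect module. The helper names below mirror the ones in the excerpt, but has_arg here is a simplified hypothetical reimplementation, and call_with_mask, masked_sum and plain_sum exist only for this example.

```python
import inspect


def has_arg(fn, name):
    # Simplified check: does `fn` declare a parameter called `name`?
    return name in inspect.signature(fn).parameters


def call_with_mask(call_fn, inputs, previous_mask, **kwargs):
    """Pass previous_mask to call_fn unless the caller supplied a mask."""
    user_kwargs = dict(kwargs)  # what the user really passed
    if previous_mask is not None and has_arg(call_fn, 'mask'):
        # An explicit mask from the caller wins over the inherited one.
        kwargs.setdefault('mask', previous_mask)
    return call_fn(inputs, **kwargs), user_kwargs


def masked_sum(inputs, mask=None):
    if mask is None:
        return sum(inputs)
    return sum(v for v, keep in zip(inputs, mask) if keep)


def plain_sum(inputs):
    return sum(inputs)


inherited, kw1 = call_with_mask(masked_sum, [1, 2, 3], [True, False, True])
explicit, kw2 = call_with_mask(masked_sum, [1, 2, 3], [True, False, True],
                               mask=[False, False, True])
ignored, kw3 = call_with_mask(plain_sum, [1, 2, 3], [True, False, True])
```

Keeping user_kwargs separate from the kwargs actually sent to the function matches the excerpt, where only the arguments the user passed are recorded on the node.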
        # Inferring the output shape is only relevant for Theano.
        if all([s is not None
                for s in to_list(input_shape)]):
            output_shape = self.compute_output_shape(input_shape)
        else:
            if isinstance(input_shape, list):
                output_shape = [None for _ in input_shape]
            else:
                output_shape = None
        if (not isinstance(output_mask, (list, tuple)) and
                len(output_ls) > 1):
            # Augment the mask to match the length of the output.
            output_mask = [output_mask] * len(output_ls)
        # Add an inbound node to the layer, so that it keeps track
        # of the call and of all new variables created during the call.
        # This also updates the layer history of the output tensor(s).
        # If the input tensor(s) had not previous Keras history,
        # this does nothing.
        self._add_inbound_node(input_tensors=inputs,
                               output_tensors=output,
                               input_masks=previous_mask,
                               output_masks=output_mask,
                               input_shapes=input_shape,
                               output_shapes=output_shape,
                               arguments=user_kwargs)
        # Apply activity regularizer if any:
        if (hasattr(self, 'activity_regularizer') and
                self.activity_regularizer is not None):
            with K.name_scope('activity_regularizer'):
                regularization_losses = [
                    self.activity_regularizer(x)
                    for x in to_list(output)]
            self.add_loss(regularization_losses,
                          inputs=to_list(inputs))
    return output
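The last part is housekeeping. It first derives the output shape from the input shape, which is where compute_output_shape is called. It then checks whether the mask matches the number of outputs: if there are several outputs but only a single mask, the mask is repeated once per output. Next it calls _add_inbound_node, which creates a node (the role of nodes is explained below) that links the layers together, so the current layer can reach the outputs and masks of earlier layers. Finally it checks whether an activity_regularizer is set and, if so, adds its loss. Keras has three kinds of regularizers, mentioned briefly here:
- kernel_regularizer: regularizes the layer's weights so they do not grow too large.
- bias_regularizer: likewise, limits the size of the layer's biases.
- activity_regularizer: regularizes the layer's output.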
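Two of the housekeeping steps are easy to try in isolation. The sketch below uses hypothetical helpers (augment_mask and l1_activity) on plain Python lists; it follows the logic described above but is not the Keras code, and the L1 penalty is just one possible choice of activity regularizer.

```python
def augment_mask(output_mask, outputs):
    """Repeat a single mask so there is one mask per output."""
    if not isinstance(output_mask, (list, tuple)) and len(outputs) > 1:
        return [output_mask] * len(outputs)
    return output_mask


def l1_activity(outputs, factor=0.5):
    # One penalty per output: factor times the sum of absolute values.
    return [factor * sum(abs(v) for v in out) for out in outputs]


two_outputs = [[1.0, -2.0], [3.0]]
masks = augment_mask(None, two_outputs)
single = augment_mask('m', [[1.0]])
penalties = l1_activity(two_outputs)
```

An activity penalty depends on the outputs of a particular call, which may explain why the excerpt passes inputs=to_list(inputs) to add_loss while the weight regularizer in add_weight does not.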
Besides Layer, this file contains two more classes, InputSpec and Node. Let's look at what each of them does.
class InputSpec(object):
    def __init__(self, dtype=None,
                 shape=None,
                 ndim=None,
                 max_ndim=None,
                 min_ndim=None,
                 axes=None):
        self.dtype = dtype
        self.shape = shape
        if shape is not None:
            self.ndim = len(shape)
        else:
            self.ndim = ndim
        self.max_ndim = max_ndim
        self.min_ndim = min_ndim
        self.axes = axes or {}

    def __repr__(self):
        spec = [('dtype=' + str(self.dtype)) if self.dtype else '',
                ('shape=' + str(self.shape)) if self.shape else '',
                ('ndim=' + str(self.ndim)) if self.ndim else '',
                ('max_ndim=' + str(self.max_ndim)) if self.max_ndim else '',
                ('min_ndim=' + str(self.min_ndim)) if self.min_ndim else '',
                ('axes=' + str(self.axes)) if self.axes else '']
        return 'InputSpec(%s)' % ', '.join(x for x in spec if x)
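This class lets a layer declare the ndim, dtype, shape and similar constraints it expects from its inputs. It consists of plain assignments, with no complicated logic.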
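A spec like this only becomes useful once something checks inputs against it, which is the job of assert_input_compatibility in the excerpts above. The following is a rough sketch of such a check under simplifying assumptions: SimpleSpec and is_compatible are hypothetical, cover only shape and ndim bounds, and return a boolean instead of raising.

```python
class SimpleSpec:
    """Hypothetical, reduced InputSpec: shape and ndim bounds only."""

    def __init__(self, shape=None, ndim=None, min_ndim=None, max_ndim=None):
        self.shape = shape
        # As in InputSpec, a given shape determines ndim.
        self.ndim = len(shape) if shape is not None else ndim
        self.min_ndim = min_ndim
        self.max_ndim = max_ndim


def is_compatible(spec, input_shape):
    """Return True if input_shape satisfies every constraint set on spec."""
    ndim = len(input_shape)
    if spec.ndim is not None and ndim != spec.ndim:
        return False
    if spec.min_ndim is not None and ndim < spec.min_ndim:
        return False
    if spec.max_ndim is not None and ndim > spec.max_ndim:
        return False
    if spec.shape is not None:
        for expected, actual in zip(spec.shape, input_shape):
            # None on either side means "any size" for that axis.
            if (expected is not None and actual is not None
                    and expected != actual):
                return False
    return True


image_spec = SimpleSpec(shape=(None, 28, 28, 1))
ok = is_compatible(image_spec, (32, 28, 28, 1))
wrong_rank = is_compatible(image_spec, (32, 784))
wrong_size = is_compatible(image_spec, (32, 28, 28, 3))
sequence_ok = is_compatible(SimpleSpec(min_ndim=3), (8, 10, 16))
```

Leaving the batch axis as None in the spec lets the same layer accept any batch size while still rejecting inputs with the wrong rank or channel count.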
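The Node class connects two layers. It acts like a data channel that carries what the layers need from each other. Why have a Node at all? My understanding is that it is there for decoupling: a Layer only handles its own computation and does not care how data is passed around. Recall that Layer has two attributes, self._inbound_nodes and self._outbound_nodes. Every time a Node is instantiated, it appends itself to the _inbound_nodes of its outbound layer and to the _outbound_nodes of each inbound layer.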
class Node(object):
    def __init__(self, outbound_layer,
                 inbound_layers, node_indices, tensor_indices,
                 input_tensors, output_tensors,
                 input_masks, output_masks,
                 input_shapes, output_shapes,
                 arguments=None):
        # Layer instance (NOT a list).
        # this is the layer that takes a list of input tensors
        # and turns them into a list of output tensors.
        # the current node will be added to
        # the inbound_nodes of outbound_layer.
        self.outbound_layer = outbound_layer
        # The following 3 properties describe where
        # the input tensors come from: which layers,
        # and for each layer, which node and which
        # tensor output of each node.
        # List of layer instances.
        self.inbound_layers = inbound_layers
        # List of integers, 1:1 mapping with inbound_layers.
        self.node_indices = node_indices
        # List of integers, 1:1 mapping with inbound_layers.
        self.tensor_indices = tensor_indices
        # Following 2 properties:
        # tensor inputs and outputs of outbound_layer.
        # List of tensors. 1:1 mapping with inbound_layers.
        self.input_tensors = input_tensors
        # List of tensors, created by outbound_layer.call().
        self.output_tensors = output_tensors
        # Following 2 properties: input and output masks.
        # List of tensors, 1:1 mapping with input_tensor.
        self.input_masks = input_masks
        # List of tensors, created by outbound_layer.compute_mask().
        self.output_masks = output_masks
        # Following 2 properties: input and output shapes.
        # List of shape tuples, shapes of input_tensors.
        self.input_shapes = input_shapes
        # List of shape tuples, shapes of output_tensors.
        self.output_shapes = output_shapes
        # Optional keyword arguments to layer's `call`.
        self.arguments = arguments
        # Add nodes to all layers involved.
        for layer in inbound_layers:
            if layer is not None:
                layer._outbound_nodes.append(self)
        outbound_layer._inbound_nodes.append(self)

    def get_config(self):
        inbound_names = []
        for layer in self.inbound_layers:
            if layer:
                inbound_names.append(layer.name)
            else:
                inbound_names.append(None)
        if self.outbound_layer:
            outbound_layer = self.outbound_layer.name
        else:
            outbound_layer = None
        return {'outbound_layer': outbound_layer,
                'inbound_layers': inbound_names,
                'node_indices': self.node_indices,
                'tensor_indices': self.tensor_indices}
The Node class is again mostly assignments, plus a method that returns a few of its attributes. The key piece is the _add_inbound_node method that Layer calls.
def _add_inbound_node(self, input_tensors, output_tensors,
                      input_masks, output_masks,
                      input_shapes, output_shapes, arguments=None):
    input_tensors = to_list(input_tensors)
    output_tensors = to_list(output_tensors)
    input_masks = to_list(input_masks)
    output_masks = to_list(output_masks)
    input_shapes = to_list(input_shapes)
    output_shapes = to_list(output_shapes)
    # Collect input tensor(s) coordinates.
    inbound_layers = []
    node_indices = []
    tensor_indices = []
    for x in input_tensors:
        if hasattr(x, '_keras_history'):
            inbound_layer, node_index, tensor_index = x._keras_history
            inbound_layers.append(inbound_layer)
            node_indices.append(node_index)
            tensor_indices.append(tensor_index)
        else:
            inbound_layers.append(None)
            node_indices.append(None)
            tensor_indices.append(None)
    # Create node, add it to inbound nodes.
    Node(
        self,
        inbound_layers=inbound_layers,
        node_indices=node_indices,
        tensor_indices=tensor_indices,
        input_tensors=input_tensors,
        output_tensors=output_tensors,
        input_masks=input_masks,
        output_masks=output_masks,
        input_shapes=input_shapes,
        output_shapes=output_shapes,
        arguments=arguments
    )
    # Update tensor history, _keras_shape and _uses_learning_phase.
    for i in range(len(output_tensors)):
        output_tensors[i]._keras_shape = output_shapes[i]
        uses_lp = any(
            [getattr(x, '_uses_learning_phase', False)
             for x in input_tensors])
        uses_lp = getattr(self, 'uses_learning_phase', False) or uses_lp
        output_tensors[i]._uses_learning_phase = getattr(
            output_tensors[i], '_uses_learning_phase', False) or uses_lp
        output_tensors[i]._keras_history = (self,
                                            len(self._inbound_nodes) - 1,
                                            i)
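The method first makes sure the input and output tensors, masks and shapes are all lists, converting them if they are not. It then iterates over input_tensors to collect the arguments needed to construct a Node, creates the Node to link the layers together, and finally updates the metadata on output_tensors (_keras_shape, _uses_learning_phase and _keras_history).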
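To see how nodes and tensor history fit together, here is a stripped-down sketch. MiniNode, MiniTensor and MiniLayer are hypothetical classes written for this example; they keep only the linking logic (no masks, shapes or arguments) and handle one input tensor per call.

```python
class MiniNode:
    """Hypothetical node: records one call and registers itself on both sides."""

    def __init__(self, outbound_layer, inbound_layers):
        self.outbound_layer = outbound_layer
        self.inbound_layers = inbound_layers
        for layer in inbound_layers:
            if layer is not None:
                layer.outbound_nodes.append(self)
        outbound_layer.inbound_nodes.append(self)


class MiniTensor:
    def __init__(self, value):
        self.value = value
        # (layer, node_index, tensor_index), or None for raw inputs.
        self.history = None


class MiniLayer:
    def __init__(self, name):
        self.name = name
        self.inbound_nodes = []
        self.outbound_nodes = []

    def __call__(self, tensor):
        output = MiniTensor(tensor.value)
        # Look up which layer produced the input, if any.
        inbound = tensor.history[0] if tensor.history else None
        MiniNode(self, [inbound])
        # The newest node is the last entry in inbound_nodes.
        output.history = (self, len(self.inbound_nodes) - 1, 0)
        return output


a = MiniLayer('a')
b = MiniLayer('b')
y = b(a(MiniTensor(1)))
z = a(MiniTensor(2))  # calling `a` again creates a second node on it
```

Because each output tensor remembers the layer and node that produced it, the whole graph can be walked backwards from y to its source, and the node index tells the two calls of a apart.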
That is everything on Layer. The next post will walk through something we use often when defining models: Input