Keras Source Code Analysis: Layer

Introduction

This article analyzes the code in base_layer.py, which contains the most important class in Keras: Layer. Layer is the parent class of all Keras layers. class Sequential(Model) inherits from the Model class in keras/engine/training.py; Model in turn inherits from the Container class in keras/engine/topology.py; and Container inherits from the Layer class in the same file. Layer is thus the foundation of Keras, carrying the entire framework.

Prerequisites

Before walking through the source code, some background knowledge is needed. Many people working in AI may not know Python all that deeply, and without this background the source can be hard to follow.

Decorators

A decorator modifies the behavior of a function and makes code more concise. The Layer class uses @property plus one custom decorator. @property plays the role of the getter/setter methods you would write for a field of a Java class, as in the following source code.

The getter: just add @property.

@property
def built(self):
    return self._built

The setter: add @<property name>.setter.

@built.setter
def built(self, value):
    self._built = value

The other, custom decorator is @interfaces.legacy_add_weight_support, which mainly keeps Keras 2 code backward compatible with Keras 1; we won't go into it here.

Magic methods

Python classes have a magic method named __call__. An instance of a class that implements it can be called directly, using the instance name as if it were a function. This is why we can chain layers together like this:

inputs = Input(shape=(100,))
x = Dense(64)(inputs)
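
As a minimal illustration of the mechanism (plain Python, not Keras code), a class implementing __call__ behaves like this:

class Greeter(object):
    def __init__(self, name):
        self.name = name

    def __call__(self, text):
        # Calling the instance dispatches to this method.
        return '%s says: %s' % (self.name, text)

g = Greeter('layer')
print(g('hello'))  # equivalent to g.__call__('hello')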

Source Code Analysis

Now let's get into the Layer source itself. I have excerpted the parts I consider most important; for the remaining details, please read the source directly. We start with the constructor.

 def __init__(self, **kwargs):
     self.input_spec = None
     self.supports_masking = False
     self.stateful = False

     # These properties will be set upon call of self.build()
     self._trainable_weights = []
     self._non_trainable_weights = []
     self._losses = []
     self._updates = []
     self._per_input_losses = {}
     self._per_input_updates = {}
     self._built = False

     # These lists will be filled via successive calls
     # to self._add_inbound_node().
     self._inbound_nodes = []
     self._outbound_nodes = []

     # These properties should be set by the user via keyword arguments.
     # note that 'dtype', 'input_shape' and 'batch_input_shape'
     # are only applicable to input layers: do not pass these keywords
     # to non-input layers.
     allowed_kwargs = {'input_shape',
                       'batch_input_shape',
                       'batch_size',
                       'dtype',
                       'name',
                       'trainable',
                       'weights',
                       'input_dtype',  # legacy
                       }
     for kwarg in kwargs:
         if kwarg not in allowed_kwargs:
             raise TypeError('Keyword argument not understood:', kwarg)
     name = kwargs.get('name')
     if not name:
         prefix = self.__class__.__name__
         name = _to_snake_case(prefix) + '_' + str(K.get_uid(prefix))
     self.name = name

     self.trainable = kwargs.get('trainable', True)
     if 'input_shape' in kwargs or 'batch_input_shape' in kwargs:
         # In this case we will later create an input layer
         # to insert before the current layer
         if 'batch_input_shape' in kwargs:
             batch_input_shape = tuple(kwargs['batch_input_shape'])
         elif 'input_shape' in kwargs:
             batch_size = kwargs.get('batch_size')
             batch_input_shape = (
                 batch_size,) + tuple(kwargs['input_shape'])
         self.batch_input_shape = batch_input_shape

     # Set dtype.
     dtype = kwargs.get('dtype')
     if dtype is None:
         dtype = kwargs.get('input_dtype')
     if dtype is None:
         dtype = K.floatx()
     self.dtype = dtype

     self._initial_weights = kwargs.get('weights')

The constructor mostly initializes parameters and assigns a few variables. Its accepted keyword arguments are the ones in allowed_kwargs:

  • input_shape: the input shape
  • batch_input_shape: the input shape including the batch size
  • batch_size: the batch size
  • dtype: the data type
  • name: the layer name
  • trainable: whether the layer's weights are trainable
  • weights: initial weights
  • input_dtype: the input data type (legacy)

Of these, dtype, input_shape and batch_input_shape should only be passed to input layers; do not pass them at any other time.
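
As a quick usage sketch (with the standard Dense layer), only the first layer of a model receives the shape keywords:

from keras.layers import Dense

# First layer of a Sequential model: input_shape is appropriate here.
first = Dense(64, input_shape=(100,), name='hidden_1', trainable=True)

# Later layers infer their input shape from upstream: no shape keywords.
second = Dense(10, name='output')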

Next let's look at the add_weight method, which adds weights to be trained to the current layer.

def add_weight(self,
               name,
               shape,
               dtype=None,
               initializer=None,
               regularizer=None,
               trainable=True,
               constraint=None):
    initializer = initializers.get(initializer)
    if dtype is None:
        dtype = self.dtype
    weight = K.variable(initializer(shape, dtype=dtype),
                        dtype=dtype,
                        name=name,
                        constraint=constraint)
    if regularizer is not None:
        with K.name_scope('weight_regularizer'):
            self.add_loss(regularizer(weight))
    if trainable:
        self._trainable_weights.append(weight)
    else:
        self._non_trainable_weights.append(weight)
    return weight

K.variable() is ultimately a call to TensorFlow's tf.Variable(). The method then checks whether a regularizer was supplied and, if so, calls add_loss to record the resulting loss on this layer. Finally, depending on the trainable argument, the weight is appended to either the trainable or the non-trainable weight list.
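
To make this concrete, here is a minimal custom-layer sketch (assuming standard Keras imports) that registers its kernel through add_weight from inside build:

from keras import backend as K
from keras.layers import Layer

class SimpleDense(Layer):
    def __init__(self, units, **kwargs):
        self.units = units
        super(SimpleDense, self).__init__(**kwargs)

    def build(self, input_shape):
        # add_weight creates a backend variable and appends it to
        # self._trainable_weights (trainable=True is the default).
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1], self.units),
                                      initializer='glorot_uniform',
                                      trainable=True)
        super(SimpleDense, self).build(input_shape)

    def call(self, inputs):
        return K.dot(inputs, self.kernel)

    def compute_output_shape(self, input_shape):
        return input_shape[:-1] + (self.units,)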

Now for the most important magic method, __call__. It is fairly long, so we will go through it in several pieces.

def __call__(self, inputs, **kwargs):
    if isinstance(inputs, list):
        inputs = inputs[:]
    with K.name_scope(self.name):
        # Handle laying building (weight creating, input spec locking).
        if not self.built:
            # Raise exceptions in case the input is not compatible
            # with the input_spec specified in the layer constructor.
            self.assert_input_compatibility(inputs)

            # Collect input shapes to build layer.
            input_shapes = []
            for x_elem in to_list(inputs):
                if hasattr(x_elem, '_keras_shape'):
                    input_shapes.append(x_elem._keras_shape)
                elif hasattr(K, 'int_shape'):
                    input_shapes.append(K.int_shape(x_elem))
                else:
                    raise ValueError('You tried to call layer "' +
                                     self.name +
                                     '". This layer has no information'
                                     ' about its expected input shape, '
                                     'and thus cannot be built. '
                                     'You can build it manually via: '
                                     '`layer.build(batch_input_shape)`')
            self.build(unpack_singleton(input_shapes))
            self.built = True

            # Load weights that were specified at layer instantiation.
            if self._initial_weights is not None:
                self.set_weights(self._initial_weights)

First, a name scope is created from the name set in the constructor. Then built is checked: if it is False, build has not yet been executed, so the layer first asserts that the inputs are compatible, then collects the input shapes and runs build, which constructs the layer's structure (its weights). After build completes, built is set to True, and if initial weights were passed at instantiation they are loaded. If we reuse a layer, built is already True, so build is not executed a second time; this is also why the name scope is constructed.
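
A quick sketch of layer sharing, which relies on exactly this built flag (standard Keras layers assumed):

from keras.layers import Input, Dense

shared = Dense(32)   # built is False; no weights exist yet
a = Input(shape=(100,))
b = Input(shape=(100,))
ya = shared(a)       # first call: build() runs, built becomes True
yb = shared(b)       # second call: build() is skipped, weights are reused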

        # Raise exceptions in case the input is not compatible
        # with the input_spec set at build time.
        self.assert_input_compatibility(inputs)

        # Handle mask propagation.
        previous_mask = _collect_previous_mask(inputs)
        user_kwargs = kwargs.copy()
        if not is_all_none(previous_mask):
            # The previous layer generated a mask.
            if has_arg(self.call, 'mask'):
                if 'mask' not in kwargs:
                    # If mask is explicitly passed to __call__,
                    # we should override the default mask.
                    kwargs['mask'] = previous_mask
        # Handle automatic shape inference (only useful for Theano).
        input_shape = _collect_input_shape(inputs)

        # Actually call the layer,
        # collecting output(s), mask(s), and shape(s).
        output = self.call(inputs, **kwargs)
        output_mask = self.compute_mask(inputs, previous_mask)

        # If the layer returns tensors from its inputs, unmodified,
        # we copy them to avoid loss of tensor metadata.
        output_ls = to_list(output)
        inputs_ls = to_list(inputs)
        output_ls_copy = []
        for x in output_ls:
            if x in inputs_ls:
                x = K.identity(x)
            output_ls_copy.append(x)
        output = unpack_singleton(output_ls_copy)

The main job of this part is to run the inputs through the layer and obtain the outputs. It first validates the inputs once more, then collects any mask attached to them (masks will be covered in a later article); if the layer's call accepts a mask argument and none was passed explicitly, the collected mask is forwarded. It then invokes call to get the outputs and compute_mask to update the mask. Finally, it compares outputs against inputs: any output tensor that is the very same object as an input tensor is replaced with a copy (K.identity) so that attaching new metadata does not clobber the input's. The call method itself is the layer's actual tensor transformation.
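
Before moving on, here is a sketch of a hypothetical layer that consumes the propagated mask; declaring mask in call is what makes __call__ inject it:

from keras import backend as K
from keras.layers import Layer

class MaskedSum(Layer):
    def __init__(self, **kwargs):
        super(MaskedSum, self).__init__(**kwargs)
        self.supports_masking = True  # let an upstream mask reach this layer

    def call(self, inputs, mask=None):
        # __call__ fills in mask from the previous layer automatically.
        if mask is not None:
            inputs = inputs * K.cast(K.expand_dims(mask, -1), K.floatx())
        return K.sum(inputs, axis=1)

    def compute_mask(self, inputs, mask=None):
        return None  # the mask is consumed here, not propagated further

    def compute_output_shape(self, input_shape):
        return (input_shape[0],) + input_shape[2:]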

        # Inferring the output shape is only relevant for Theano.
        if all([s is not None
                for s in to_list(input_shape)]):
            output_shape = self.compute_output_shape(input_shape)
        else:
            if isinstance(input_shape, list):
                output_shape = [None for _ in input_shape]
            else:
                output_shape = None

        if (not isinstance(output_mask, (list, tuple)) and
                len(output_ls) > 1):
            # Augment the mask to match the length of the output.
            output_mask = [output_mask] * len(output_ls)

        # Add an inbound node to the layer, so that it keeps track
        # of the call and of all new variables created during the call.
        # This also updates the layer history of the output tensor(s).
        # If the input tensor(s) had not previous Keras history,
        # this does nothing.
        self._add_inbound_node(input_tensors=inputs,
                               output_tensors=output,
                               input_masks=previous_mask,
                               output_masks=output_mask,
                               input_shapes=input_shape,
                               output_shapes=output_shape,
                               arguments=user_kwargs)

        # Apply activity regularizer if any:
        if (hasattr(self, 'activity_regularizer') and
                self.activity_regularizer is not None):
            with K.name_scope('activity_regularizer'):
                regularization_losses = [
                    self.activity_regularizer(x)
                    for x in to_list(output)]
            self.add_loss(regularization_losses,
                          inputs=to_list(inputs))
    return output

This last piece is cleanup work. First the output shape is derived from the input shape; this is where compute_output_shape gets called. Then, if the mask is a single value but the layer produced several outputs, the mask is replicated to match the number of outputs. Next comes _add_inbound_node, which creates a Node (its role is discussed below) linking the layers together; this is how the current layer gets hold of previous layers' outputs and masks. Finally, if an activity_regularizer is set, its penalties are added to the losses. Keras has three kinds of regularizers, briefly (see the sketch after this list):

  • kernel_regularizer: regularizes the layer's weights so they do not grow too large.
  • bias_regularizer: similar to the above, but constrains the size of the layer's biases.
  • activity_regularizer: regularizes the layer's output.
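
A short usage sketch of all three on a standard Dense layer:

from keras import regularizers
from keras.layers import Dense

layer = Dense(64,
              kernel_regularizer=regularizers.l2(0.01),    # penalize large weights
              bias_regularizer=regularizers.l2(0.01),      # penalize large biases
              activity_regularizer=regularizers.l1(0.01))  # penalize large outputs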

Besides Layer, this file holds two more classes, InputSpec and Node. Let's see what these two are for.

class InputSpec(object):
    def __init__(self, dtype=None,
                 shape=None,
                 ndim=None,
                 max_ndim=None,
                 min_ndim=None,
                 axes=None):
        self.dtype = dtype
        self.shape = shape
        if shape is not None:
            self.ndim = len(shape)
        else:
            self.ndim = ndim
        self.max_ndim = max_ndim
        self.min_ndim = min_ndim
        self.axes = axes or {}

    def __repr__(self):
        spec = [('dtype=' + str(self.dtype)) if self.dtype else '',
                ('shape=' + str(self.shape)) if self.shape else '',
                ('ndim=' + str(self.ndim)) if self.ndim else '',
                ('max_ndim=' + str(self.max_ndim)) if self.max_ndim else '',
                ('min_ndim=' + str(self.min_ndim)) if self.min_ndim else '',
                ('axes=' + str(self.axes)) if self.axes else '']
        return 'InputSpec(%s)' % ', '.join(x for x in spec if x)

This class is how a layer specifies constraints such as ndim, dtype and shape for its inputs. It is simple assignment with nothing complicated going on.
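
A typical usage sketch (hypothetical layer, for illustration): build locks the expected input signature, and later inputs are checked against it by assert_input_compatibility:

from keras.layers import InputSpec, Layer

class SpecLayer(Layer):
    def build(self, input_shape):
        # Require rank-2 inputs whose last axis matches the size
        # observed at build time.
        self.input_spec = InputSpec(ndim=2, axes={-1: input_shape[-1]})
        super(SpecLayer, self).build(input_shape)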

The Node class connects two layers together, like a data channel that carries whatever the layers need from one another. Why have a Node at all? My understanding is decoupling: a Layer only concerns itself with processing its own data, not with how data moves between layers. Recall the two attributes of Layer, self._inbound_nodes and self._outbound_nodes: every time a Node is instantiated, it appends itself to these lists.
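
A quick sketch of inspecting these lists after wiring two layers together:

from keras.layers import Input, Dense

inputs = Input(shape=(100,))
dense = Dense(64)
x = dense(inputs)

# The call above created one Node and registered it on both sides.
print(len(dense._inbound_nodes))                        # -> 1
print(dense._inbound_nodes[0].inbound_layers[0].name)   # the InputLayer's name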

class Node(object):
    def __init__(self, outbound_layer,
                 inbound_layers, node_indices, tensor_indices,
                 input_tensors, output_tensors,
                 input_masks, output_masks,
                 input_shapes, output_shapes,
                 arguments=None):
        # Layer instance (NOT a list).
        # this is the layer that takes a list of input tensors
        # and turns them into a list of output tensors.
        # the current node will be added to
        # the inbound_nodes of outbound_layer.
        self.outbound_layer = outbound_layer

        # The following 3 properties describe where
        # the input tensors come from: which layers,
        # and for each layer, which node and which
        # tensor output of each node.

        # List of layer instances.
        self.inbound_layers = inbound_layers
        # List of integers, 1:1 mapping with inbound_layers.
        self.node_indices = node_indices
        # List of integers, 1:1 mapping with inbound_layers.
        self.tensor_indices = tensor_indices

        # Following 2 properties:
        # tensor inputs and outputs of outbound_layer.

        # List of tensors. 1:1 mapping with inbound_layers.
        self.input_tensors = input_tensors
        # List of tensors, created by outbound_layer.call().
        self.output_tensors = output_tensors

        # Following 2 properties: input and output masks.
        # List of tensors, 1:1 mapping with input_tensor.
        self.input_masks = input_masks
        # List of tensors, created by outbound_layer.compute_mask().
        self.output_masks = output_masks

        # Following 2 properties: input and output shapes.

        # List of shape tuples, shapes of input_tensors.
        self.input_shapes = input_shapes
        # List of shape tuples, shapes of output_tensors.
        self.output_shapes = output_shapes

        # Optional keyword arguments to layer's `call`.
        self.arguments = arguments

        # Add nodes to all layers involved.
        for layer in inbound_layers:
            if layer is not None:
                layer._outbound_nodes.append(self)
        outbound_layer._inbound_nodes.append(self)

    def get_config(self):
        inbound_names = []
        for layer in self.inbound_layers:
            if layer:
                inbound_names.append(layer.name)
            else:
                inbound_names.append(None)
        if self.outbound_layer:
            outbound_layer = self.outbound_layer.name
        else:
            outbound_layer = None
        return {'outbound_layer': outbound_layer,
                'inbound_layers': inbound_names,
                'node_indices': self.node_indices,
                'tensor_indices': self.tensor_indices}

The Node class, too, is mostly assignments plus a few property reads. The key thing to look at is the _add_inbound_node method that Layer calls:

    def _add_inbound_node(self, input_tensors, output_tensors,
                          input_masks, output_masks,
                          input_shapes, output_shapes, arguments=None):
        input_tensors = to_list(input_tensors)
        output_tensors = to_list(output_tensors)
        input_masks = to_list(input_masks)
        output_masks = to_list(output_masks)
        input_shapes = to_list(input_shapes)
        output_shapes = to_list(output_shapes)

        # Collect input tensor(s) coordinates.
        inbound_layers = []
        node_indices = []
        tensor_indices = []
        for x in input_tensors:
            if hasattr(x, '_keras_history'):
                inbound_layer, node_index, tensor_index = x._keras_history
                inbound_layers.append(inbound_layer)
                node_indices.append(node_index)
                tensor_indices.append(tensor_index)
            else:
                inbound_layers.append(None)
                node_indices.append(None)
                tensor_indices.append(None)

        # Create node, add it to inbound nodes.
        Node(
            self,
            inbound_layers=inbound_layers,
            node_indices=node_indices,
            tensor_indices=tensor_indices,
            input_tensors=input_tensors,
            output_tensors=output_tensors,
            input_masks=input_masks,
            output_masks=output_masks,
            input_shapes=input_shapes,
            output_shapes=output_shapes,
            arguments=arguments
        )

        # Update tensor history, _keras_shape and _uses_learning_phase.
        for i in range(len(output_tensors)):
            output_tensors[i]._keras_shape = output_shapes[i]
            uses_lp = any(
                [getattr(x, '_uses_learning_phase', False)
                 for x in input_tensors])
            uses_lp = getattr(self, 'uses_learning_phase', False) or uses_lp
            output_tensors[i]._uses_learning_phase = getattr(
                output_tensors[i], '_uses_learning_phase', False) or uses_lp
            output_tensors[i]._keras_history = (self,
                                                len(self._inbound_nodes) - 1,
                                                i)

First it makes sure the input/output tensors, masks and shapes are all lists, converting them when they are not. It then walks input_tensors, reading each tensor's _keras_history to gather the arguments needed to construct the Node; the Node is created, tying the layers together; and finally the metadata on every output tensor (_keras_shape, _uses_learning_phase and _keras_history) is updated.
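
A sketch of what the recorded history looks like on an output tensor (attribute names as in this file):

from keras.layers import Input, Dense

inputs = Input(shape=(100,))
x = Dense(64)(inputs)

# _keras_history = (layer, node_index, tensor_index): which layer produced
# this tensor, via which of its nodes, at which output position.
layer, node_index, tensor_index = x._keras_history
print(layer.name, node_index, tensor_index)  # e.g. dense_1 0 0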

That is all of Layer. The next post will walk through Input, which we reach for constantly when defining models.
