Learning the Django Source Code, Part 9 - A Brief Analysis of the Django ORM

In MVC development today, the M layer supplies the underlying data. Whether the architecture is client/server or browser/server, the trend is to work with data as business objects rather than with the raw rows that hand-written SQL used to return.

A framework that provides a generic mapping between relational database columns and model objects is therefore important: it saves a great deal of development time.


This post gives a brief analysis of the ORM in Django.


Since an ORM has to be generic, five key problems arise:

1. How to support multiple databases

2. What object field types to provide

3. How to implement the sql -> object conversion

4. How to store object -> sql

5. How to map database relationships onto the corresponding objects


Let's analyze these problems one by one.

1. Multiple-database support. This problem is relatively easy to solve, because most SQL operations are shared across databases; the differences lie in small details, database-specific features, and how connections are made. The solution is layered:

the higher layer wraps generic SQL methods, database feature flags, and database operation classes; each concrete backend file then overrides them as needed.

Following that guess, examining the files under the django.db.backends module gives:

backends
….
|----sqlite3
     |----__init__.py
     |----base.py				builds the database base classes on top of sqlite3; fills in SQLite features and completes SQLite's operators and cursor
     |----client.py				runs the sqlite3 client locally against the db file
     |----creation.py			maps model fields to database column types and fills in the creation methods
     |----introspection.py		maps database column types back to model fields and fills in the introspection methods
__init__.py				base-class wrappers for the database wrapper, features, operations, introspection, client, validation, etc.
creation.py					base class for database creation
signals.py					database connection signals
util.py					cursor wrappers, including a debug cursor that records query times
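The layering described above can be sketched as follows. The class and method names here are illustrative, not Django's actual backend API: a generic base class holds behavior common to every database, and each backend subclass overrides only the vendor-specific parts.

```python
class BaseDatabaseOperations:
    """Generic SQL behavior shared by every backend (illustrative)."""

    def quote_name(self, name):
        # ANSI SQL quotes identifiers with double quotes.
        return '"%s"' % name

    def limit_clause(self, limit):
        return "LIMIT %d" % limit


class SQLiteOperations(BaseDatabaseOperations):
    """SQLite-specific overrides; everything else is inherited."""

    def quote_name(self, name):
        if name.startswith('"') and name.endswith('"'):
            return name  # already quoted
        return '"%s"' % name


class MySQLOperations(BaseDatabaseOperations):
    """MySQL quotes identifiers with backticks instead."""

    def quote_name(self, name):
        return "`%s`" % name


ops = MySQLOperations()
print(ops.quote_name("auth_user"))  # `auth_user`
print(ops.limit_clause(10))         # LIMIT 10
```

Only the methods that actually differ get overridden; the shared SQL stays in one place, which is exactly why adding a new backend means writing one small module rather than a whole driver.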


Problem 2: what object field types to provide.

SQL column types are like iPhone views or Android widgets: a family that shares mostly the same methods and properties, with the small differences pushed down into the subclasses.

SQL provides integer, boolean, character, float, date and other column types.

I was less familiar with this part, so examining the models module gives the following layout:

models
|----fields
     |----__init__.py			the field base class and the concrete field classes
     |----files.py				file fields
     |----proxy.py				proxy classes; purpose not yet clear to me
     |----related.py				relation fields (one-to-many, many-to-many, reverse, foreign key, one-to-one)
     |----subclassing.py			not yet clear to me
|----sql
     |----__init__.py
     |----aggregates.py			aggregate classes used to assemble SQL field templates; the key method is as_sql
     |----compiler.py			SQL assembly classes; the final query is executed through these
     |----constants.py			constants
     |----datastructures.py		not yet clear to me
     |----expressions.py			expression evaluation? not yet clear to me
     |----query.py				the query class
     |----subqueries.py			insert/delete/update/select expressions (subclasses of Query)
     |----where.py				
__init__.py
aggregates.py				aggregate classes: Avg, Count, Max, Min, StdDev, Sum, Variance
base.py					the model base class
constants.py				
deletion.py					the Collector class used for deletion
expressions.py				logical/arithmetic expression classes
loading.py					wraps loading of apps and models, model registration, and related methods
manager.py				the object manager class and the manager descriptor class
options.py				the options a model supports
query.py				the QuerySet classes (base, values, values-list, dates, empty)
query_utils.py				query helpers and wrappers
signals.py				signal definitions for model operations
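The field classes listed above share one interface and differ only in how they convert Python values for the database. A minimal sketch of that pattern, using hypothetical classes rather than Django's actual ones:

```python
import datetime


class Field:
    """Base field: the shared interface (illustrative, not Django's)."""

    def get_prep_value(self, value):
        return value  # default: pass the value through unchanged


class IntegerField(Field):
    def get_prep_value(self, value):
        return int(value)


class BooleanField(Field):
    def get_prep_value(self, value):
        return bool(value)


class DateField(Field):
    def get_prep_value(self, value):
        # Dates are stored as ISO strings in this sketch.
        if isinstance(value, datetime.date):
            return value.isoformat()
        return value


print(IntegerField().get_prep_value("42"))                    # 42
print(DateField().get_prep_value(datetime.date(2013, 1, 1)))  # 2013-01-01
```

The subclasses add only the conversion that is specific to their type; everything generic (naming, defaults, validation in the real thing) lives once in the base class.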


Reading the source code with this map in hand makes it much easier to understand.


With SQL connection management in place and the model supplying fields, how are the objects managed?

Django's usage pattern shows the chain:

Model.manager.method()[slice] yields a QuerySet, and QuerySet itself is implemented in models/query.py.
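That chain can be sketched as follows. Everything here is simplified and hypothetical, not Django's actual implementation (real QuerySets are lazy and build SQL instead of filtering Python lists), but it shows why each step in the chain keeps returning a query-set object:

```python
class QuerySet:
    """A chainable wrapper over rows (simplified sketch)."""

    def __init__(self, rows):
        self._rows = rows

    def filter(self, **kw):
        # Keep rows matching all keyword lookups; chaining returns
        # a new QuerySet, just like in the real API.
        keep = [r for r in self._rows
                if all(r.get(k) == v for k, v in kw.items())]
        return QuerySet(keep)

    def __getitem__(self, s):
        return QuerySet(self._rows[s])  # slicing also stays a QuerySet

    def __iter__(self):
        return iter(self._rows)


class Manager:
    """Hands out fresh QuerySets, like Model.objects."""

    def __init__(self, rows):
        self._rows = rows

    def all(self):
        return QuerySet(list(self._rows))


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "a"}]
objects = Manager(rows)
qs = objects.all().filter(name="a")[0:1]
print(list(qs))  # [{'id': 1, 'name': 'a'}]
```

Because filter() and slicing both return QuerySets, the chain composes freely; in real Django the rows only come back from the database when the QuerySet is finally iterated.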


Three problems remain.

3. sql -> object: start from QuerySet and look at iterator():

def iterator(self):
        """
        An iterator over the results from applying this QuerySet to the
        database.
        """
        ...

        # Cache db and model outside the loop
        db = self.db
        model = self.model
        compiler = self.query.get_compiler(using=db)
        if fill_cache:
            klass_info = get_klass_info(model, max_depth=max_depth,
                                        requested=requested, only_load=only_load)  # get the class info
        for row in compiler.results_iter():  # iterate over rows from the database
            if fill_cache:
                obj, _ = get_cached_row(row, index_start, db, klass_info,
                                        offset=len(aggregate_select))  # build the object and its related caches from the row
            else:
                # Omit aggregates in object creation.
                row_data = row[index_start:aggregate_start]
                if skip:
                    obj = model_cls(**dict(zip(init_list, row_data)))
                else:
                    obj = model(*row_data)  # build a model instance from the row data

                # Store the source database of the object
                obj._state.db = db
                # This object came from the database; it's not being added.
                obj._state.adding = False

            if extra_select:
                for i, k in enumerate(extra_select):
                    setattr(obj, k, row[i])

            # Add the aggregates to the model
            if aggregate_select:
                for i, aggregate in enumerate(aggregate_select):
                    setattr(obj, aggregate, row[i + aggregate_start])

            # Add the known related objects to the model, if there are any
            if self._known_related_objects:
                for field, rel_objs in self._known_related_objects.items():
                    pk = getattr(obj, field.get_attname())
                    try:
                        rel_obj = rel_objs[pk]
                    except KeyError:
                        pass               # may happen in qs1 | qs2 scenarios
                    else:
                        setattr(obj, field.name, rel_obj)

            yield obj
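The essence of iterator() above is that each database row is passed positionally to the model class to build an instance, which is then marked as having come from the database. A minimal sketch of that core step (class and function names here are illustrative, not Django's):

```python
class Person:
    """A stand-in model whose constructor takes the row columns in order."""

    fields = ("id", "name", "age")

    def __init__(self, id, name, age):
        self.id, self.name, self.age = id, name, age


def rows_to_objects(model, rows):
    for row in rows:          # analogue of compiler.results_iter()
        obj = model(*row)     # analogue of obj = model(*row_data)
        obj._adding = False   # "this object came from the database"
        yield obj


people = list(rows_to_objects(Person, [(1, "alice", 30), (2, "bob", 25)]))
print(people[0].name)  # alice
```

The real iterator() adds caching, deferred fields, aggregates, and select_related handling around this core, but the sql -> object conversion itself is just this row-to-constructor mapping.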

def get_cached_row(row, index_start, using,  klass_info, offset=0):
    """
    Helper function that recursively returns an object with the specified
    related attributes already populated.

    This method may be called recursively to populate deep select_related()
    clauses.

    Arguments:
         * row - the row of data returned by the database cursor
         * index_start - the index of the row at which data for this
           object is known to start
         * offset - the number of additional fields that are known to
           exist in row for `klass`. This usually means the number of
           annotated results on `klass`.
         * using - the database alias on which the query is being executed.
         * klass_info - result of the get_klass_info function
    """
    if klass_info is None:
        return None
    klass, field_names, field_count, related_fields, reverse_related_fields, pk_idx = klass_info

    fields = row[index_start : index_start + field_count]
    # If the pk column is None (or the Oracle equivalent ''), then the related
    # object must be non-existent - set the relation to None.
    if fields[pk_idx] == None or fields[pk_idx] == '':
        obj = None
    elif field_names:
        obj = klass(**dict(zip(field_names, fields)))
    else:
        obj = klass(*fields)  # build the object

    # If an object was retrieved, set the database state.
    if obj:
        obj._state.db = using
        obj._state.adding = False

    # Instantiate related fields
    index_end = index_start + field_count + offset
    # Iterate over each related object, populating any
    # select_related() fields
    for f, klass_info in related_fields:
        # Recursively retrieve the data for the related object
        cached_row = get_cached_row(row, index_end, using, klass_info)
        # If the recursive descent found an object, populate the
        # descriptor caches relevant to the object
        if cached_row:
            rel_obj, index_end = cached_row
            if obj is not None:
                # If the base object exists, populate the
                # descriptor cache
                setattr(obj, f.get_cache_name(), rel_obj)
            if f.unique and rel_obj is not None:
                # If the field is unique, populate the
                # reverse descriptor cache on the related object
                setattr(rel_obj, f.related.get_cache_name(), obj)

    # Now do the same, but for reverse related objects.
    # Only handle the restricted case - i.e., don't do a depth
    # descent into reverse relations unless explicitly requested
    for f, klass_info in reverse_related_fields:
        # Recursively retrieve the data for the related object
        cached_row = get_cached_row(row, index_end, using, klass_info)
        # If the recursive descent found an object, populate the
        # descriptor caches relevant to the object
        if cached_row:
            rel_obj, index_end = cached_row
            if obj is not None:
                # If the field is unique, populate the
                # reverse descriptor cache
                setattr(obj, f.related.get_cache_name(), rel_obj)
            if rel_obj is not None:
                # If the related object exists, populate
                # the descriptor cache.
                setattr(rel_obj, f.get_cache_name(), obj)
                # Now populate all the non-local field values on the related
                # object. If this object has deferred fields, we need to use
                # the opts from the original model to get non-local fields
                # correctly.
                opts = rel_obj._meta
                if getattr(rel_obj, '_deferred'):
                    opts = opts.proxy_for_model._meta
                for rel_field, rel_model in opts.get_fields_with_model():
                    if rel_model is not None:
                        setattr(rel_obj, rel_field.attname, getattr(obj, rel_field.attname))
                        # populate the field cache for any related object
                        # that has already been retrieved
                        if rel_field.rel:
                            try:
                                cached_obj = getattr(obj, rel_field.get_cache_name())
                                setattr(rel_obj, rel_field.get_cache_name(), cached_obj)
                            except AttributeError:
                                # Related object hasn't been cached yet
                                pass
    return obj, index_end


That is the brief analysis of sql -> object.

Next: how object -> sql works.

Calling object.save() persists the data to the db, so follow the model base class's save() method down into save_base():

def save_base(self, raw=False, cls=None, origin=None, force_insert=False,
                  force_update=False, using=None, update_fields=None):
...
manager = cls._base_manager
...
result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)  # this is clearly the insert part

Inside the manager:

def _insert(self, objs, fields, **kwargs):
        return insert_query(self.model, objs, fields, **kwargs)

In insert_query():

def insert_query(model, objs, fields, return_id=False, raw=False, using=None):
    query = sql.InsertQuery(model)  # build the insert query object
    query.insert_values(fields, objs, raw=raw)  # attach the field values
    return query.get_compiler(using=using).execute_sql(return_id)  # get the SQL compiler object and execute the statement


The value-insertion part:

    def insert_values(self, fields, objs, raw=False):
        self.fields = fields
        # Check that no Promise object reaches the DB. Refs #10498.
        for field in fields:
            for obj in objs:
                value = getattr(obj, field.attname)
                if isinstance(value, Promise):
                    setattr(obj, field.attname, force_text(value))
        self.objs = objs
        self.raw = raw

Getting the SQL compiler object was covered above.

Now look at the execute method:

    def execute_sql(self, return_id=False):
        assert not (return_id and len(self.query.objs) != 1)
        self.return_id = return_id
        cursor = self.connection.cursor()
        for sql, params in self.as_sql():
            cursor.execute(sql, params)
        if not (return_id and cursor):
            return
        if self.connection.features.can_return_id_from_insert:
            return self.connection.ops.fetch_returned_insert_id(cursor)
        return self.connection.ops.last_insert_id(cursor,
                self.query.model._meta.db_table, self.query.model._meta.pk.column)

From there, look at the as_sql() method:

    def as_sql(self):
        qn = self.connection.ops.quote_name
        opts = self.query.model._meta
        result = ['INSERT INTO %s' % qn(opts.db_table)]

        has_fields = bool(self.query.fields)
        fields = self.query.fields if has_fields else [opts.pk]
        result.append('(%s)' % ', '.join([qn(f.column) for f in fields]))

        if has_fields:
            params = values = [
                [
                    f.get_db_prep_save(getattr(obj, f.attname) if self.query.raw else f.pre_save(obj, True), connection=self.connection)
                    for f in fields
                ]
                for obj in self.query.objs
            ]
        else:
            values = [[self.connection.ops.pk_default_value()] for obj in self.query.objs]
            params = [[]]
            fields = [None]
        can_bulk = (not any(hasattr(field, "get_placeholder") for field in fields) and
            not self.return_id and self.connection.features.has_bulk_insert)

        if can_bulk:
            placeholders = [["%s"] * len(fields)]
        else:
            placeholders = [
                [self.placeholder(field, v) for field, v in zip(fields, val)]
                for val in values
            ]
            # Oracle Spatial needs to remove some values due to #10888
            params = self.connection.ops.modify_insert_params(placeholders, params)
        if self.return_id and self.connection.features.can_return_id_from_insert:
            params = params[0]
            col = "%s.%s" % (qn(opts.db_table), qn(opts.pk.column))
            result.append("VALUES (%s)" % ", ".join(placeholders[0]))
            r_fmt, r_params = self.connection.ops.return_insert_id()
            # Skip empty r_fmt to allow subclasses to customize behaviour for
            # 3rd party backends. Refs #19096.
            if r_fmt:
                result.append(r_fmt % col)
                params += r_params
            return [(" ".join(result), tuple(params))]
        if can_bulk:
            result.append(self.connection.ops.bulk_insert_sql(fields, len(values)))
            return [(" ".join(result), tuple([v for val in values for v in val]))]
        else:
            return [
                (" ".join(result + ["VALUES (%s)" % ", ".join(p)]), vals)
                for p, vals in zip(placeholders, params)
            ]
The assembly of the SQL statement is plainly visible above.

So the object -> sql logic is:

object -> get the table name, the column names, and the values

format them into INSERT INTO table (columns) VALUES (values) and execute it
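That logic can be sketched with a simplified, hypothetical helper (the real compiler also handles per-field placeholders, bulk inserts, and returning ids, as the as_sql() source above shows):

```python
def build_insert(table, obj, fields):
    """Format a parameterized INSERT from an object (illustrative sketch)."""
    columns = ", ".join(fields)
    placeholders = ", ".join(["%s"] * len(fields))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, placeholders)
    # The values are read off the object attribute by attribute,
    # like f.pre_save(obj, True) / getattr(obj, f.attname) in the source.
    params = tuple(getattr(obj, f) for f in fields)
    return sql, params


class User:
    def __init__(self, name, age):
        self.name, self.age = name, age


sql, params = build_insert("app_user", User("alice", 30), ["name", "age"])
print(sql)     # INSERT INTO app_user (name, age) VALUES (%s, %s)
print(params)  # ('alice', 30)
```

The statement keeps %s placeholders and passes the values separately, which is what cursor.execute(sql, params) in execute_sql() expects; the database driver does the quoting.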


The last problem, relation mapping, is a bit more complicated and deserves a deeper study of its own.
