Deep Dive into Django Source Code, Part 9 - A Brief Analysis of Django's ORM

In today's MVC development, the M layer supplies the underlying data. Whether in client/server or browser/server applications, the emphasis has shifted toward business logic on data objects, rather than the raw data obtained from hand-written SQL.

A generic framework that maps relational database columns to model objects is therefore important; with one, a great deal of development time can be saved.

This post gives a brief analysis of the ORM in Django.


Since an ORM aims to be generic, five important questions arise:

1: How is support for multiple databases provided?

2: How are object field types provided?

3: How is the sql -> object conversion implemented?

4: How is the object -> sql storage implemented?

5: How are database-level relationships between columns mapped onto objects?

Let us analyze these questions one by one.

1: Multi-database support. This one is fairly easy to solve, because most SQL operation code is shared across databases; the differences lie in small details, database-specific features, and connection handling. The design is layered: the upper layer wraps generic SQL method classes, feature classes, and database operation classes, and each concrete database backend overrides them in its own implementation files.

Checking this guess against the files under the django.db.backends module gives:

backends
….
|----sqlite3
     |----__init__.py
     |----base.py				subclasses the database base classes for sqlite3; fills in sqlite3 features and completes its operators and cursor
     |----client.py				runs the local sqlite3 client against the db file
     |----creation.py			maps model field types to database column types and fills in the creation methods
     |----introspection.py		maps database columns back to model fields and fills in the introspection methods
__init__.py				base-class wrappers for the database: wrapper, features, operations, introspection, client, validation, etc.
creation.py					base class for database creation
signals.py					database connection signals
util.py					cursor wrappers, including a debug cursor that records query time
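The layered-backend idea described above can be sketched in plain Python. The class names below loosely echo Django's `BaseDatabaseOperations`, but this is a simplified illustration of the pattern, not Django's actual code:

```python
# Sketch of the layered-backend pattern: a generic base class holds the
# SQL behaviour that is the same everywhere; each backend subclass
# overrides only the vendor-specific pieces. Names are illustrative.

class BaseDatabaseOperations:
    def quote_name(self, name):
        # Most backends quote identifiers with double quotes.
        return '"%s"' % name

    def last_insert_id_sql(self):
        # Every backend must say how to fetch the last autoincrement id.
        raise NotImplementedError("backend must supply this")

class SQLiteOperations(BaseDatabaseOperations):
    def last_insert_id_sql(self):
        # SQLite-specific statement.
        return "SELECT last_insert_rowid()"

class MySQLOperations(BaseDatabaseOperations):
    def quote_name(self, name):
        # MySQL quotes identifiers with backticks instead.
        return "`%s`" % name

    def last_insert_id_sql(self):
        return "SELECT LAST_INSERT_ID()"

ops = SQLiteOperations()
print(ops.quote_name("auth_user"))   # shared behaviour from the base class
print(ops.last_insert_id_sql())      # backend-specific override
```

Swapping databases then means swapping which subclass the connection uses; callers only ever talk to the base-class interface.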


Question 2: providing object field types.

SQL field types are like views on the iPhone or widgets on Android: a collection of classes that share mostly the same methods and attributes, with the small differences expressed in the subclass implementations.

SQL provides fields such as integer, boolean, character, float, date, and so on.
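That shared-interface idea can be sketched as a tiny field hierarchy. The class names resemble Django's `models.IntegerField` and friends, but this is an invented illustration; Django's real field classes carry far more metadata:

```python
# Hypothetical sketch of how ORM field classes map onto SQL column
# types: a common Field base class, with each subclass supplying only
# its own column type. Not Django's actual implementation.

class Field:
    db_type = None  # overridden by each concrete field

    def __init__(self, null=False):
        self.null = null

class IntegerField(Field):
    db_type = "integer"

class BooleanField(Field):
    db_type = "bool"

class CharField(Field):
    def __init__(self, max_length, **kwargs):
        super().__init__(**kwargs)
        self.max_length = max_length

    @property
    def db_type(self):
        # The column type depends on a per-field parameter.
        return "varchar(%d)" % self.max_length

print(IntegerField().db_type)          # integer
print(CharField(max_length=50).db_type)  # varchar(50)
```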

I am less familiar with this part, so analyzing the models module gives the following layout:

models
|----fields
     |----__init__.py			the field base class and the concrete field classes
     |----files.py				file fields
     |----proxy.py				proxy class; purpose unclear to me for now
     |----related.py				relationships (one-to-many, many-to-many, reverse, primary key, one-to-one)
     |----subclassing.py			unclear to me for now
|----sql
     |----__init__.py
     |----aggregates.py			aggregate classes used to assemble SQL field templates; the key method is as_sql
     |----compiler.py			assembles the SQL statement; the final query is executed through this class
     |----constants.py			constants
     |----datastructures.py		unclear to me for now
     |----expressions.py			expression evaluator? unclear to me for now
     |----query.py				the query
     |----subqueries.py			insert/delete/update/query classes (subclassing Query from query.py)
     |----where.py				
__init__.py
aggregates.py				aggregate classes: Avg, Count, Max, Min, StdDev, Sum, Variance
base.py					model base class
constants.py				
deletion.py					the Collector class used for deletion
expressions.py				logical/arithmetic expression classes
loading.py					wraps loading applications and their models, registering models, and so on
manager.py				the object manager class and the manager descriptor class
options.py					the options supported by model objects
query.py				the QuerySet classes (base, values, values-list, dates, empty)
query_utils.py				query helper wrappers
related.py
signals.py					signal definitions for model operations


With the map above in hand, reading the source in detail becomes much easier.

With connection management on the SQL side and fields supplied by the model, how are objects managed?

Django's usage pattern makes the shape clear:

Model.manager.operation()[slice] yields a QuerySet, and QuerySet is implemented in models/query.py.
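That chain is lazy: filtering and slicing only record what was asked for, and nothing hits the database until iteration. A toy sketch of the idea (names echo the QuerySet API, but the implementation is invented and uses a plain list as the stand-in "table"):

```python
# Toy sketch of a lazy QuerySet: filter() and slicing return new
# objects that merely record the request; the "query" only runs when
# the result is iterated. Not Django's actual implementation.

class ToyQuerySet:
    def __init__(self, rows, predicate=None, limits=None):
        self._rows = rows            # stand-in for the database table
        self._predicate = predicate  # recorded filter, not yet applied
        self._limits = limits        # recorded slice, not yet applied

    def filter(self, predicate):
        return ToyQuerySet(self._rows, predicate, self._limits)

    def __getitem__(self, s):
        return ToyQuerySet(self._rows, self._predicate, (s.start, s.stop))

    def __iter__(self):
        # Only here is the recorded query actually "executed".
        rows = self._rows
        if self._predicate is not None:
            rows = [r for r in rows if self._predicate(r)]
        if self._limits is not None:
            rows = rows[slice(*self._limits)]
        return iter(rows)

qs = ToyQuerySet([1, 2, 3, 4, 5]).filter(lambda x: x > 1)[0:2]
print(list(qs))  # [2, 3]
```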


That leaves three questions.

3: sql -> object. Start from QuerySet and analyze iterator():

def iterator(self):
        """
        An iterator over the results from applying this QuerySet to the
        database.
        """
        ...

        # Cache db and model outside the loop
        db = self.db
        model = self.model
        compiler = self.query.get_compiler(using=db)
        if fill_cache:
            klass_info = get_klass_info(model, max_depth=max_depth,
                                        requested=requested, only_load=only_load)  # get the class info
        for row in compiler.results_iter():  # iterate over the rows from the database
            if fill_cache:
                obj, _ = get_cached_row(row, index_start, db, klass_info,
                                        offset=len(aggregate_select))  # build the object from the cached row
            else:
                # Omit aggregates in object creation.
                row_data = row[index_start:aggregate_start]
                if skip:
                    obj = model_cls(**dict(zip(init_list, row_data)))
                else:
                    obj = model(*row_data)  # build the model object from the row fields

                # Store the source database of the object
                obj._state.db = db
                # This object came from the database; it's not being added.
                obj._state.adding = False

            if extra_select:
                for i, k in enumerate(extra_select):
                    setattr(obj, k, row[i])

            # Add the aggregates to the model
            if aggregate_select:
                for i, aggregate in enumerate(aggregate_select):
                    setattr(obj, aggregate, row[i + aggregate_start])

            # Add the known related objects to the model, if there are any
            if self._known_related_objects:
                for field, rel_objs in self._known_related_objects.items():
                    pk = getattr(obj, field.get_attname())
                    try:
                        rel_obj = rel_objs[pk]
                    except KeyError:
                        pass               # may happen in qs1 | qs2 scenarios
                    else:
                        setattr(obj, field.name, rel_obj)

            yield obj

def get_cached_row(row, index_start, using,  klass_info, offset=0):
    """
    Helper function that recursively returns an object with the specified
    related attributes already populated.

    This method may be called recursively to populate deep select_related()
    clauses.

    Arguments:
         * row - the row of data returned by the database cursor
         * index_start - the index of the row at which data for this
           object is known to start
         * offset - the number of additional fields that are known to
           exist in row for `klass`. This usually means the number of
           annotated results on `klass`.
        * using - the database alias on which the query is being executed.
         * klass_info - result of the get_klass_info function
    """
    if klass_info is None:
        return None
    klass, field_names, field_count, related_fields, reverse_related_fields, pk_idx = klass_info

    fields = row[index_start : index_start + field_count]
    # If the pk column is None (or the Oracle equivalent ''), then the related
    # object must be non-existent - set the relation to None.
    if fields[pk_idx] == None or fields[pk_idx] == '':
        obj = None
    elif field_names:
        obj = klass(**dict(zip(field_names, fields)))
    else:
        obj = klass(*fields)  # build the object

    # If an object was retrieved, set the database state.
    if obj:
        obj._state.db = using
        obj._state.adding = False

    # Instantiate related fields
    index_end = index_start + field_count + offset
    # Iterate over each related object, populating any
    # select_related() fields
    for f, klass_info in related_fields:
        # Recursively retrieve the data for the related object
        cached_row = get_cached_row(row, index_end, using, klass_info)
        # If the recursive descent found an object, populate the
        # descriptor caches relevant to the object
        if cached_row:
            rel_obj, index_end = cached_row
            if obj is not None:
                # If the base object exists, populate the
                # descriptor cache
                setattr(obj, f.get_cache_name(), rel_obj)
            if f.unique and rel_obj is not None:
                # If the field is unique, populate the
                # reverse descriptor cache on the related object
                setattr(rel_obj, f.related.get_cache_name(), obj)

    # Now do the same, but for reverse related objects.
    # Only handle the restricted case - i.e., don't do a depth
    # descent into reverse relations unless explicitly requested
    for f, klass_info in reverse_related_fields:
        # Recursively retrieve the data for the related object
        cached_row = get_cached_row(row, index_end, using, klass_info)
        # If the recursive descent found an object, populate the
        # descriptor caches relevant to the object
        if cached_row:
            rel_obj, index_end = cached_row
            if obj is not None:
                # If the field is unique, populate the
                # reverse descriptor cache
                setattr(obj, f.related.get_cache_name(), rel_obj)
            if rel_obj is not None:
                # If the related object exists, populate
                # the descriptor cache.
                setattr(rel_obj, f.get_cache_name(), obj)
                # Now populate all the non-local field values on the related
                # object. If this object has deferred fields, we need to use
                # the opts from the original model to get non-local fields
                # correctly.
                opts = rel_obj._meta
                if getattr(rel_obj, '_deferred'):
                    opts = opts.proxy_for_model._meta
                for rel_field, rel_model in opts.get_fields_with_model():
                    if rel_model is not None:
                        setattr(rel_obj, rel_field.attname, getattr(obj, rel_field.attname))
                        # populate the field cache for any related object
                        # that has already been retrieved
                        if rel_field.rel:
                            try:
                                cached_obj = getattr(obj, rel_field.get_cache_name())
                                setattr(rel_obj, rel_field.get_cache_name(), cached_obj)
                            except AttributeError:
                                # Related object hasn't been cached yet
                                pass
    return obj, index_end


That is a brief analysis of the read path.
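The essential sql -> object step in iterator() boils down to pairing each column value in a cursor row with its field name and feeding the mapping to the model constructor. A condensed stand-in (the Person class and field list here are invented for illustration):

```python
# The heart of sql -> object: zip the ordered field names with one row
# from the cursor and construct the model instance, mirroring
# model_cls(**dict(zip(init_list, row_data))) in iterator() above.

class Person:
    def __init__(self, id, name, age):
        self.id, self.name, self.age = id, name, age

init_list = ["id", "name", "age"]   # field names, in column order
row_data = (1, "alice", 30)         # one row as returned by the cursor

obj = Person(**dict(zip(init_list, row_data)))
print(obj.name, obj.age)  # alice 30
```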

Next comes the object -> sql direction.

Calling object.save() writes the data to the db, so analyze the model base class's save() method, which leads into save_base():

def save_base(self, raw=False, cls=None, origin=None, force_insert=False,
                  force_update=False, using=None, update_fields=None):
...
manager = cls._base_manager
...
result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)  # here we can see the insert step

Inside the manager:

def _insert(self, objs, fields, **kwargs):
        return insert_query(self.model, objs, fields, **kwargs)

In insert_query:

def insert_query(model, objs, fields, return_id=False, raw=False, using=None):
    query = sql.InsertQuery(model)  # build the insert-query object
    query.insert_values(fields, objs, raw=raw)  # attach the values to be inserted
    return query.get_compiler(using=using).execute_sql(return_id)  # get the sql compiler object and execute the statement


The insert_values part:

    def insert_values(self, fields, objs, raw=False):
        self.fields = fields
        # Check that no Promise object reaches the DB. Refs #10498.
        for field in fields:
            for obj in objs:
                value = getattr(obj, field.attname)
                if isinstance(value, Promise):
                    setattr(obj, field.attname, force_text(value))
        self.objs = objs
        self.raw = raw
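The loop above exists so that no lazy Promise object reaches the database driver, which needs plain strings. A sketch of why (LazyValue and this force_text are invented stand-ins for Django's Promise and force_text):

```python
# Sketch of the lazy-value problem insert_values() guards against:
# Django's lazy translations are proxy objects whose real string is
# only computed on demand, but the DB driver wants a concrete str.
# LazyValue is an invented stand-in for Django's Promise.

class LazyValue:
    def __init__(self, func):
        self._func = func

    def __str__(self):
        # The real value is produced only when someone asks for it.
        return self._func()

def force_text(value):
    # Collapse a lazy proxy into a concrete string.
    return str(value)

value = LazyValue(lambda: "hello")
print(isinstance(value, str))  # False - a proxy, not a real string
print(force_text(value))       # hello
```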

Obtaining the SQL compiler object was covered above.

Now look at the execution:

    def execute_sql(self, return_id=False):
        assert not (return_id and len(self.query.objs) != 1)
        self.return_id = return_id
        cursor = self.connection.cursor()
        for sql, params in self.as_sql():
            cursor.execute(sql, params)
        if not (return_id and cursor):
            return
        if self.connection.features.can_return_id_from_insert:
            return self.connection.ops.fetch_returned_insert_id(cursor)
        return self.connection.ops.last_insert_id(cursor,
                self.query.model._meta.db_table, self.query.model._meta.pk.column)

From here, look at the as_sql() method:

    def as_sql(self):
        qn = self.connection.ops.quote_name
        opts = self.query.model._meta
        result = ['INSERT INTO %s' % qn(opts.db_table)]

        has_fields = bool(self.query.fields)
        fields = self.query.fields if has_fields else [opts.pk]
        result.append('(%s)' % ', '.join([qn(f.column) for f in fields]))

        if has_fields:
            params = values = [
                [
                    f.get_db_prep_save(getattr(obj, f.attname) if self.query.raw else f.pre_save(obj, True), connection=self.connection)
                    for f in fields
                ]
                for obj in self.query.objs
            ]
        else:
            values = [[self.connection.ops.pk_default_value()] for obj in self.query.objs]
            params = [[]]
            fields = [None]
        can_bulk = (not any(hasattr(field, "get_placeholder") for field in fields) and
            not self.return_id and self.connection.features.has_bulk_insert)

        if can_bulk:
            placeholders = [["%s"] * len(fields)]
        else:
            placeholders = [
                [self.placeholder(field, v) for field, v in zip(fields, val)]
                for val in values
            ]
            # Oracle Spatial needs to remove some values due to #10888
            params = self.connection.ops.modify_insert_params(placeholders, params)
        if self.return_id and self.connection.features.can_return_id_from_insert:
            params = params[0]
            col = "%s.%s" % (qn(opts.db_table), qn(opts.pk.column))
            result.append("VALUES (%s)" % ", ".join(placeholders[0]))
            r_fmt, r_params = self.connection.ops.return_insert_id()
            # Skip empty r_fmt to allow subclasses to customize behaviour for
            # 3rd party backends. Refs #19096.
            if r_fmt:
                result.append(r_fmt % col)
                params += r_params
            return [(" ".join(result), tuple(params))]
        if can_bulk:
            result.append(self.connection.ops.bulk_insert_sql(fields, len(values)))
            return [(" ".join(result), tuple([v for val in values for v in val]))]
        else:
            return [
                (" ".join(result + ["VALUES (%s)" % ", ".join(p)]), vals)
                for p, vals in zip(placeholders, params)
            ]
The SQL string assembly is plainly visible above.

So the object -> sql logic is:

object -> derive the table name, column names, and values

format them into insert into table (columns) values (values) and execute it
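That two-step logic can be condensed into a few lines. This mirrors the shape of what as_sql() produces (a statement plus parameters), not its actual code; the table and column names are invented:

```python
# Sketch of the object -> sql summary above: given a table name, the
# column names, and the values pulled from an object, format a
# parameterized INSERT statement ready for cursor.execute(sql, params).

def build_insert(table, columns, values):
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), placeholders)
    return sql, tuple(values)

sql, params = build_insert("blog_post", ["title", "body"], ["hi", "text"])
print(sql)     # INSERT INTO blog_post (title, body) VALUES (%s, %s)
print(params)  # ('hi', 'text')
```

Keeping the values as separate parameters rather than splicing them into the string is what lets the driver do the quoting safely.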


The last question is a bit more complex and calls for deeper study.
