In today's MVC-style development, the M layer supplies the underlying data. Whether the application is client/server or browser/server, the trend is to work with data as business objects rather than with the raw rows that hand-written SQL returns.
A general-purpose framework that maps relational database fields onto model objects is therefore important: it saves a great deal of development time.
This post briefly analyzes the ORM in Django.
For any general-purpose ORM, five key questions arise:
1: how to support multiple databases
2: how to provide object field types
3: how to convert sql -> object
4: how to store object -> sql
5: how to represent database relationships on objects
Let's analyze these questions one by one.
1: Multi-database support. This is relatively easy to solve, because most SQL operations are shared across databases; the differences lie in small details, database-specific features, and connection handling. A layered design fits naturally: the upper layer defines classes for the common SQL methods, the feature flags, and the database operations, and each concrete backend overrides them in its own implementation files.
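The layering can be sketched in a few lines. This is a simplified illustration of the pattern only; the class and method names below are stand-ins, not Django's real backend classes:

```python
# Minimal sketch of the layered backend design: a base class carries the
# SQL behaviour common to all databases, and each backend subclass
# overrides only what differs. (Illustrative stand-ins, not Django code.)

class BaseDatabaseOperations:
    def quote_name(self, name):
        # Generic identifier quoting; backends may override.
        return '"%s"' % name

    def max_name_length(self):
        # None means the backend imposes no identifier length limit.
        return None


class SQLiteDatabaseOperations(BaseDatabaseOperations):
    def quote_name(self, name):
        # SQLite-specific tweak: don't double-quote an already quoted name.
        if name.startswith('"') and name.endswith('"'):
            return name
        return '"%s"' % name


ops = SQLiteDatabaseOperations()
print(ops.quote_name("auth_user"))    # "auth_user"
print(ops.quote_name('"auth_user"'))  # "auth_user"
```

The generic layer never needs to know which database it is talking to; picking the backend amounts to instantiating a different subclass.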
Checking this guess against the files under the django.db.backends module gives:
backends
    ...
    |----sqlite3
        |----__init__.py
        |----base.py           subclasses the database base classes for sqlite3; fills in sqlite3 features, operators and cursor handling
        |----client.py         launches the local sqlite3 command-line client against the db file
        |----creation.py       maps model field types to sqlite3 column types; implements database/table creation
        |----introspection.py  maps database column types back to model fields; implements introspection
    __init__.py    base-class wrappers for the database connection, features, operations, introspection, client and validation
    creation.py    base class for database creation
    signals.py     database connection signals
    util.py        cursor wrappers, including a debug cursor that records query times
Question 2: providing object field types.
SQL column types are like views on iOS or widgets on Android: a family that shares mostly the same methods and attributes, with subclasses differing only in the details.
SQL offers integer, boolean, character, floating-point, date and other column types.
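The idea can be sketched as a small class hierarchy. This is a toy illustration of the pattern only; Django's real fields in models/fields/__init__.py are far richer:

```python
# Toy sketch of ORM field classes: a common base class plus subclasses
# that each know their SQL column type. (Not Django's implementation.)

class Field:
    def db_type(self):
        raise NotImplementedError


class IntegerField(Field):
    def db_type(self):
        return "integer"


class BooleanField(Field):
    def db_type(self):
        return "bool"


class CharField(Field):
    def __init__(self, max_length):
        self.max_length = max_length

    def db_type(self):
        return "varchar(%d)" % self.max_length


print(CharField(max_length=50).db_type())  # varchar(50)
print(IntegerField().db_type())            # integer
```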
I was less familiar with this part, so I analyzed the models module and got the following layout:
models
    |----fields
        |----__init__.py       the field base class and the concrete field classes
        |----files.py          file fields
        |----proxy.py          proxy class (purpose unclear to me so far)
        |----related.py        relations (one-to-many, many-to-many, reverse, primary key, one-to-one)
        |----subclassing.py    unclear so far
    |----sql
        |----__init__.py
        |----aggregates.py     aggregate classes used to fill SQL fragment templates; the key method is as_sql
        |----compiler.py       builds the SQL statement; queries are ultimately executed through this class
        |----constants.py      constants
        |----datastructures.py unclear so far
        |----expressions.py    expression evaluators? unclear so far
        |----query.py          the Query class
        |----subqueries.py     insert/delete/update query classes (subclasses of Query)
        |----where.py          WHERE-clause handling
    __init__.py
    aggregates.py  aggregate classes: Avg, Count, Max, Min, StdDev, Sum, Variance
    base.py        the model base class
    constants.py
    deletion.py    the Collector class used for deletion
    expressions.py logical/arithmetic expression classes
    loading.py     loading: registering applications and their models, etc.
    manager.py     the object manager class and its descriptor class
    options.py     the options supported by model classes
    query.py       the QuerySet classes (base, values, values-list, dates, empty)
    query_utils.py query wrappers
    related.py
    signals.py     model operation signals
With this map in hand, reading the source in detail becomes much easier.
With connection management and model fields in place, the next question is how objects are managed.
Django's own usage shows the pattern: Model.manager.method()[slice] returns a QuerySet, and QuerySet is implemented in models/query.py.
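The laziness behind that chain can be imitated in a few lines. This is a toy stand-in, assuming nothing about Django's internals; `ToyQuerySet` is invented here purely for illustration:

```python
# Minimal sketch of the lazy QuerySet idea: filtering and slicing only
# record what to fetch; the "database" is hit when iteration begins.
# (A toy stand-in, not Django's implementation.)

class ToyQuerySet:
    def __init__(self, rows, limit=None):
        self._rows = rows      # stands in for the pending SQL query
        self._limit = limit    # stands in for a LIMIT clause

    def __getitem__(self, s):
        # Slicing returns a new lazy queryset with a limit; no fetch yet.
        return ToyQuerySet(self._rows, limit=s.stop)

    def __iter__(self):
        # Only here is the "query" actually evaluated.
        rows = self._rows if self._limit is None else self._rows[:self._limit]
        return iter(rows)


qs = ToyQuerySet(["a", "b", "c", "d"])
print(list(qs[:2]))  # ['a', 'b']
```

Real QuerySets work the same way in spirit: manager methods and slices build up a Query object, and iteration hands it to a compiler.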
Three questions remain.
3: sql -> object. Start from QuerySet and look at iterator():
def iterator(self):
    """
    An iterator over the results from applying this QuerySet to the
    database.
    """
    ...
    # Cache db and model outside the loop
    db = self.db
    model = self.model
    compiler = self.query.get_compiler(using=db)
    if fill_cache:
        klass_info = get_klass_info(model, max_depth=max_depth,
                                    requested=requested, only_load=only_load)  # collect class info
    for row in compiler.results_iter():  # iterate over rows fetched from the database
        if fill_cache:
            obj, _ = get_cached_row(row, index_start, db, klass_info,
                                    offset=len(aggregate_select))  # build the object (and related objects)
        else:
            # Omit aggregates in object creation.
            row_data = row[index_start:aggregate_start]
            if skip:
                obj = model_cls(**dict(zip(init_list, row_data)))
            else:
                obj = model(*row_data)  # build a model instance from the row values
            # Store the source database of the object
            obj._state.db = db
            # This object came from the database; it's not being added.
            obj._state.adding = False
        if extra_select:
            for i, k in enumerate(extra_select):
                setattr(obj, k, row[i])
        # Add the aggregates to the model
        if aggregate_select:
            for i, aggregate in enumerate(aggregate_select):
                setattr(obj, aggregate, row[i + aggregate_start])
        # Add the known related objects to the model, if there are any
        if self._known_related_objects:
            for field, rel_objs in self._known_related_objects.items():
                pk = getattr(obj, field.get_attname())
                try:
                    rel_obj = rel_objs[pk]
                except KeyError:
                    pass  # may happen in qs1 | qs2 scenarios
                else:
                    setattr(obj, field.name, rel_obj)
        yield obj
def get_cached_row(row, index_start, using, klass_info, offset=0):
    """
    Helper function that recursively returns an object with the specified
    related attributes already populated.

    This method may be called recursively to populate deep select_related()
    clauses.

    Arguments:
     * row - the row of data returned by the database cursor
     * index_start - the index of the row at which data for this
       object is known to start
     * offset - the number of additional fields that are known to
       exist in row for `klass`. This usually means the number of
       annotated results on `klass`.
     * using - the database alias on which the query is being executed.
     * klass_info - result of the get_klass_info function
    """
    if klass_info is None:
        return None
    klass, field_names, field_count, related_fields, reverse_related_fields, pk_idx = klass_info

    fields = row[index_start:index_start + field_count]
    # If the pk column is None (or the Oracle equivalent ''), then the related
    # object must be non-existent - set the relation to None.
    if fields[pk_idx] == None or fields[pk_idx] == '':
        obj = None
    elif field_names:
        obj = klass(**dict(zip(field_names, fields)))
    else:
        obj = klass(*fields)  # build the object from the row slice

    # If an object was retrieved, set the database state.
    if obj:
        obj._state.db = using
        obj._state.adding = False

    # Instantiate related fields
    index_end = index_start + field_count + offset
    # Iterate over each related object, populating any
    # select_related() fields
    for f, klass_info in related_fields:
        # Recursively retrieve the data for the related object
        cached_row = get_cached_row(row, index_end, using, klass_info)
        # If the recursive descent found an object, populate the
        # descriptor caches relevant to the object
        if cached_row:
            rel_obj, index_end = cached_row
            if obj is not None:
                # If the base object exists, populate the
                # descriptor cache
                setattr(obj, f.get_cache_name(), rel_obj)
            if f.unique and rel_obj is not None:
                # If the field is unique, populate the
                # reverse descriptor cache on the related object
                setattr(rel_obj, f.related.get_cache_name(), obj)

    # Now do the same, but for reverse related objects.
    # Only handle the restricted case - i.e., don't do a depth
    # descent into reverse relations unless explicitly requested
    for f, klass_info in reverse_related_fields:
        # Recursively retrieve the data for the related object
        cached_row = get_cached_row(row, index_end, using, klass_info)
        # If the recursive descent found an object, populate the
        # descriptor caches relevant to the object
        if cached_row:
            rel_obj, index_end = cached_row
            if obj is not None:
                # If the field is unique, populate the
                # reverse descriptor cache
                setattr(obj, f.related.get_cache_name(), rel_obj)
            if rel_obj is not None:
                # If the related object exists, populate
                # the descriptor cache.
                setattr(rel_obj, f.get_cache_name(), obj)
                # Now populate all the non-local field values on the related
                # object. If this object has deferred fields, we need to use
                # the opts from the original model to get non-local fields
                # correctly.
                opts = rel_obj._meta
                if getattr(rel_obj, '_deferred'):
                    opts = opts.proxy_for_model._meta
                for rel_field, rel_model in opts.get_fields_with_model():
                    if rel_model is not None:
                        setattr(rel_obj, rel_field.attname,
                                getattr(obj, rel_field.attname))
                        # populate the field cache for any related object
                        # that has already been retrieved
                        if rel_field.rel:
                            try:
                                cached_obj = getattr(obj, rel_field.get_cache_name())
                                setattr(rel_obj, rel_field.get_cache_name(), cached_obj)
                            except AttributeError:
                                # Related object hasn't been cached yet
                                pass
    return obj, index_end
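Stripped of caching and related-object handling, the heart of the sql -> object step is simply pairing column values with field names and calling the model class, as in this reduced illustration (`Book` is a hypothetical model):

```python
# Core of the row -> object conversion, reduced to its essence: pair
# each column value with its field name and construct the model.
# (Simplified; Django additionally tracks _state, deferred fields,
# aggregates and select_related caches.)

class Book:
    def __init__(self, id=None, title=None, price=None):
        self.id, self.title, self.price = id, title, price


field_names = ["id", "title", "price"]
row = (1, "Django Internals", 29.9)   # one row from the database cursor

obj = Book(**dict(zip(field_names, row)))
print(obj.title)  # Django Internals
```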
That completes the brief look at sql -> object.
Next is the reverse direction: object -> sql.
Calling object.save() persists the object to the database, so analyze the save() method on the model base class; it leads to save_base():
def save_base(self, raw=False, cls=None, origin=None, force_insert=False,
              force_update=False, using=None, update_fields=None):
    ...
    manager = cls._base_manager
    ...
    result = manager._insert([self], fields=fields, return_id=update_pk,
                             using=using, raw=raw)  # this is the insert path

Inside the manager:

def _insert(self, objs, fields, **kwargs):
    return insert_query(self.model, objs, fields, **kwargs)

In insert_query:

def insert_query(model, objs, fields, return_id=False, raw=False, using=None):
    query = sql.InsertQuery(model)              # build the insert-query object
    query.insert_values(fields, objs, raw=raw)  # attach the values to insert
    return query.get_compiler(using=using).execute_sql(return_id)  # compile to SQL and execute
The value-attachment part:

def insert_values(self, fields, objs, raw=False):
    self.fields = fields
    # Check that no Promise object reaches the DB. Refs #10498.
    for field in fields:
        for obj in objs:
            value = getattr(obj, field.attname)
            if isinstance(value, Promise):
                setattr(obj, field.attname, force_text(value))
    self.objs = objs
    self.raw = raw
Obtaining the SQL compiler object was covered above, so look at the execute method:

def execute_sql(self, return_id=False):
    assert not (return_id and len(self.query.objs) != 1)
    self.return_id = return_id
    cursor = self.connection.cursor()
    for sql, params in self.as_sql():
        cursor.execute(sql, params)
    if not (return_id and cursor):
        return
    if self.connection.features.can_return_id_from_insert:
        return self.connection.ops.fetch_returned_insert_id(cursor)
    return self.connection.ops.last_insert_id(cursor,
            self.query.model._meta.db_table, self.query.model._meta.pk.column)
From there, look at the as_sql() method:
def as_sql(self):
    qn = self.connection.ops.quote_name
    opts = self.query.model._meta
    result = ['INSERT INTO %s' % qn(opts.db_table)]

    has_fields = bool(self.query.fields)
    fields = self.query.fields if has_fields else [opts.pk]
    result.append('(%s)' % ', '.join([qn(f.column) for f in fields]))

    if has_fields:
        params = values = [
            [
                f.get_db_prep_save(getattr(obj, f.attname) if self.query.raw else f.pre_save(obj, True),
                                   connection=self.connection)
                for f in fields
            ]
            for obj in self.query.objs
        ]
    else:
        values = [[self.connection.ops.pk_default_value()] for obj in self.query.objs]
        params = [[]]
        fields = [None]
    can_bulk = (not any(hasattr(field, "get_placeholder") for field in fields) and
                not self.return_id and self.connection.features.has_bulk_insert)

    if can_bulk:
        placeholders = [["%s"] * len(fields)]
    else:
        placeholders = [
            [self.placeholder(field, v) for field, v in zip(fields, val)]
            for val in values
        ]
        # Oracle Spatial needs to remove some values due to #10888
        params = self.connection.ops.modify_insert_params(placeholders, params)
    if self.return_id and self.connection.features.can_return_id_from_insert:
        params = params[0]
        col = "%s.%s" % (qn(opts.db_table), qn(opts.pk.column))
        result.append("VALUES (%s)" % ", ".join(placeholders[0]))
        r_fmt, r_params = self.connection.ops.return_insert_id()
        # Skip empty r_fmt to allow subclasses to customize behaviour for
        # 3rd party backends. Refs #19096.
        if r_fmt:
            result.append(r_fmt % col)
            params += r_params
        return [(" ".join(result), tuple(params))]
    if can_bulk:
        result.append(self.connection.ops.bulk_insert_sql(fields, len(values)))
        return [(" ".join(result), tuple([v for val in values for v in val]))]
    else:
        return [
            (" ".join(result + ["VALUES (%s)" % ", ".join(p)]), vals)
            for p, vals in zip(placeholders, params)
        ]
The assembly of the SQL statement is plainly visible above.
So the object -> sql logic is:
object -> obtain the table name, column names, and values,
then format them into INSERT INTO table (columns) VALUES (values) and execute it.
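That logic can be condensed into a toy version. This is only a sketch of what InsertQuery and its compiler do; the real code additionally handles identifier quoting, per-field placeholders, bulk inserts, and returned ids:

```python
# Minimal sketch of the object -> sql direction: read the table name,
# column names and values off an object and format a parameterized
# INSERT statement. (Toy illustration, not Django's implementation.)

def build_insert(table, obj, fields):
    columns = ", ".join(fields)
    placeholders = ", ".join(["%s"] * len(fields))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, placeholders)
    params = tuple(getattr(obj, f) for f in fields)
    return sql, params


class Book:
    def __init__(self, title, price):
        self.title, self.price = title, price


sql, params = build_insert("app_book", Book("Django Internals", 29.9),
                           ["title", "price"])
print(sql)     # INSERT INTO app_book (title, price) VALUES (%s, %s)
print(params)  # ('Django Internals', 29.9)
```

Keeping the values out of the SQL string and passing them as parameters, as both the sketch and Django do, leaves escaping to the database driver.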
The last question, relationships between objects, is a bit more involved; it deserves a deeper study of its own.