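This article covers the basics of MongoDB and its common operations. It focuses on pymongo and, where useful, looks at the pymongo source code and points out pitfalls.
It walks through some common problems and the query operations used most often with MongoDB:
- and / or usage
- sorting
- a reader utility class
- in queries
- skip and limit
- an introduction to cursors
- errors encountered and how to fix them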
1 Querying a document by mongo_id
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/3/23 21:39
@File : test_pymogo.py
@Author : [email protected]
Query a document by its object_id.
"""
from pymongo import MongoClient
# MongoDB connection URI and database name
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME
# ObjectId is needed to build the _id filter
from bson.objectid import ObjectId
# Connect through the URI to create a client
client = MongoClient(SHOUFUYOU_REPORTING_URI)
# Get the database by name
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
# Get the collection
call_record = mongo_db['callRecord']
if __name__ == '__main__':
    # Query condition, the equivalent of a SQL WHERE clause
    filter_ = {'_id': ObjectId('5be2b43da90ec1470078ef53')}
    # Projection: the fields you want back, the equivalent of the SELECT column list.
    # Format: 'field': 1 to show, 'field': 0 to hide. By default every field is shown;
    # once fields are listed, only those are shown (plus _id unless it is set to 0).
    projection = {'source_type': 1, '_id': 1}
    # Look up by mongo_id; find_one returns None when nothing matches
    document = call_record.find_one(filter=filter_, projection=projection)
    print(document)
    # Result: {'_id': ObjectId('5be2b43da90ec1470078ef53'), 'source_type': 'android'}
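The projection rule used in the example above (1 to show, 0 to hide, _id shown unless hidden) can be sketched as a small helper. `build_projection` is a hypothetical name and is not part of pymongo; it only builds the dict that would be passed as `projection`. MongoDB rejects projections that mix 1 and 0 for fields other than _id, so the sketch refuses that combination.

```python
def build_projection(include=(), exclude=(), show_id=True):
    """Build a projection dict (hypothetical helper, not a pymongo API)."""
    if include and exclude:
        # MongoDB does not allow mixing 1 and 0, except for _id
        raise ValueError("use either include or exclude, not both")
    projection = {name: 1 for name in include}
    projection.update({name: 0 for name in exclude})
    if not show_id:
        projection["_id"] = 0
    elif include:
        projection["_id"] = 1
    return projection


print(build_projection(include=["source_type"]))
print(build_projection(include=["user_id", "status"], show_id=False))
```

The first call produces the same dict as the `projection` in the example; the second hides _id, as the later examples in this article do.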
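Connect to MongoDB through the URI, get the db, then get the collection. After that you can query the database with find.
For get_database, see http://api.mongodb.com/python/current/tutorial.html#getting-a-database
Note that find_one is used here. It is only appropriate when you want exactly one document. In most cases find is used instead.
Here is a simple example.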
Using find
find is the most frequently used query method in MongoDB.
find returns a cursor. It never returns None: when nothing matches, the cursor is simply empty.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/3/23 21:39
@File : test_pymogo.py
@Author : [email protected]
Basic find usage: in
"""
from pymongo import MongoClient
# MongoDB connection URI and database name
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME
# ObjectId is needed to build the _id values
from bson.objectid import ObjectId
client = MongoClient(SHOUFUYOU_REPORTING_URI)
# Get the database
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
# Get the collection
call_record = mongo_db['callRecord']
if __name__ == '__main__':
    mongo_id_list_test = [
        # mongo_id values to look up
        ObjectId("5be2b43da90ec1470078ef53"),
        ObjectId("5be3ec1da90ec146d71b551f"),
        ObjectId("5be422eba90ec106a54840b2")
    ]
    # MongoDB "in" query condition
    filter_ = {"_id": {"$in": mongo_id_list_test}}
    # Fields to return
    projection = {
        '_id': 1,
        'created_time': 1,
    }
    # Note: find does not return documents, it returns a cursor object
    documents = call_record.find(filter_, projection)
    print(f"documents:{documents}")
    # Iterate over the cursor to get the documents
    for doc in documents:
        print(doc)
Result:
documents:<pymongo.cursor.Cursor object at 0x10a60d9e8>
{'_id': ObjectId('5be2b43da90ec1470078ef53'), 'created_time': '2018-11-07 17:45:33'}
{'_id': ObjectId('5be3ec1da90ec146d71b551f'), 'created_time': '2018-11-08 15:56:13'}
{'_id': ObjectId('5be422eba90ec106a54840b2'), 'created_time': '2018-11-08 19:50:03'}
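Because find hands back a cursor rather than documents, a check such as `if documents is None` never fires. A minimal sketch of one way to get "the first document, or None" from any iterable follows; `first_or_none` is a hypothetical helper, and plain lists stand in for a cursor so the snippet runs without a server.

```python
def first_or_none(cursor):
    """Return the first item of an iterable (for example a cursor), or None if it is empty."""
    return next(iter(cursor), None)


# Plain lists stand in for cursors here (assumption: a real cursor is iterable the same way)
print(first_or_none([{"_id": 1}, {"_id": 2}]))
print(first_or_none([]))
```

When only one document is needed, find_one is usually the simpler choice; the helper is mainly useful when a cursor already exists.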
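Now for the parameters of find. There are quite a few of them.
Only the more important ones are covered here.
- filter: the first positional parameter, the query condition
- projection: the second positional parameter, the fields to return
- no_cursor_timeout: whether the server may time the cursor out. The default is False, which means the server closes the cursor after 10 minutes of inactivity; set it to True and the cursor never times out.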
1 Using and in find
The and syntax is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/3/23 21:39
@File : test_pymogo.py
@Author : [email protected]
Basic find usage: and
use xinyongfei_rcs_gateway;
db.getCollection("fudataLog").find(
{
"$and" : [
{
"status" : "SUCCESS"
},
{
"created_time" : {
"$gte" : "2018-05-22 17:18:45"
}
},
{
"created_time" : {
"$lt" : "2018-05-29 17:18:45"
}
},
{
"status" : "SUCCESS"
}
]
}
);
"""
from pymongo import MongoClient
# MongoDB connection URI and database name
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME
def test_mongo_between():
    """
    { $and: [ { "created_time": { $gte: "2018-05-22 16:31:05" } },
    { "created_time": { $lt: "2018-05-25 16:31:05" } }, { "method_id": "commerceReportPull" } ]
    }
    :return:
    """
    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME
    collection_name = 'fudataLog'
    client = MongoClient(_uri)
    db = client[_dbname]
    collection = db[collection_name]
    # Find records with '2018-10-17 18:13:12' <= created_time < '2018-10-30 16:31:05'
    # and method_id = commerceGetOpenId, status = SUCCESS
    domain = {
        "$and": [
            {"created_time": {"$lt": "2018-10-30 16:31:05"}},
            {"created_time": {"$gte": "2018-10-17 18:13:12"}},
            {"method_id": "commerceGetOpenId"},
            {"status": "SUCCESS"}
        ]
    }
    fields = {"return_data.open_id": 1,
              "created_time": 1,
              'method_id': 1,
              "status": 1,
              # Hide the mongo_id
              '_id': 0
              }
    cursor = collection.find(filter=domain, projection=fields)
    # Count the matching records. Cursor.count() was deprecated in PyMongo 3.7 and
    # removed in 4.0; on newer versions use collection.count_documents(domain).
    print(f"cursor.count():{cursor.count()}")
    # Iterate over the cursor to get the documents
    for doc in cursor:
        print(doc)
if __name__ == '__main__':
    test_mongo_between()
Result:
cursor.count():14
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 18:59:27', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': '929loss67cisw42f8ocvrgonsxwkl5clryvuihlx'}}
{'created_time': '2018-10-26 17:50:39', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'p5lmxzfgprnhpuvv3pkjlt8iv6wtc9wzevzywk4x'}}
{'created_time': '2018-10-26 17:50:39', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'p5lmxzfgprnhpuvv3pkjlt8iv6wtc9wzevzywk4x'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:44:18', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'g1zu3wgncengql9dis2u3ghfnqh3ghtjlob4o2mv'}}
{'created_time': '2018-10-29 18:44:18', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'g1zu3wgncengql9dis2u3ghfnqh3ghtjlob4o2mv'}}
Note how the and condition is written:
domain = {
    "$and": [
        {"created_time": {"$lt": "2018-10-30 16:31:05"}},
        {"created_time": {"$gte": "2018-10-17 18:13:12"}},
        {"method_id": "commerceGetOpenId"},
        {"status": "SUCCESS"}
    ]
}
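The and condition above can also be written without `$and`: conditions placed on different keys of one filter dict are combined with AND, and several operators on the same field can share one sub-dict. The two hypothetical helpers below (not part of pymongo) build both forms so they can be compared; they only construct dicts and do not talk to a server.

```python
def time_range_and(field, start, end, **equals):
    """Explicit $and form: start <= field < end, plus equality conditions."""
    clauses = [{field: {"$lt": end}}, {field: {"$gte": start}}]
    clauses.extend({key: value} for key, value in equals.items())
    return {"$and": clauses}


def time_range_implicit(field, start, end, **equals):
    """Implicit AND form: one dict, range operators merged under the same field."""
    query = {field: {"$gte": start, "$lt": end}}
    query.update(equals)
    return query


args = ("created_time", "2018-10-17 18:13:12", "2018-10-30 16:31:05")
print(time_range_and(*args, method_id="commerceGetOpenId", status="SUCCESS"))
print(time_range_implicit(*args, method_id="commerceGetOpenId", status="SUCCESS"))
```

Either dict can be passed as the `filter` argument of find. The explicit `$and` form is still required when the same field or operator has to appear more than once at the top level.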
2 Using or in find
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/3/23 21:39
@File : test_pymogo.py
@Author : [email protected]
Basic find usage: or
"""
from pymongo import MongoClient
# MongoDB connection URI and database name
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME
def test_mongo_or():
    """
    Using or
    domain = {
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }
    :return:
    """
    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME
    collection_name = 'fudataLog'
    client = MongoClient(_uri)
    db = client[_dbname]
    collection = db[collection_name]
    domain = {
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }
    fields = {
        "user_id": 1,
        "status": 1,
        # Hide the mongo_id
        '_id': 0
    }
    cursor = collection.find(filter=domain, projection=fields)
    # Count the matching records. Cursor.count() was deprecated in PyMongo 3.7 and
    # removed in 4.0; on newer versions use collection.count_documents(domain).
    print(f"cursor.count():{cursor.count()}")
    # Iterate over the cursor to get the documents
    for doc in cursor:
        print(doc)
if __name__ == '__main__':
    test_mongo_or()
Result:
cursor.count():34
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'ERROR'}
{'user_id': '99063770', 'status': 'ERROR'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'ERROR'}
{'user_id': '99063974', 'status': 'SUCCESS'}
...
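That is all it takes: every matching document has been found.
The point here is the use of or. Compared with the previous example, and is simply replaced with or and everything else stays the same.
This query finds the documents whose user_id is 99063974 or 99063770.
domain = {
    "$or": [
        {"user_id": "99063974"},
        {"user_id": "99063770"},
    ]
}
The result is there, but it may include records whose status is ERROR. Can we get only the successful records? Certainly: just add one more condition to keep the successful ones.
See the example below: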
def test_mongo_or_and():
    """
    Using and together with or
    domain = {
        "status": "SUCCESS",
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }
    :return:
    """
    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME
    collection_name = 'fudataLog'
    client = MongoClient(_uri)
    db = client[_dbname]
    collection = db[collection_name]
    # status must be SUCCESS and user_id must be one of the two values
    domain = {
        "status": "SUCCESS",
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }
    fields = {
        "user_id": 1,
        "status": 1,
        # Hide the mongo_id
        '_id': 0
    }
    cursor = collection.find(filter=domain, projection=fields)
    # Count the matching records (Cursor.count() was removed in PyMongo 4.0;
    # collection.count_documents(domain) is the replacement)
    print(f"cursor.count():{cursor.count()}")
    # Iterate over the cursor to get the documents
    for doc in cursor:
        print(doc)
Result:
cursor.count():6
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}
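When every branch of an `$or` is an equality test on the same field, as with user_id above, the condition can also be expressed with `$in`. The sketch below shows the rewrite with a hypothetical helper, `or_equalities_to_in`; it assumes each clause is a single `{field: value}` pair with a plain value (no nested operators) and only builds a dict.

```python
def or_equalities_to_in(or_clauses):
    """Collapse [{"f": a}, {"f": b}] into {"f": {"$in": [a, b]}}.

    Hypothetical helper; assumes every clause tests the same single field
    for equality against a plain value.
    """
    fields = {key for clause in or_clauses for key in clause}
    if len(fields) != 1 or any(len(clause) != 1 for clause in or_clauses):
        raise ValueError("all clauses must test the same single field")
    field = fields.pop()
    return {field: {"$in": [clause[field] for clause in or_clauses]}}


clauses = [{"user_id": "99063974"}, {"user_id": "99063770"}]
domain = {"status": "SUCCESS"}
domain.update(or_equalities_to_in(clauses))
print(domain)
```

The resulting filter selects the same documents as the `status` plus `$or` filter in the example above.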
Some commonly used MongoDB queries:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/3/23 21:39
@File : test_pymongo_condition.py
@Author : [email protected]
Conditional queries:
1 Range query: query by time range, and by user_id within that range.
    filter_ = {
        # created_time within the range below, and user_id = '99063857'
        "$and": [
            {"created_time": {"$lte": '2018-12-07 15:25:43'}},
            {"created_time": {"$gt": '2018-09-01 16:00:30'}},
            {'user_id': "99063857"}
        ]
    }
2 TODO
"""
from pymongo import MongoClient
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME
client = MongoClient(SHOUFUYOU_REPORTING_URI)
# Get the database
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
# Get the collection
call_record = mongo_db['callRecord']
fields = {'_id': 0, 'created_time': 1, "user_id": 1}
filter_ = {
    # created_time within the range below, and user_id = '99063857'
    "$and": [
        {"created_time": {"$lte": '2018-12-07 15:25:43'}},
        {"created_time": {"$gt": '2018-09-01 16:00:30'}},
        {'user_id': "99063857"}
    ]
}
cursor = call_record.find(filter=filter_, projection=fields).limit(5)
# count() ignores limit() by default, so this prints the total number of matches
print(cursor.count())
for doc in cursor:
    print(doc)
3 The in operation in find
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/3/23 21:39
@File : test_pymogo.py
@Author : [email protected]
Basic find usage: in
    # MongoDB "in" query
    mongo_id_list_test = [
        # mongo_id values to look up
        ObjectId("5be2b43da90ec1470078ef53"),
        ObjectId("5be3ec1da90ec146d71b551f"),
        ObjectId("5be422eba90ec106a54840b2")
    ]
    filter_ = {"_id": {"$in": mongo_id_list_test}}
"""
from pymongo import MongoClient
# MongoDB connection URI and database name
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME
# ObjectId is needed to build the _id values
from bson.objectid import ObjectId
client = MongoClient(SHOUFUYOU_REPORTING_URI)
# Get the database
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
# Get the collection
call_record = mongo_db['callRecord']
if __name__ == '__main__':
    mongo_id_list_test = [
        # mongo_id values to look up
        ObjectId("5be2b43da90ec1470078ef53"),
        ObjectId("5be3ec1da90ec146d71b551f"),
        ObjectId("5be422eba90ec106a54840b2")
    ]
    # MongoDB "in" query
    filter_ = {"_id": {"$in": mongo_id_list_test}}
    projection = {
        '_id': 1,
        'created_time': 1,
    }
    # Note: find does not return documents, it returns a cursor object
    documents = call_record.find(filter_, projection)
    print(f"documents:{documents}")
    # Iterate over the cursor to get the documents
    for doc in documents:
        print(doc)
4 Sorting in find
pymongo.ASCENDING: ascending order
pymongo.DESCENDING: descending order
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/3/23 21:39
@File : test_pymongo_condition.py
@Author : [email protected]
Sorting:
    pymongo.ASCENDING   ascending order
    pymongo.DESCENDING  descending order
    for doc in collection.find().sort('field', pymongo.ASCENDING):
        print(doc)
    for doc in collection.find().sort([
            ('field1', pymongo.ASCENDING),
            ('field2', pymongo.DESCENDING)]):
        print(doc)
"""
import pymongo
from pymongo import MongoClient
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME
client = MongoClient(SHOUFUYOU_REPORTING_URI)
# Get the database
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
# Get the collection
call_record = mongo_db['callRecord']
def test_sort(collection):
    fields = {'_id': 0, 'created_time': 1, "user_id": 1}
    filter_ = {
        # created_time within the range below
        "$and": [
            {"created_time": {"$lte": '2018-12-07 15:25:43'}},
            {"created_time": {"$gt": '2018-09-01 16:00:30'}},
        ]
    }
    # Sort by created_time in descending order
    cursor = collection.find(filter=filter_, projection=fields).sort([
        ('created_time', pymongo.DESCENDING),
    ]).limit(10)
    # count() ignores limit() by default, so this prints the total number of matches
    print(cursor.count())
    for doc in cursor:
        print(doc)
def test_sort_multi(collection):
    fields = {'_id': 0, 'created_time': 1, "user_id": 1}
    filter_ = {
        # created_time within the range below
        "$and": [
            {"created_time": {"$lte": '2018-12-07 15:25:43'}},
            {"created_time": {"$gt": '2018-09-01 16:00:30'}},
        ]
    }
    # The order of the sort keys matters: sort by user_id ascending first,
    # then by created_time descending within each user_id.
    cursor = collection.find(filter=filter_, projection=fields).sort([
        ('user_id', pymongo.ASCENDING),
        ('created_time', pymongo.DESCENDING),
    ]).limit(50)
    print(cursor.count())
    for doc in cursor:
        print(doc)
if __name__ == '__main__':
    # test_sort(call_record)
    test_sort_multi(call_record)
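To make the effect of the key order concrete, here is a local emulation of a multi-key sort spec on a list of dicts. It is only an illustration of the ordering semantics, not how the server sorts; `sort_locally` is a hypothetical helper, and the constants are defined locally with the same values as pymongo.ASCENDING (1) and pymongo.DESCENDING (-1) so the snippet runs without pymongo. The sample documents are made up.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def sort_locally(docs, spec):
    """Sort dicts by a [(field, direction), ...] spec; earlier keys take priority."""
    result = list(docs)
    # Apply the keys from last to first: Python's sort is stable,
    # so the first key in the spec ends up as the primary key.
    for field, direction in reversed(spec):
        result.sort(key=lambda doc: doc[field], reverse=(direction == DESCENDING))
    return result


docs = [
    {"user_id": "2", "created_time": "2018-10-01 10:00:00"},
    {"user_id": "1", "created_time": "2018-10-01 10:00:00"},
    {"user_id": "1", "created_time": "2018-10-05 10:00:00"},
]
for doc in sort_locally(docs, [("user_id", ASCENDING), ("created_time", DESCENDING)]):
    print(doc)
```

Both documents of user_id "1" come first, newest first, followed by user_id "2". Swapping the two tuples in the spec would order by created_time first instead.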
5 skip and limit in find
Sometimes we want to skip a few documents or cap the number of documents returned. skip and limit do exactly that, and they are very easy to use.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/4/3 11:56
@File : test_cursor_skip_limit .py
@Author : [email protected]
"""
from pymongo import MongoClient
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME
client = MongoClient(SHOUFUYOU_REPORTING_URI, maxPoolSize=50)
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
collection = mongo_db['contacts']
domain = {"event_id": "1000073"}
fields = {'_id': 1, 'created_time': 1, 'event_id': 1}
cursor = collection.find(domain, fields)
# Make a copy of the cursor object
cursor_copy = cursor.clone()
values = cursor.skip(3).limit(2)
# count() ignores skip and limit by default, so it reports the total number of matches
print(f"count:{values.count()}")
for item in values:
    print(item)
print("--------copy cursor top 10 document------")
for idx, doc in enumerate(cursor_copy[0:10]):
    print(idx, doc)
Result:
count:393
{'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
{'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
--------copy cursor top 10 document------
0 {'_id': ObjectId('5bc5ab05a90ec16eb23ee498'), 'event_id': '1000073', 'created_time': '2018-10-16 17:10:29'}
1 {'_id': ObjectId('5bc69975a90ec16ea023e42d'), 'event_id': '1000073', 'created_time': '2018-10-17 10:07:49'}
2 {'_id': ObjectId('5bc712afa90ec16ea20ff19f'), 'event_id': '1000073', 'created_time': '2018-10-17 18:45:03'}
3 {'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
4 {'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
5 {'_id': ObjectId('5bd27f8ea90ec1277f7e91d1'), 'event_id': '1000073', 'created_time': '2018-10-26 10:44:30'}
6 {'_id': ObjectId('5bd6de89a90ec12779579b77'), 'event_id': '1000073', 'created_time': '2018-10-29 18:18:49'}
7 {'_id': ObjectId('5bd6e416a90ec1278e0a16e8'), 'event_id': '1000073', 'created_time': '2018-10-29 18:42:30'}
8 {'_id': ObjectId('5bd81a1ea90ec127806c7670'), 'event_id': '1000073', 'created_time': '2018-10-30 16:45:18'}
9 {'_id': ObjectId('5be015d8a90ec146e7432850'), 'event_id': '1000073', 'created_time': '2018-11-05 18:05:12'}
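As the output shows, skip 3 with limit 2 returns exactly the documents at idx 3 and 4 of the second listing.
The call above can also be written like this:
values = cursor.limit(2).skip(3)
Why does this work? It looks a lot like method chaining. Why can the calls be ordered freely?
Because limit returns the cursor object, and so does skip. That is why you can keep chaining calls such as .skip().limit().skip().
Both methods return the cursor itself (self), as the source code shows.
Look at skip first: it checks the type of skip and does some simple validation, then stores the value in the private attribute self.__skip.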
def skip(self, skip):
    """Skips the first `skip` results of this cursor.
    Raises :exc:`TypeError` if `skip` is not an integer. Raises
    :exc:`ValueError` if `skip` is less than ``0``. Raises
    :exc:`~pymongo.errors.InvalidOperation` if this :class:`Cursor` has
    already been used. The last `skip` applied to this cursor takes
    precedence.
    :Parameters:
      - `skip`: the number of results to skip
    """
    if not isinstance(skip, integer_types):
        raise TypeError("skip must be an integer")
    if skip < 0:
        raise ValueError("skip must be >= 0")
    self.__check_okay_to_chain()
    self.__skip = skip
    return self
The implementation of limit is much the same as skip.
def limit(self, limit):
    """Limits the number of results to be returned by this cursor.
    Raises :exc:`TypeError` if `limit` is not an integer. Raises
    :exc:`~pymongo.errors.InvalidOperation` if this :class:`Cursor`
    has already been used. The last `limit` applied to this cursor
    takes precedence. A limit of ``0`` is equivalent to no limit.
    :Parameters:
      - `limit`: the number of results to return
    .. mongodoc:: limit
    """
    if not isinstance(limit, integer_types):
        raise TypeError("limit must be an integer")
    if self.__exhaust:
        raise InvalidOperation("Can't use limit and exhaust together.")
    self.__check_okay_to_chain()
    self.__empty = False
    self.__limit = limit
    return self
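The "store the value, then return self" pattern in the two methods above is what makes chaining possible in any order. A toy illustration follows; `ToyCursor` is a made-up stand-in that works on an in-memory list and is not pymongo's implementation.

```python
class ToyCursor:
    """Toy stand-in for a cursor: records skip/limit and applies them on iteration."""

    def __init__(self, items):
        self._items = list(items)
        self._skip = 0
        self._limit = 0  # 0 means "no limit"

    def skip(self, skip):
        if not isinstance(skip, int):
            raise TypeError("skip must be an integer")
        if skip < 0:
            raise ValueError("skip must be >= 0")
        self._skip = skip
        return self  # returning self is what allows chaining

    def limit(self, limit):
        if not isinstance(limit, int):
            raise TypeError("limit must be an integer")
        self._limit = limit
        return self

    def __iter__(self):
        rows = self._items[self._skip:]
        if self._limit > 0:
            rows = rows[:self._limit]
        return iter(rows)


print(list(ToyCursor(range(10)).skip(3).limit(2)))
print(list(ToyCursor(range(10)).limit(2).skip(3)))
```

Both lines print the same two items, because the settings are only recorded when the methods are called and are applied later, when iteration starts. Calling skip twice keeps only the last value, matching the "takes precedence" wording in the docstrings.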
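The cursor object returned by find
A cursor object is obtained by calling collection.find().
The cursor implements the slicing protocol, so slice operations can be used on it.
cursor.count() returns the total number of documents matched by the query. By default it ignores any skip, limit, or slice applied to the cursor.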
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time : 2019/4/3 11:56
@File : test_cursor_getitem.py
@Author : [email protected]
"""
from pymongo import MongoClient
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME
client = MongoClient(SHOUFUYOU_REPORTING_URI, maxPoolSize=50)
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
collection = mongo_db['contacts']
domain = {"event_id": "1000073"}
fields = {'_id': 1, 'created_time': 1, 'event_id': 1}
# Slicing
values = collection.find(domain, fields)[2:5]
print(f"count:{values.count()}")
for item in values:
    print(item)
Result:
count:393
{'_id': ObjectId('5bc712afa90ec16ea20ff19f'), 'event_id': '1000073', 'created_time': '2018-10-17 18:45:03'}
{'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
{'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
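The slice [2:5] returns the same documents as skip(2).limit(3), which matches the idx 2, 3 and 4 entries of the earlier listing. The mapping can be sketched as below; `slice_to_skip_limit` is a hypothetical helper that mirrors the idea only, and the exact checks inside pymongo may differ.

```python
def slice_to_skip_limit(index):
    """Translate a slice such as slice(2, 5) into a (skip, limit) pair.

    Sketch only: supports non-negative start/stop without a step.
    A limit of 0 is used for "no upper bound", as with Cursor.limit.
    """
    if index.step is not None:
        raise IndexError("slices with a step are not supported")
    skip = index.start or 0
    if skip < 0 or (index.stop is not None and index.stop < 0):
        raise IndexError("negative indexes are not supported")
    if index.stop is None:
        return skip, 0
    if index.stop <= skip:
        raise IndexError("stop must be greater than start")
    return skip, index.stop - skip


print(slice_to_skip_limit(slice(2, 5)))
print(slice_to_skip_limit(slice(0, 10)))
```

The first call corresponds to the [2:5] slice in the example, the second to the cursor_copy[0:10] slice used in the skip and limit section.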
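That covers the basics of the cursor object.
A utility class for reading data from MongoDB
It reads data from MongoDB; the fields and the filter condition are passed in as configuration.
Its read method reads the data in batches.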
from pymongo import MongoClient
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME
import logging
logger = logging.getLogger(__name__)
# NOTE: BaseReader is the author's own base class and is not shown in this article.
# It is assumed to accept `url` and store it as self.url.
class MongoReader(BaseReader):
    def __init__(self, uri, db_name, collection_name, domain, fields):
        """
        MongoDB reader utility class
        :param uri: MongoDB connection URI
        :param db_name: database name
        :param collection_name: collection name
        :param domain: query condition
        :param fields: projection, e.g. {"name":1,"_id":1}
        """
        super().__init__(url=uri)
        self._dbname = db_name
        self._collection_name = collection_name
        self.domain = domain
        self.fields = fields
        client = MongoClient(self.url)
        db = client[self._dbname]
        self.collection = db[self._collection_name]
        # Upper bound on the number of documents to read
        self.max_count = 30000000000000
    def read(self, start=0, step=1000):
        # Batch size. The original used `step - start`, which becomes 0 (no limit)
        # or negative when start >= step.
        limit = step
        skip_number = start
        count = self.collection.count_documents(filter=self.domain)
        logger.info(f"total count:{count}")
        while True:
            logger.info(f'limit:{limit},skip:{skip_number}, start:{skip_number},end:{skip_number+limit}')
            cursor = self.collection.find(self.domain, self.fields, no_cursor_timeout=True).limit(limit).skip(
                skip_number)
            # Read the whole batch, then always close the cursor, because
            # no_cursor_timeout=True disables the server-side timeout.
            try:
                batch = list(cursor)
            finally:
                cursor.close()
            # Number of documents in this batch (replaces cursor.count(with_limit_and_skip=True),
            # which was removed in PyMongo 4.0)
            number = len(batch)
            if number:
                yield batch
            skip_number += number
            if number < limit:
                logger.info("number:{},limit:{}. number < limit,break".format(number, limit))
                break
            if skip_number >= self.max_count:
                logger.info("skip_number:{},self.max_count:{}.skip_number >= self.max_count,break".format(
                    skip_number,
                    self.max_count))
                break
if __name__ == '__main__':
    start_time = '2018-10-01 11:03:05'
    end_time = '2019-01-20 14:03:49'
    reader_config = {
        'uri': XINYONGFEI_RCS_GATEWAY_URI,
        'db_name': XINYONGFEI_RCS_GATEWAY_DB_NAME,
        'domain': {"$and": [{"created_time": {"$lt": end_time}}, {"created_time": {"$gte": start_time}},
                            {"method_id": "securityReport"}, {"status": "SUCCESS"}]},
        'fields': {"created_time": 1, "user_id": 1, "_id": 1},
        'collection_name': 'moxieSecurityLog',
    }
    reader = MongoReader(**reader_config)
    for data in reader.read():
        print(data)
    print('frank')
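The batching in read comes down to walking (skip, limit) windows over the result set. The arithmetic can be sketched on its own, without a database; `batch_windows` is a hypothetical helper, and `total` stands for a count such as the one returned by count_documents.

```python
def batch_windows(total, step, start=0):
    """Yield (skip, limit) pairs that cover `total` documents in batches of `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    skip = start
    while skip < total:
        limit = min(step, total - skip)
        yield skip, limit
        skip += limit


for skip, limit in batch_windows(total=2500, step=1000):
    print(f"skip={skip} limit={limit}")
```

Each pair could be fed to find(...).skip(skip).limit(limit). Unlike read, this sketch relies on a count taken up front, so it would miss documents inserted while the batches are being read; read avoids that by stopping only when a batch comes back short.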
Error summary:
1 CursorNotFound: the cursor cannot be found
The error looks like this:
pymongo.errors.CursorNotFound: Cursor not found, cursor id: 387396591387
Exception in thread consumer_14:
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/Users/frank/PycharmProjects/xinyongfei-bi-model/rcsdecisionv2.py", line 843, in run
result = self.parse(values)
File "/Users/frank/PycharmProjects/xinyongfei-bi-model/rcsdecisionv2.py", line 872, in parse
for posts in posts_list:
File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 1132, in next
if len(self.__data) or self._refresh():
File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 1075, in _refresh
self.__max_await_time_ms))
File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 947, in __send_message
helpers._check_command_response(doc['data'][0])
File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/helpers.py", line 207, in _check_command_response
raise CursorNotFound(errmsg, code, response)
pymongo.errors.CursorNotFound: Cursor not found, cursor id: 387396591387
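Analysis:
The cursor timed out.
Set the parameter no_cursor_timeout = True.
Solution:
demos = db['demo'].find({}, {"_id": 0}, no_cursor_timeout=True)
for doc in demos:
    do_something()
demos.close()  # close the cursor
Official documentation:
According to the official documentation, the server closes a cursor after 10 minutes of inactivity by default. A parameter can be set so that the cursor never times out.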
no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
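Since a cursor opened with no_cursor_timeout=True must be closed by the caller, it helps to close it even when processing raises. A minimal sketch using contextlib.closing from the standard library follows; `process_all` and `FakeCursor` are hypothetical names, and FakeCursor exists only so the snippet runs without a server (the only assumption about a real cursor is that it is iterable and has a close() method).

```python
from contextlib import closing


def process_all(cursor, handle):
    """Call handle(doc) for every document and always close the cursor."""
    processed = 0
    with closing(cursor):  # close() runs on normal exit and on exceptions
        for doc in cursor:
            handle(doc)
            processed += 1
    return processed


class FakeCursor:
    """Stand-in used only to demonstrate the helper without a server."""

    def __init__(self, docs):
        self._docs = iter(docs)
        self.closed = False

    def __iter__(self):
        return self._docs

    def close(self):
        self.closed = True


cursor = FakeCursor([{"n": 1}, {"n": 2}])
print(process_all(cursor, print), cursor.closed)
```

With a real collection, the cursor returned by find(..., no_cursor_timeout=True) would be passed in place of FakeCursor.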
References:
https://www.jianshu.com/p/a8551bd17b5b
http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find
Reference documentation:
1 api cursor http://api.mongodb.com/python/current/api/pymongo/cursor.html
2 api count_documents http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.count_documents
3 api collection.html http://api.mongodb.com/python/current/api/pymongo/collection.html
4 api sorting http://api.mongodb.com/python/current/api/pymongo/cursor.html#pymongo.cursor.Cursor.sort
5 mongodb tutorial http://api.mongodb.com/python/current/tutorial.html
6 Using PyMongo in Python 3 https://zhuanlan.zhihu.com/p/29435868
7 Using PyMongo in Python 3 https://cloud.tencent.com/developer/article/1005552
8 Working with MongoDB from Python for beginners: this one article is enough https://cloud.tencent.com/developer/article/1169645
9 A crash course in the basics of the PyMongo library https://www.jianshu.com/p/acc57241f9f0