使用pymongo來操作mongodb數據庫

本文介紹 mongodb 的基本使用,常用操作.主要講 pymongo 的使用, 同時必要的時候會說一些 源碼的 以及注意事項.

涉及主要說了一些常見的問題, monggodb 中經常用過的查詢操作.

  • and or 用法
  • 排序操作
  • 工具類
  • in 查詢
  • skip ,offset 操作
  • cursor 介紹
  • - 遇到錯誤 相關錯誤

1 根據mongo_id 查詢文檔


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

按照 object_id  查詢 document

"""

from pymongo import MongoClient
# mongo URI  連接配置 
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

# 導入這個包
from bson.objectid import ObjectId

# 通過uri 連接 pymongo 生成 client 
client = MongoClient(SHOUFUYOU_REPORTING_URI)

# 獲取 db 通過名稱 獲取db .
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# 獲取collection
call_record = mongo_db['callRecord']


if __name__ == '__main__':
    # 查詢條件, 相當於 where 後面跟的條件
    # filter_ = {'_id': ObjectId('5be2b43da90ec1470078ef53')}
    filter_ = {'_id': ObjectId('5be2b43da90ec1470078ef50')}

    # 過濾字段, 需要篩選出來 你想要的字段, 相當於 select 後面跟的 字段,
    #  格式 '字段名':1 顯示, '字段名':0 不顯示. 默認 是都顯示出來, 如果指定了字段 則根據指定條件 新疆顯示.
    projection = {'source_type': 1, '_id': 1}

    # 根據mongo_id  查詢數據, 如果沒有返回 None
    document = call_record.find_one(filter=filter_, projection=projection)

    print(document)
    #結果  {'_id': ObjectId('5be2b43da90ec1470078ef53'), 'source_type': 'android'}

通過 URI 連接 到mongodb, 之後獲取db, 最後 獲取collection 就可以了. 之後 就可以 find取查詢 數據庫了.

get_database參考這個文檔 http://api.mongodb.com/python/current/tutorial.html#getting-a-database

注意這裏用的 是 find_one 這個方法 只是用來查詢確定一條文檔,纔會使用. 一般 情況下 會使用 find 這個命令 會多一些.

舉個簡單的例子吧 .

cursor.find 的用法

find 的使用,在 mongodb 查詢 find 用的是最多的.

find 返回 結果是一個cursor , 如果沒有結果就會 None .

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

find  基本用法 in

"""

from pymongo import MongoClient
# mongo URI  連接配置
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

# 導入這個包
from bson.objectid import ObjectId

client = MongoClient(SHOUFUYOU_REPORTING_URI)

# 獲取 db
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# 獲取collection
call_record = mongo_db['callRecord']

if __name__ == '__main__':

    mongo_id_list_test = [
        # 數據 mongo_id
        ObjectId("5be2b43da90ec1470078ef53"),
        ObjectId("5be3ec1da90ec146d71b551f"),
        ObjectId("5be422eba90ec106a54840b2")

    ]

    # mongodb  in 查詢, 查詢條件
    filter_ = {"_id": {"$in": mongo_id_list_test}}
    
    # 篩選字段 
    projection = {
        '_id': 1,
        'created_time': 1,
    }

    # cursor  注意 find 並不會返回文檔, 而是返回一個cursor 對象
    documents = call_record.find(filter_, projection)

    print(f"documents:{documents}")

    # 需要迭代對象,才能取到值.
    for doc in documents:
        print(doc)

結果如下:

documents:<pymongo.cursor.Cursor object at 0x10a60d9e8>
{'_id': ObjectId('5be2b43da90ec1470078ef53'), 'created_time': '2018-11-07 17:45:33'}
{'_id': ObjectId('5be3ec1da90ec146d71b551f'), 'created_time': '2018-11-08 15:56:13'}
{'_id': ObjectId('5be422eba90ec106a54840b2'), 'created_time': '2018-11-08 19:50:03'}

來說下 find 的參數. find 參數 還是挺多的.
這裏只說幾個比較重要的.

  • filter 第一個位置 參數 就是篩選條件
  • projection 第二個 位置參數 篩選字段
  • no_cursor_timeout 判斷cursor 是否超時,默認是False ,永不超時
1 find 中and 的用法

and 語法 如下 :

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

find  基本用法 and  用法 



use xinyongfei_rcs_gateway;

db.getCollection("fudataLog").find(
    {
        "$and" : [
            {
                "status" : "SUCCESS"
            },
            {
                "created_time" : {
                    "$gte" : "2018-05-22 17:18:45"
                }
            },
            {
                "created_time" : {
                    "$lt" : "2018-05-29 17:18:45"
                }
            },
            {
                "status" : "SUCCESS"
            }
        ]
    }
);




"""
from pymongo import MongoClient
# mongo URI  連接配置
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME


def test_mongo_between():
    """
    { $and: [ { "created_time": { $gte: "2018-05-22 16:31:05" } },
        { "created_time": { $lt: "2018-05-25 16:31:05" } }, { "method_id": "commerceReportPull" } ]
    }
    :return:
    """

    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME

    collecion_name = 'fudataLog'

    client = MongoClient(_uri)
    db = client[_dbname]
    collecion = db[collecion_name]

    # 查詢 時間段 是在   '2018-05-22 16:31:05' <=create_time < '2018-05-30 16:31:05'
    # 並且  method_id = commerceReportPull , status = SUCCESS 的記錄
    doamin = {
        "$and": [
            {"created_time": {"$lt": "2018-10-30 16:31:05"}},
            {"created_time": {"$gte": "2018-10-17 18:13:12"}},
            {"method_id": "commerceGetOpenId"},
            {"status": "SUCCESS"}
        ]
    }

    fields = {"return_data.open_id": 1,
              "created_time": 1,
              'method_id': 1,
              "status": 1,
              # 不顯示 mongo_id
              '_id': 0
              }

    cursor = collecion.find(filter=doamin, projection=fields)

    # 查看有多少記錄
    print(f"cursor.count():{cursor.count()}")

    # 需要迭代對象,才能取到值.
    for doc in cursor:
        print(doc)


if __name__ == '__main__':
    test_mongo_between()
    pass


結果如下:

cursor.count():14
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 18:59:27', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': '929loss67cisw42f8ocvrgonsxwkl5clryvuihlx'}}
{'created_time': '2018-10-26 17:50:39', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'p5lmxzfgprnhpuvv3pkjlt8iv6wtc9wzevzywk4x'}}
{'created_time': '2018-10-26 17:50:39', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'p5lmxzfgprnhpuvv3pkjlt8iv6wtc9wzevzywk4x'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:44:18', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'g1zu3wgncengql9dis2u3ghfnqh3ghtjlob4o2mv'}}
{'created_time': '2018-10-29 18:44:18', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'g1zu3wgncengql9dis2u3ghfnqh3ghtjlob4o2mv'}}

Process finished with exit code 0

注意 and 查詢條件 的寫法

 doamin = {
        "$and": [
            {"created_time": {"$lt": "2018-10-30 16:31:05"}},
            {"created_time": {"$gte": "2018-10-17 18:13:12"}},
            {"method_id": "commerceGetOpenId"},
            {"status": "SUCCESS"}
        ]
    }
2 find 中 or 的用法
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

find  基本用法 or   用法


"""
from pymongo import MongoClient
# mongo URI  連接配置
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME


def test_mongo_or():
    """
    or 的使用
    doamin = {
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }


    :return:
    """

    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME

    collection_name = 'fudataLog'

    client = MongoClient(_uri)
    db = client[_dbname]
    collection = db[collection_name]

    doamin = {
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }

    fields = {
        "user_id": 1,
        "status": 1,
        # 不顯示 mongo_id
        '_id': 0
    }

    cursor = collection.find(filter=doamin, projection=fields)

    # 查看有多少記錄
    print(f"cursor.count():{cursor.count()}")

    # 需要迭代對象,才能取到值.
    for doc in cursor:
        print(doc)


if __name__ == '__main__':
    test_mongo_or()
    pass

結果如下:

cursor.count():34
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'ERROR'}
{'user_id': '99063770', 'status': 'ERROR'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'ERROR'}
{'user_id': '99063974', 'status': 'SUCCESS'}
...

這樣就可以了,可以看出 這樣全部文檔 被查找出來了.

主要是 or 的用法, 這裏只是把 and 換成了or 其他不變.
這裏查詢 user_id 是99063974 or 99063770 的文檔 .

    doamin = {
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }

可以看出 結果已經 找出來了, 但是結果裏面 可能有status 等於error的記錄, 我們可不可以拿到全是成功 記錄呢, 肯定是可以的. 取結果集中成功的記錄. 只要在添加一個條件即可.
看下面的例子:

def test_mongo_or_and():
    """
    and 和 or 的使用
    doamin = {

        "status": "SUCCESS",
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }

    :return:
    """

    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME
    collection_name = 'fudataLog'

    client = MongoClient(_uri)
    db = client[_dbname]
    collection = db[collection_name]

    doamin = {

        "status": "SUCCESS",
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }

    fields = {
        "user_id": 1,
        "status": 1,
        # 不顯示 mongo_id
        '_id': 0
    }

    cursor = collection.find(filter=doamin, projection=fields)

    # 查看有多少記錄
    print(f"cursor.count():{cursor.count()}")

    # 需要迭代對象,才能取到值.
    for doc in cursor:
        print(doc)

結果如下:

cursor.count():6
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}

mongodb 常用的一些查詢:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymongo_condition.py
@Author  : [email protected]


條件查詢 :



1 範圍查詢: 按照時間範圍 查詢, 按照 user_id  查詢 某一範圍的數據.

filter_ = {
    #  查詢時間範圍,並且 user_id='99063857' 並且時間返回爲 下面之間的數據
    "$and": [
        {"created_time": {"$lte": '2018-12-07 15:25:43'}},
        {"created_time": {"$gt": '2018-09-01 16:00:30'}},
        {'user_id': "99063857"}

    ]

}


2 TODO 



"""

from pymongo import MongoClient

from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

client = MongoClient(SHOUFUYOU_REPORTING_URI)

# 獲取 db
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# 獲取collection
call_record = mongo_db['callRecord']

fields = {'_id': 0, 'created_time': 1, "user_id": 1}
filter_ = {
    #  查詢時間範圍,並且 user_id='99063857' 並且時間返回爲 下面之間的數據
    "$and": [
        {"created_time": {"$lte": '2018-12-07 15:25:43'}},
        {"created_time": {"$gt": '2018-09-01 16:00:30'}},
        {'user_id': "99063857"}

    ]

}

cursor = call_record.find(filter=filter_, projection=fields).limit(5)

print(cursor.count())

for doc in cursor:
    print(doc)
3 find 中 in 操作
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

find  基本用法 in

# mongodb  in 查詢
mongo_id_list_test = [
    # 數據 mongo_id
    ObjectId("5be2b43da90ec1470078ef53"),
    ObjectId("5be3ec1da90ec146d71b551f"),
    ObjectId("5be422eba90ec106a54840b2")

]
filter_ = {"_id": {"$in": mongo_id_list_test}}
"""

from pymongo import MongoClient
# mongo URI  連接配置
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

# 導入這個包
from bson.objectid import ObjectId

client = MongoClient(SHOUFUYOU_REPORTING_URI)

# 獲取 db
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# 獲取collection
call_record = mongo_db['callRecord']

if __name__ == '__main__':

    mongo_id_list_test = [
        # 數據 mongo_id
        ObjectId("5be2b43da90ec1470078ef53"),
        ObjectId("5be3ec1da90ec146d71b551f"),
        ObjectId("5be422eba90ec106a54840b2")

    ]
    # mongodb  in 查詢
    filter_ = {"_id": {"$in": mongo_id_list_test}}

    projection = {
        '_id': 1,
        'created_time': 1,
    }

    # cursor  注意 find 並不會返回文檔, 而是返回一個cursor 對象
    documents = call_record.find(filter_, projection)

    print(f"documents:{documents}")

    # 需要迭代對象,才能取到值.
    for doc in documents:
        print(doc)
4 find 中的排序操作

pymongo.ASCENDING 升序
pymongo.DESCENDING 降序

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymongo_condition.py
@Author  : [email protected]


排序操作 :

pymongo.ASCENDING  升序
pymongo.DESCENDING  降序


for doc in collection.find().sort('field', pymongo.ASCENDING):
    print(doc)




for doc in collection.find().sort([
        ('field1', pymongo.ASCENDING),
        ('field2', pymongo.DESCENDING)]):
    print(doc)



"""
import pymongo
from pymongo import MongoClient

from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

client = MongoClient(SHOUFUYOU_REPORTING_URI)

# 獲取 db
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# 獲取collection
call_record = mongo_db['callRecord']


def test_sort(collection):
    fields = {'_id': 0, 'created_time': 1, "user_id": 1}
    filter_ = {
        #  查詢時間範圍,並且 user_id='99063857' 並且時間返回爲 下面之間的數據
        "$and": [
            {"created_time": {"$lte": '2018-12-07 15:25:43'}},
            {"created_time": {"$gt": '2018-09-01 16:00:30'}},

        ]

    }

    # 按照 create_time 降序排序
    cursor = collection.find(filter=filter_, projection=fields).sort([
        ('created_time', pymongo.DESCENDING),

    ]).limit(10)

    print(cursor.count())

    for doc in cursor:
        print(doc)


def test_sort_multi(collection):
    fields = {'_id': 0, 'created_time': 1, "user_id": 1}
    filter_ = {
        #  查詢時間範圍,並且 user_id='99063857' 並且時間返回爲 下面之間的數據
        "$and": [
            {"created_time": {"$lte": '2018-12-07 15:25:43'}},
            {"created_time": {"$gt": '2018-09-01 16:00:30'}},

        ]

    }

    # 按照 create_time 降序排序
    # 注意這裏的排序 是有順序的,這裏是先按照usre_id 升序,之後在按照created_time 降序排序.
    cursor = collection.find(filter=filter_, projection=fields).sort([
        ('user_id', pymongo.ASCENDING),
        ('created_time', pymongo.DESCENDING),

    ]).limit(50)

    print(cursor.count())

    for doc in cursor:
        print(doc)


if __name__ == '__main__':
    # test_sort(call_record)
    test_sort_multi(call_record)
5 find 中 的skip 和limit 操作.

有時候 我們希望可以跳過幾個文檔, 限制文檔的數量. 這個時候 就可以使用 skip 和 limit 來完成這樣的操作 ,使用起來也非常方便.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/4/3 11:56
@File    : test_cursor_skip_limit .py
@Author  : [email protected]
"""

from pymongo import MongoClient

from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

client = MongoClient(SHOUFUYOU_REPORTING_URI, maxPoolSize=50)
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
collection = mongo_db['contacts']

domain = {"event_id": "1000073"}
fields = {'_id': 1, 'created_time': 1, 'event_id': 1}


cursor = collection.find(domain, fields)


# copy 一份 cursor 對象.
cursor_copy = cursor.clone()

values = cursor.skip(3).limit(2)
print(f"count:{values.count()}")
for item in values:
    print(item)

print("--------copy cursor  top 10 document------")

for idx, doc in enumerate(cursor_copy[0:10]):
    print(idx, doc)

結果如下:

count:393
{'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
{'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
--------copy cursor  top 10 document------
0 {'_id': ObjectId('5bc5ab05a90ec16eb23ee498'), 'event_id': '1000073', 'created_time': '2018-10-16 17:10:29'}
1 {'_id': ObjectId('5bc69975a90ec16ea023e42d'), 'event_id': '1000073', 'created_time': '2018-10-17 10:07:49'}
2 {'_id': ObjectId('5bc712afa90ec16ea20ff19f'), 'event_id': '1000073', 'created_time': '2018-10-17 18:45:03'}
3 {'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
4 {'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
5 {'_id': ObjectId('5bd27f8ea90ec1277f7e91d1'), 'event_id': '1000073', 'created_time': '2018-10-26 10:44:30'}
6 {'_id': ObjectId('5bd6de89a90ec12779579b77'), 'event_id': '1000073', 'created_time': '2018-10-29 18:18:49'}
7 {'_id': ObjectId('5bd6e416a90ec1278e0a16e8'), 'event_id': '1000073', 'created_time': '2018-10-29 18:42:30'}
8 {'_id': ObjectId('5bd81a1ea90ec127806c7670'), 'event_id': '1000073', 'created_time': '2018-10-30 16:45:18'}
9 {'_id': ObjectId('5be015d8a90ec146e7432850'), 'event_id': '1000073', 'created_time': '2018-11-05 18:05:12'}


從以上的結果可以看出來,skip 3 , limit 2 . 就是下面idx 3 ,4的值.

上面 的寫法 也可以這樣寫:

values = cursor.limit(2).skip(3)

爲什麼可以這樣寫呢? 感覺非常像鏈式編程了. 爲什麼可以這樣隨意控制呢?
其實 這裏 limit 最後 返回的 也是cursor 對象, skip 返回的也是cursor 對象. 所以 這樣 就可以 一直 .skip().limit().skip() 這種方式進行編程.

這兩個方法 返回的都是自己 的對象, 也就是對應 代碼:

看下 skip 代碼, 首先 檢查skip 類型,做了一些簡單的判斷, 之後把 skip 保存到 自己 私有變量裏面. self.__skip

   def skip(self, skip):
        """Skips the first `skip` results of this cursor.

        Raises :exc:`TypeError` if `skip` is not an integer. Raises
        :exc:`ValueError` if `skip` is less than ``0``. Raises
        :exc:`~pymongo.errors.InvalidOperation` if this :class:`Cursor` has
        already been used. The last `skip` applied to this cursor takes
        precedence.

        :Parameters:
          - `skip`: the number of results to skip
        """
        if not isinstance(skip, integer_types):
            raise TypeError("skip must be an integer")
        if skip < 0:
            raise ValueError("skip must be >= 0")
        self.__check_okay_to_chain()

        self.__skip = skip
        return self
        

limit 的方法實現 其實和skip 是差不多的.

    def limit(self, limit):
        """Limits the number of results to be returned by this cursor.

        Raises :exc:`TypeError` if `limit` is not an integer. Raises
        :exc:`~pymongo.errors.InvalidOperation` if this :class:`Cursor`
        has already been used. The last `limit` applied to this cursor
        takes precedence. A limit of ``0`` is equivalent to no limit.

        :Parameters:
          - `limit`: the number of results to return

        .. mongodoc:: limit
        """
        if not isinstance(limit, integer_types):
            raise TypeError("limit must be an integer")
        if self.__exhaust:
            raise InvalidOperation("Can't use limit and exhaust together.")
        self.__check_okay_to_chain()

        self.__empty = False
        self.__limit = limit
        return self
        
find 的返回結果 cursor 對象

cursor 對象 可以通過collection.find() 來返回一個 cursor 對象
cursor對象 可以實現了切片協議, 因此可以使用 切片操作.

cursor.count() 方法 可以查詢 查詢了多少 文檔,返回文檔總數.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/4/3 11:56
@File    : test_cursor_getitem.py
@Author  : [email protected]
"""

from pymongo import MongoClient

from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

client = MongoClient(SHOUFUYOU_REPORTING_URI, maxPoolSize=50)
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
collection = mongo_db['contacts']

domain = {"event_id": "1000073"}
fields = {'_id': 1, 'created_time': 1, 'event_id': 1}

# 切片操作.
values = collection.find(domain, fields)[2:5]

print(f"count:{values.count()}")

for item in values:
    print(item)

count:393
{'_id': ObjectId('5bc712afa90ec16ea20ff19f'), 'event_id': '1000073', 'created_time': '2018-10-17 18:45:03'}
{'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
{'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
關於cursor 對象我簡單聊一下.

img1
image1

image2
image2

image3
image3

image4
image4

image5
image5

image6
image6

image7
image7

mongodb 讀取 數據的工具類

實現 從 mongodb 中讀取 數據, 通過配置 字段,以及篩選條件 來完成 參數的配置.

實現 read 方法 批量讀取數據.

from pymongo import MongoClient
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME
import logging

logger = logging.getLogger(__name__)



class MongoReader(BaseReader):
    def __init__(self, uri, db_name, collecion_name, domain, fields):
        """
        mongo reader    工具類
        :param url:  uri mongo 連接的URI
        :param db_name:  db名稱
        :param collecion_name:  collection_name
        :param domain:   查詢條件
        :param fields:  過濾字段  {"name":1,"_id":1}
        """
        super().__init__(url=uri)

        self._dbname = db_name
        self._collecion_name = collecion_name

        self.domain = domain
        self.fields = fields

        client = MongoClient(self.url)
        db = client[self._dbname]
        self.collecion = db[self._collecion_name]

        # 最大讀取數量
        self.max_count = 30000000000000


    def read(self, start=0, step=1000):

        limit = step - start
        skip_number = start

        count = self.collecion.count_documents(filter=self.domain)
        logger.info(f"total count:{count}")
        while True:
            logger.info(f'limit:{limit},skip:{skip_number}, start:{skip_number-start},end:{skip_number+limit}')
            # cursor = self.collecion.find(self.domain, self.fields, no_cursor_timeout=True).limit(limit).skip(
            #     skip_number)

            cursor = self.collecion.find(self.domain, self.fields, no_cursor_timeout=True).limit(limit).skip(
                skip_number)

            # 查詢數據量
            number = cursor.count(with_limit_and_skip=True)
            if number:
                yield [d for d in cursor]

            skip_number += number
            if number < limit:
                logger.info("number:{},limit:{}. number < limit,break".format(number, limit))
                # 把cursor 關掉
                cursor.close()
                break

            if skip_number >= self.max_count:
                logger.info("skip_number:{},self.max_count:{}.skip_number >= self.max_count,break".format(
                    skip_number,
                    self.max_count))
                # 把cursor 關掉
                cursor.close()
                break



if __name__ == '__main__':
    start_time = '2018-10-01 11:03:05'
    end_time = '2019-01-20 14:03:49'

    reader_config = {
        'uri': XINYONGFEI_RCS_GATEWAY_URI,
        'db_name': XINYONGFEI_RCS_GATEWAY_DB_NAME,
        'domain': {"$and": [{"created_time": {"$lt": end_time}}, {"created_time": {"$gte": start_time}},
                            {"method_id": "securityReport"}, {"status": "SUCCESS"}]},
        'fields': {"created_time": 1, "user_id": 1, "_id": 1},
        'collecion_name': 'moxieSecurityLog',
    }

    reader = MongoReader(**reader_config)

    for data in reader.read():
        print(data)
    print('frank')
    

錯誤總結:
1 CursorNotFound 錯誤, 報 cursor 沒有找到

報錯如下:
pymongo.errors.CursorNotFound: Cursor not found, cursor id: 387396591387

Exception in thread consumer_14:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/rcsdecisionv2.py", line 843, in run
    result = self.parse(values)
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/rcsdecisionv2.py", line 872, in parse
    for posts in posts_list:
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 1132, in next
    if len(self.__data) or self._refresh():
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 1075, in _refresh
    self.__max_await_time_ms))
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 947, in __send_message
    helpers._check_command_response(doc['data'][0])
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/helpers.py", line 207, in _check_command_response
    raise CursorNotFound(errmsg, code, response)
pymongo.errors.CursorNotFound: Cursor not found, cursor id: 387396591387

問題分析:
cursor 超時了.

設置參數 no_cursor_timeout = True



解決方案 :
demos = db['demo'].find({},{"_id": 0},no_cursor_timeout = True)
for cursor in demos:
        do_something()
demo.close() # 關閉遊標

官方文檔:
官方文檔 默認 是10min , 就會關閉 cursor , 這裏 可以設置一個 永不超時的參數.

no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.

參考資料:

https://www.jianshu.com/p/a8551bd17b5b

http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find

參考文檔 :

1 api cursor http://api.mongodb.com/python/current/api/pymongo/cursor.html
2 api count_documents http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.count_documents
3 api collecion.html http://api.mongodb.com/python/current/api/pymongo/collection.html
4 api 排序操作 http://api.mongodb.com/python/current/api/pymongo/cursor.html#pymongo.cursor.Cursor.sort
5 mongodb tutorial http://api.mongodb.com/python/current/tutorial.html

1 Python3中PyMongo的用法 https://zhuanlan.zhihu.com/p/29435868
2 Python3 中PyMongo 的用法 https://cloud.tencent.com/developer/article/1005552
3 菜鳥用Python操作MongoDB,看這一篇就夠了 https://cloud.tencent.com/developer/article/1169645
4 PyMongo 庫使用基礎使用速成教程 https://www.jianshu.com/p/acc57241f9f0

分享快樂,留住感動. 2019-04-03 21:12:05 --frank
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章