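Working with MongoDB using PyMongo

This article covers basic MongoDB usage and common operations. It focuses on PyMongo and, where useful, looks at the library source code and points out pitfalls.

It mainly covers common problems and the query operations used most often in MongoDB:

- and / or queries
- sorting
- a reader utility class
- in queries
- skip and limit
- the cursor object
- errors you may run into

1 Querying a document by its mongo_id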


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

Query a document by its object_id.

"""

from pymongo import MongoClient
# MongoDB URI and database name from the project's config module
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

# ObjectId is needed to build an _id filter
from bson.objectid import ObjectId

# Connect with the URI to create a client
client = MongoClient(SHOUFUYOU_REPORTING_URI)

# Get the database by name
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# Get the collection
call_record = mongo_db['callRecord']


if __name__ == '__main__':
    # Query condition, equivalent to the condition after WHERE in SQL
    filter_ = {'_id': ObjectId('5be2b43da90ec1470078ef53')}

    # Projection: the fields to return, equivalent to the column list after SELECT.
    # 'field': 1 includes the field, 'field': 0 excludes it. By default all fields
    # are returned; once fields are specified, only those are returned.
    projection = {'source_type': 1, '_id': 1}

    # Look up the document by _id; returns None if nothing matches
    document = call_record.find_one(filter=filter_, projection=projection)

    print(document)
    # Result: {'_id': ObjectId('5be2b43da90ec1470078ef53'), 'source_type': 'android'}

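Connect to MongoDB with the URI, get the database, then get the collection. After that you can call find to query the data.

For get_database, see http://api.mongodb.com/python/current/tutorial.html#getting-a-database

Note that this example uses find_one, which is only appropriate when you want exactly one document. In most cases you will use find instead.

Here is a simple example.

Using collection.find

find is the most frequently used query method in MongoDB.

find returns a cursor. If nothing matches, it still returns a cursor (an empty one), not None.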

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

Basic find usage: in

"""

from pymongo import MongoClient
# MongoDB URI and database name from the project's config module
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

# ObjectId is needed to build _id filters
from bson.objectid import ObjectId

client = MongoClient(SHOUFUYOU_REPORTING_URI)

# Get the database
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# Get the collection
call_record = mongo_db['callRecord']

if __name__ == '__main__':

    mongo_id_list_test = [
        # _id values of the documents to fetch
        ObjectId("5be2b43da90ec1470078ef53"),
        ObjectId("5be3ec1da90ec146d71b551f"),
        ObjectId("5be422eba90ec106a54840b2")
    ]

    # Query condition using $in
    filter_ = {"_id": {"$in": mongo_id_list_test}}

    # Fields to return
    projection = {
        '_id': 1,
        'created_time': 1,
    }

    # Note: find does not return documents; it returns a Cursor object
    documents = call_record.find(filter_, projection)

    print(f"documents:{documents}")

    # Iterate over the cursor to get the documents
    for doc in documents:
        print(doc)

Output:

documents:<pymongo.cursor.Cursor object at 0x10a60d9e8>
{'_id': ObjectId('5be2b43da90ec1470078ef53'), 'created_time': '2018-11-07 17:45:33'}
{'_id': ObjectId('5be3ec1da90ec146d71b551f'), 'created_time': '2018-11-08 15:56:13'}
{'_id': ObjectId('5be422eba90ec106a54840b2'), 'created_time': '2018-11-08 19:50:03'}
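
Because find always hands back a cursor, a query with no matches shows up as an empty iteration rather than as None. The sketch below assumes only that the cursor is iterable and shows one way to take the first document or fall back to a default. The helper name first_or_default is hypothetical and not part of PyMongo; plain lists stand in for cursors so that it runs without a database.

```python
def first_or_default(cursor, default=None):
    """Return the first document from an iterable cursor, or `default` if it is empty."""
    # next() on an exhausted iterator returns the default instead of raising
    return next(iter(cursor), default)


# Plain lists stand in for cursors here (an assumption for illustration).
empty_cursor = []
one_match = [{"_id": 1, "created_time": "2018-11-07 17:45:33"}]

print(first_or_default(empty_cursor))      # None
print(first_or_default(one_match)["_id"])  # 1
```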

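Now for the parameters of find. There are quite a few;
only the most important ones are covered here.

filter: the first positional argument, the query condition
projection: the second positional argument, the fields to return
no_cursor_timeout: whether the cursor is exempt from the server timeout. The default is False, which lets the server close the cursor after 10 minutes of inactivity; True means it never times out

1 Using and in find

The and syntax looks like this: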

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

Basic find usage: and



use xinyongfei_rcs_gateway;

db.getCollection("fudataLog").find(
    {
        "$and" : [
            {
                "status" : "SUCCESS"
            },
            {
                "created_time" : {
                    "$gte" : "2018-05-22 17:18:45"
                }
            },
            {
                "created_time" : {
                    "$lt" : "2018-05-29 17:18:45"
                }
            }
        ]
    }
);




"""
from pymongo import MongoClient
# MongoDB URI and database name from the project's config module
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME


def test_mongo_between():
    """
    { $and: [ { "created_time": { $gte: "2018-10-17 18:13:12" } },
        { "created_time": { $lt: "2018-10-30 16:31:05" } },
        { "method_id": "commerceGetOpenId" }, { "status": "SUCCESS" } ]
    }
    :return:
    """

    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME

    collection_name = 'fudataLog'

    client = MongoClient(_uri)
    db = client[_dbname]
    collection = db[collection_name]

    # Match records where '2018-10-17 18:13:12' <= created_time < '2018-10-30 16:31:05'
    # and method_id == 'commerceGetOpenId' and status == 'SUCCESS'
    domain = {
        "$and": [
            {"created_time": {"$lt": "2018-10-30 16:31:05"}},
            {"created_time": {"$gte": "2018-10-17 18:13:12"}},
            {"method_id": "commerceGetOpenId"},
            {"status": "SUCCESS"}
        ]
    }

    fields = {"return_data.open_id": 1,
              "created_time": 1,
              'method_id': 1,
              "status": 1,
              # Hide the _id field
              '_id': 0
              }

    cursor = collection.find(filter=domain, projection=fields)

    # Number of matching records.
    # Cursor.count() is deprecated since PyMongo 3.7 and removed in 4.0;
    # on newer versions use collection.count_documents(domain) instead.
    print(f"cursor.count():{cursor.count()}")

    # Iterate over the cursor to get the documents
    for doc in cursor:
        print(doc)


if __name__ == '__main__':
    test_mongo_between()

Output:

cursor.count():14
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-17 18:13:12', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'wdw66blejdcb3qhppc0kyo1yqhb6th3vlod0tgl9'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 14:18:42', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'v0fahoarvxwixtu64yxtdhxaxa0x0azhlrt0bhxd'}}
{'created_time': '2018-10-18 18:59:27', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': '929loss67cisw42f8ocvrgonsxwkl5clryvuihlx'}}
{'created_time': '2018-10-26 17:50:39', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'p5lmxzfgprnhpuvv3pkjlt8iv6wtc9wzevzywk4x'}}
{'created_time': '2018-10-26 17:50:39', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'p5lmxzfgprnhpuvv3pkjlt8iv6wtc9wzevzywk4x'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:20:48', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'ijhyxce9he3dgsoadt9z377cxcqqwdto3abgiz4w'}}
{'created_time': '2018-10-29 18:44:18', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'g1zu3wgncengql9dis2u3ghfnqh3ghtjlob4o2mv'}}
{'created_time': '2018-10-29 18:44:18', 'method_id': 'commerceGetOpenId', 'status': 'SUCCESS', 'return_data': {'open_id': 'g1zu3wgncengql9dis2u3ghfnqh3ghtjlob4o2mv'}}


Note how the and condition is written:

    domain = {
        "$and": [
            {"created_time": {"$lt": "2018-10-30 16:31:05"}},
            {"created_time": {"$gte": "2018-10-17 18:13:12"}},
            {"method_id": "commerceGetOpenId"},
            {"status": "SUCCESS"}
        ]
    }
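
The $and list above can also be assembled programmatically. As a hedged sketch: the helper time_range_filter below is hypothetical (it is not a PyMongo function) and simply builds the same dictionary. MongoDB also treats the top-level keys of a filter as an implicit AND, so the second form is a shorter way to express the same condition.

```python
def time_range_filter(field, start, end, **equals):
    """Build {"$and": [...]} for start <= field < end plus equality conditions."""
    conditions = [
        {field: {"$lt": end}},
        {field: {"$gte": start}},
    ]
    # Each keyword argument becomes one equality condition
    conditions.extend({key: value} for key, value in equals.items())
    return {"$and": conditions}


explicit = time_range_filter(
    "created_time", "2018-10-17 18:13:12", "2018-10-30 16:31:05",
    method_id="commerceGetOpenId", status="SUCCESS",
)

# Implicit AND: several operators on one field share a sub-document,
# and separate top-level keys are combined with AND by the server.
implicit = {
    "created_time": {"$gte": "2018-10-17 18:13:12", "$lt": "2018-10-30 16:31:05"},
    "method_id": "commerceGetOpenId",
    "status": "SUCCESS",
}

print(explicit)
```

Either dictionary can be passed as the filter argument of find.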

2 Using or in find

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

Basic find usage: or


"""
from pymongo import MongoClient
# MongoDB URI and database name from the project's config module
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME


def test_mongo_or():
    """
    Using or
    domain = {
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }


    :return:
    """

    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME

    collection_name = 'fudataLog'

    client = MongoClient(_uri)
    db = client[_dbname]
    collection = db[collection_name]

    domain = {
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }

    fields = {
        "user_id": 1,
        "status": 1,
        # Hide the _id field
        '_id': 0
    }

    cursor = collection.find(filter=domain, projection=fields)

    # Number of matching records.
    # Cursor.count() is deprecated since PyMongo 3.7 and removed in 4.0;
    # on newer versions use collection.count_documents(domain) instead.
    print(f"cursor.count():{cursor.count()}")

    # Iterate over the cursor to get the documents
    for doc in cursor:
        print(doc)


if __name__ == '__main__':
    test_mongo_or()

Output:

cursor.count():34
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'ERROR'}
{'user_id': '99063770', 'status': 'ERROR'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'ERROR'}
{'user_id': '99063974', 'status': 'SUCCESS'}
...

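That is all it takes; every matching document is returned.

The key point is the or syntax: compared with the previous example, only and is replaced with or, and everything else stays the same.
This query finds documents whose user_id is 99063974 or 99063770.

    domain = {
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }

The results are there, but they may include records whose status is ERROR. Can we get only the successful ones? Certainly: adding one more condition keeps only the successful records in the result set.
See the following example: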

def test_mongo_or_and():
    """
    Using and together with or
    domain = {

        "status": "SUCCESS",
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }

    :return:
    """

    _uri = XINYONGFEI_RCS_GATEWAY_URI
    _dbname = XINYONGFEI_RCS_GATEWAY_DB_NAME
    collection_name = 'fudataLog'

    client = MongoClient(_uri)
    db = client[_dbname]
    collection = db[collection_name]

    # Top-level keys are combined with an implicit AND:
    # status == 'SUCCESS' and (user_id == '99063974' or user_id == '99063770')
    domain = {

        "status": "SUCCESS",
        "$or": [
            {"user_id": "99063974"},
            {"user_id": "99063770"},
        ]
    }

    fields = {
        "user_id": 1,
        "status": 1,
        # Hide the _id field
        '_id': 0
    }

    cursor = collection.find(filter=domain, projection=fields)

    # Number of matching records (Cursor.count() is removed in PyMongo 4.0;
    # use collection.count_documents(domain) there)
    print(f"cursor.count():{cursor.count()}")

    # Iterate over the cursor to get the documents
    for doc in cursor:
        print(doc)

Output:

cursor.count():6
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063770', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}
{'user_id': '99063974', 'status': 'SUCCESS'}
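
When every branch of an $or tests the same field for equality, the condition can also be written with $in, which the MongoDB documentation recommends for that case. The sketch below is only an illustration; same_field_or and any_of are hypothetical helper names, not PyMongo functions.

```python
def same_field_or(field, values):
    """Build the $or form: one equality branch per value."""
    return {"$or": [{field: value} for value in values]}


def any_of(field, values, **equals):
    """Build the $in form, optionally combined with equality conditions (implicit AND)."""
    query = dict(equals)
    query[field] = {"$in": list(values)}
    return query


user_ids = ["99063974", "99063770"]

print(same_field_or("user_id", user_ids))
# {'$or': [{'user_id': '99063974'}, {'user_id': '99063770'}]}

print(any_of("user_id", user_ids, status="SUCCESS"))
# {'status': 'SUCCESS', 'user_id': {'$in': ['99063974', '99063770']}}
```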

Some commonly used MongoDB queries:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymongo_condition.py
@Author  : [email protected]


Conditional queries:



1 Range query: filter by a time range and by user_id.

filter_ = {
    # created_time within the range below and user_id == '99063857'
    "$and": [
        {"created_time": {"$lte": '2018-12-07 15:25:43'}},
        {"created_time": {"$gt": '2018-09-01 16:00:30'}},
        {'user_id': "99063857"}

    ]

}


2 TODO



"""

from pymongo import MongoClient

from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

client = MongoClient(SHOUFUYOU_REPORTING_URI)

# Get the database
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# Get the collection
call_record = mongo_db['callRecord']

fields = {'_id': 0, 'created_time': 1, "user_id": 1}
filter_ = {
    # created_time within the range below and user_id == '99063857'
    "$and": [
        {"created_time": {"$lte": '2018-12-07 15:25:43'}},
        {"created_time": {"$gt": '2018-09-01 16:00:30'}},
        {'user_id': "99063857"}
    ]
}

cursor = call_record.find(filter=filter_, projection=fields).limit(5)

# count() ignores limit by default, so this prints the total number of matches
print(cursor.count())

for doc in cursor:
    print(doc)

3 Using in with find

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymogo.py
@Author  : [email protected]

Basic find usage: in

# MongoDB in query
mongo_id_list_test = [
    # _id values of the documents to fetch
    ObjectId("5be2b43da90ec1470078ef53"),
    ObjectId("5be3ec1da90ec146d71b551f"),
    ObjectId("5be422eba90ec106a54840b2")

]
filter_ = {"_id": {"$in": mongo_id_list_test}}
"""

from pymongo import MongoClient
# MongoDB URI and database name from the project's config module
from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

# ObjectId is needed to build _id filters
from bson.objectid import ObjectId

client = MongoClient(SHOUFUYOU_REPORTING_URI)

# Get the database
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# Get the collection
call_record = mongo_db['callRecord']

if __name__ == '__main__':

    mongo_id_list_test = [
        # _id values of the documents to fetch
        ObjectId("5be2b43da90ec1470078ef53"),
        ObjectId("5be3ec1da90ec146d71b551f"),
        ObjectId("5be422eba90ec106a54840b2")
    ]
    # MongoDB in query
    filter_ = {"_id": {"$in": mongo_id_list_test}}

    projection = {
        '_id': 1,
        'created_time': 1,
    }

    # Note: find does not return documents; it returns a Cursor object
    documents = call_record.find(filter_, projection)

    print(f"documents:{documents}")

    # Iterate over the cursor to get the documents
    for doc in documents:
        print(doc)

4 Sorting find results

pymongo.ASCENDING: ascending order
pymongo.DESCENDING: descending order

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/3/23 21:39
@File    : test_pymongo_condition.py
@Author  : [email protected]


Sorting:

pymongo.ASCENDING: ascending order
pymongo.DESCENDING: descending order


for doc in collection.find().sort('field', pymongo.ASCENDING):
    print(doc)




for doc in collection.find().sort([
        ('field1', pymongo.ASCENDING),
        ('field2', pymongo.DESCENDING)]):
    print(doc)



"""
import pymongo
from pymongo import MongoClient

from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

client = MongoClient(SHOUFUYOU_REPORTING_URI)

# Get the database
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]

# Get the collection
call_record = mongo_db['callRecord']


def test_sort(collection):
    fields = {'_id': 0, 'created_time': 1, "user_id": 1}
    filter_ = {
        # created_time within the range below
        "$and": [
            {"created_time": {"$lte": '2018-12-07 15:25:43'}},
            {"created_time": {"$gt": '2018-09-01 16:00:30'}},

        ]

    }

    # Sort by created_time in descending order
    cursor = collection.find(filter=filter_, projection=fields).sort([
        ('created_time', pymongo.DESCENDING),

    ]).limit(10)

    print(cursor.count())

    for doc in cursor:
        print(doc)


def test_sort_multi(collection):
    fields = {'_id': 0, 'created_time': 1, "user_id": 1}
    filter_ = {
        # created_time within the range below
        "$and": [
            {"created_time": {"$lte": '2018-12-07 15:25:43'}},
            {"created_time": {"$gt": '2018-09-01 16:00:30'}},

        ]

    }

    # The order of the sort keys matters: sort by user_id ascending first,
    # then by created_time descending.
    cursor = collection.find(filter=filter_, projection=fields).sort([
        ('user_id', pymongo.ASCENDING),
        ('created_time', pymongo.DESCENDING),

    ]).limit(50)

    print(cursor.count())

    for doc in cursor:
        print(doc)


if __name__ == '__main__':
    # test_sort(call_record)
    test_sort_multi(call_record)
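
To make the ordering rule concrete without a database, the following sketch emulates the same two-key sort on an in-memory list. It illustrates the resulting order only, not how MongoDB or PyMongo sorts internally. The helper sort_like_mongo and the sample documents are made up for illustration, and the constants 1 and -1 mirror the values of pymongo.ASCENDING and pymongo.DESCENDING.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def sort_like_mongo(docs, spec):
    """Sort dicts by a list of (field, direction) pairs; earlier pairs take priority."""
    result = list(docs)
    # Python's sort is stable, so applying the keys from last to first
    # gives the first key the highest priority.
    for field, direction in reversed(spec):
        result.sort(key=lambda doc: doc[field], reverse=(direction == DESCENDING))
    return result


docs = [  # made-up sample documents
    {"user_id": "2", "created_time": "2018-10-01 10:00:00"},
    {"user_id": "1", "created_time": "2018-10-02 10:00:00"},
    {"user_id": "1", "created_time": "2018-10-03 10:00:00"},
]

for doc in sort_like_mongo(docs, [("user_id", ASCENDING), ("created_time", DESCENDING)]):
    print(doc)
```

Within the same user_id the newest created_time comes first, which is the behaviour test_sort_multi relies on.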

5 skip and limit with find

Sometimes we want to skip a few documents and limit how many are returned. skip and limit do exactly that, and they are very convenient to use.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/4/3 11:56
@File    : test_cursor_skip_limit.py
@Author  : [email protected]
"""

from pymongo import MongoClient

from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

client = MongoClient(SHOUFUYOU_REPORTING_URI, maxPoolSize=50)
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
collection = mongo_db['contacts']

domain = {"event_id": "1000073"}
fields = {'_id': 1, 'created_time': 1, 'event_id': 1}


cursor = collection.find(domain, fields)


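# Clone the cursor; the clone is unevaluated and keeps the same options.
cursor_copy = cursor.clone()

values = cursor.skip(3).limit(2)
# count() ignores skip and limit by default, so this prints the total number of matches
print(f"count:{values.count()}")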
for item in values:
    print(item)

print("--------copy cursor  top 10 document------")

for idx, doc in enumerate(cursor_copy[0:10]):
    print(idx, doc)

Output:

count:393
{'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
{'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
--------copy cursor  top 10 document------
0 {'_id': ObjectId('5bc5ab05a90ec16eb23ee498'), 'event_id': '1000073', 'created_time': '2018-10-16 17:10:29'}
1 {'_id': ObjectId('5bc69975a90ec16ea023e42d'), 'event_id': '1000073', 'created_time': '2018-10-17 10:07:49'}
2 {'_id': ObjectId('5bc712afa90ec16ea20ff19f'), 'event_id': '1000073', 'created_time': '2018-10-17 18:45:03'}
3 {'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
4 {'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
5 {'_id': ObjectId('5bd27f8ea90ec1277f7e91d1'), 'event_id': '1000073', 'created_time': '2018-10-26 10:44:30'}
6 {'_id': ObjectId('5bd6de89a90ec12779579b77'), 'event_id': '1000073', 'created_time': '2018-10-29 18:18:49'}
7 {'_id': ObjectId('5bd6e416a90ec1278e0a16e8'), 'event_id': '1000073', 'created_time': '2018-10-29 18:42:30'}
8 {'_id': ObjectId('5bd81a1ea90ec127806c7670'), 'event_id': '1000073', 'created_time': '2018-10-30 16:45:18'}
9 {'_id': ObjectId('5be015d8a90ec146e7432850'), 'event_id': '1000073', 'created_time': '2018-11-05 18:05:12'}

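The output shows that skip 3, limit 2 returns the documents at idx 3 and 4 in the listing below it.

The same query can also be written like this:

values = cursor.limit(2).skip(3)

Why does this work? It looks a lot like method chaining. Why can the calls be ordered freely?
Because limit returns the cursor object and so does skip, you can keep chaining calls such as .skip().limit().skip(). Each call only records a value on the cursor, and the last value set for each option wins.

Both methods return the object itself (self). The corresponding code is:

Here is the skip source. It first checks the type of skip and validates the value, then stores it in the private attribute self.__skip.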

    def skip(self, skip):
        """Skips the first `skip` results of this cursor.

        Raises :exc:`TypeError` if `skip` is not an integer. Raises
        :exc:`ValueError` if `skip` is less than ``0``. Raises
        :exc:`~pymongo.errors.InvalidOperation` if this :class:`Cursor` has
        already been used. The last `skip` applied to this cursor takes
        precedence.

        :Parameters:
          - `skip`: the number of results to skip
        """
        if not isinstance(skip, integer_types):
            raise TypeError("skip must be an integer")
        if skip < 0:
            raise ValueError("skip must be >= 0")
        self.__check_okay_to_chain()

        self.__skip = skip
        return self

The implementation of limit is much the same as skip.

    def limit(self, limit):
        """Limits the number of results to be returned by this cursor.

        Raises :exc:`TypeError` if `limit` is not an integer. Raises
        :exc:`~pymongo.errors.InvalidOperation` if this :class:`Cursor`
        has already been used. The last `limit` applied to this cursor
        takes precedence. A limit of ``0`` is equivalent to no limit.

        :Parameters:
          - `limit`: the number of results to return

        .. mongodoc:: limit
        """
        if not isinstance(limit, integer_types):
            raise TypeError("limit must be an integer")
        if self.__exhaust:
            raise InvalidOperation("Can't use limit and exhaust together.")
        self.__check_okay_to_chain()

        self.__empty = False
        self.__limit = limit
        return self
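
The pattern in both methods is the same: record the option on the instance and return self. A stripped-down sketch of that pattern follows. MiniCursor is a made-up class for illustration (it is not PyMongo's Cursor); it applies the stored options to an in-memory list only when iterated, which is why the order of the chained calls does not matter.

```python
class MiniCursor:
    """Toy cursor: skip() and limit() only store options and return self."""

    def __init__(self, items):
        self._items = list(items)
        self._skip = 0
        self._limit = 0  # 0 means "no limit", as in the limit docstring

    def skip(self, skip):
        if not isinstance(skip, int):
            raise TypeError("skip must be an integer")
        if skip < 0:
            raise ValueError("skip must be >= 0")
        self._skip = skip
        return self  # returning self is what makes chaining possible

    def limit(self, limit):
        if not isinstance(limit, int):
            raise TypeError("limit must be an integer")
        self._limit = limit
        return self

    def __iter__(self):
        # Options are applied only now, when the results are actually read
        end = self._skip + self._limit if self._limit else None
        return iter(self._items[self._skip:end])


print(list(MiniCursor(range(10)).skip(3).limit(2)))  # [3, 4]
print(list(MiniCursor(range(10)).limit(2).skip(3)))  # [3, 4]
```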

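The cursor object returned by find

Calling collection.find() returns a cursor object.
The cursor implements the slicing protocol, so slice operations can be used on it.

cursor.count() returns the total number of documents matched by the query. Cursor.count() is deprecated since PyMongo 3.7 and removed in 4.0, where collection.count_documents() replaces it.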

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@Time    : 2019/4/3 11:56
@File    : test_cursor_getitem.py
@Author  : [email protected]
"""

from pymongo import MongoClient

from config.DB import SHOUFUYOU_REPORTING_URI, SHOUFUYOU_REPORTING_DB_NAME

client = MongoClient(SHOUFUYOU_REPORTING_URI, maxPoolSize=50)
mongo_db = client[SHOUFUYOU_REPORTING_DB_NAME]
collection = mongo_db['contacts']

domain = {"event_id": "1000073"}
fields = {'_id': 1, 'created_time': 1, 'event_id': 1}

# Slicing: [2:5] is applied as skip=2, limit=3.
values = collection.find(domain, fields)[2:5]

print(f"count:{values.count()}")

for item in values:
    print(item)

count:393
{'_id': ObjectId('5bc712afa90ec16ea20ff19f'), 'event_id': '1000073', 'created_time': '2018-10-17 18:45:03'}
{'_id': ObjectId('5bc86b34a90ec16e6c44dcca'), 'event_id': '1000073', 'created_time': '2018-10-18 19:15:00'}
{'_id': ObjectId('5bc87a85a90ec16e6e222242'), 'event_id': '1000073', 'created_time': '2018-10-18 20:20:21'}
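
These three documents are the ones at idx 2 to 4 in the earlier listing, because a slice is translated into skip and limit values. A rough sketch of that translation, assuming a slice with non-negative bounds and no step; slice_to_skip_limit is a hypothetical helper for illustration, not a PyMongo function.

```python
def slice_to_skip_limit(index):
    """Translate slice(start, stop) into a (skip, limit) pair; limit 0 means no limit."""
    if index.step is not None:
        raise IndexError("slices with a step are not supported in this sketch")
    skip = index.start or 0
    if skip < 0 or (index.stop is not None and index.stop < 0):
        raise IndexError("negative indexes are not supported in this sketch")
    if index.stop is None:
        return skip, 0
    if index.stop <= skip:
        raise IndexError("stop must be greater than start")
    return skip, index.stop - skip


print(slice_to_skip_limit(slice(2, 5)))   # (2, 3)
print(slice_to_skip_limit(slice(0, 10)))  # (0, 10)
```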

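That is a brief look at the cursor object.

A utility class for reading data from MongoDB

It reads data from MongoDB and is configured through the fields to return and the query condition.

The read method reads the data in batches.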

from pymongo import MongoClient
from config.DB import XINYONGFEI_RCS_GATEWAY_URI, XINYONGFEI_RCS_GATEWAY_DB_NAME
import logging

logger = logging.getLogger(__name__)


class BaseReader:
    # BaseReader is not shown in the original article; this minimal stand-in
    # is an assumption that only stores the connection URL.
    def __init__(self, url):
        self.url = url


class MongoReader(BaseReader):
    def __init__(self, uri, db_name, collecion_name, domain, fields):
        """
        MongoDB reader utility class
        :param uri:  MongoDB connection URI
        :param db_name:  database name
        :param collecion_name:  collection name
        :param domain:   query condition
        :param fields:  projection, e.g. {"name":1,"_id":1}
        """
        super().__init__(url=uri)

        self._dbname = db_name
        self._collecion_name = collecion_name

        self.domain = domain
        self.fields = fields

        client = MongoClient(self.url)
        db = client[self._dbname]
        self.collecion = db[self._collecion_name]

        # Maximum number of documents to read
        self.max_count = 30000000000000

    def read(self, start=0, step=1000):
        # start: number of documents to skip first; step: batch size
        limit = step
        skip_number = start

        count = self.collecion.count_documents(filter=self.domain)
        logger.info(f"total count:{count}")
        while True:
            logger.info(f'limit:{limit},skip:{skip_number}, start:{skip_number},end:{skip_number+limit}')

            cursor = self.collecion.find(self.domain, self.fields, no_cursor_timeout=True).limit(limit).skip(
                skip_number)
            try:
                # Materialize the batch; len() replaces the deprecated
                # cursor.count(with_limit_and_skip=True)
                batch = list(cursor)
            finally:
                # Always close a cursor opened with no_cursor_timeout=True
                cursor.close()

            number = len(batch)
            if number:
                yield batch

            skip_number += number
            if number < limit:
                logger.info("number:{},limit:{}. number < limit,break".format(number, limit))
                break

            if skip_number >= self.max_count:
                logger.info("skip_number:{},self.max_count:{}.skip_number >= self.max_count,break".format(
                    skip_number,
                    self.max_count))
                break



if __name__ == '__main__':
    start_time = '2018-10-01 11:03:05'
    end_time = '2019-01-20 14:03:49'

    reader_config = {
        'uri': XINYONGFEI_RCS_GATEWAY_URI,
        'db_name': XINYONGFEI_RCS_GATEWAY_DB_NAME,
        'domain': {"$and": [{"created_time": {"$lt": end_time}}, {"created_time": {"$gte": start_time}},
                            {"method_id": "securityReport"}, {"status": "SUCCESS"}]},
        'fields': {"created_time": 1, "user_id": 1, "_id": 1},
        'collecion_name': 'moxieSecurityLog',
    }

    reader = MongoReader(**reader_config)

    for data in reader.read():
        print(data)
    print('frank')
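
The batching logic in read can be separated from MongoDB, which makes it easier to reason about and to test. The sketch below is an illustration under stated assumptions: read_in_batches and fetch are hypothetical names, and fetch(skip, limit) stands in for a query such as collection.find(...).skip(skip).limit(limit) whose results have been collected into a list.

```python
def read_in_batches(fetch, start=0, step=1000):
    """Yield non-empty batches from fetch(skip, limit) until a short batch appears."""
    skip = start
    while True:
        batch = fetch(skip, step)
        if batch:
            yield batch
        skip += len(batch)
        # A batch shorter than `step` means the data is exhausted
        if len(batch) < step:
            break


data = list(range(7))  # stand-in for the documents in a collection


def fetch(skip, limit):
    # In-memory replacement for a skip/limit query
    return data[skip:skip + limit]


for batch in read_in_batches(fetch, step=3):
    print(batch)
# [0, 1, 2]
# [3, 4, 5]
# [6]
```

With a real collection, skip-based paging is only consistent if the query also has a fixed sort order, since the order of unsorted results is not guaranteed.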

Error summary:

1 CursorNotFound: the cursor could not be found

The error looks like this:
pymongo.errors.CursorNotFound: Cursor not found, cursor id: 387396591387

Exception in thread consumer_14:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/rcsdecisionv2.py", line 843, in run
    result = self.parse(values)
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/rcsdecisionv2.py", line 872, in parse
    for posts in posts_list:
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 1132, in next
    if len(self.__data) or self._refresh():
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 1075, in _refresh
    self.__max_await_time_ms))
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/cursor.py", line 947, in __send_message
    helpers._check_command_response(doc['data'][0])
  File "/Users/frank/PycharmProjects/xinyongfei-bi-model/venv3/lib/python3.6/site-packages/pymongo/helpers.py", line 207, in _check_command_response
    raise CursorNotFound(errmsg, code, response)
pymongo.errors.CursorNotFound: Cursor not found, cursor id: 387396591387

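Analysis:
The cursor timed out on the server.

Set the parameter no_cursor_timeout=True.

Solution:
demos = db['demo'].find({}, {"_id": 0}, no_cursor_timeout=True)
try:
    for doc in demos:
        do_something(doc)  # placeholder for your own processing
finally:
    demos.close()  # always close the cursor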

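Official documentation:
According to the official documentation, the server closes an idle cursor after 10 minutes by default. A parameter can be set so that the cursor never times out.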

no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
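
One way to make sure such a cursor is always closed is to wrap it in contextlib.closing from the standard library, which calls close() even if processing raises. The sketch assumes only that the cursor is iterable and has a close() method. FakeCursor is a made-up stand-in so the example runs without a server, and process_all is a hypothetical helper.

```python
from contextlib import closing


class FakeCursor:
    """Made-up stand-in for a cursor: iterable, with a close() method."""

    def __init__(self, docs):
        self._docs = list(docs)
        self.closed = False

    def __iter__(self):
        return iter(self._docs)

    def close(self):
        self.closed = True


def process_all(cursor, handle):
    """Apply handle() to every document and always close the cursor afterwards."""
    with closing(cursor) as cur:
        for doc in cur:
            handle(doc)


seen = []
cursor = FakeCursor([{"n": 1}, {"n": 2}])
process_all(cursor, seen.append)
print(seen, cursor.closed)  # [{'n': 1}, {'n': 2}] True
```

With a real collection, the first argument would be something like collection.find(filter_, projection, no_cursor_timeout=True).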

References:

https://www.jianshu.com/p/a8551bd17b5b

http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find

Reference documentation:

1 api cursor http://api.mongodb.com/python/current/api/pymongo/cursor.html
2 api count_documents http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.count_documents
3 api collection http://api.mongodb.com/python/current/api/pymongo/collection.html
4 api sorting http://api.mongodb.com/python/current/api/pymongo/cursor.html#pymongo.cursor.Cursor.sort
5 mongodb tutorial http://api.mongodb.com/python/current/tutorial.html

1 PyMongo usage in Python 3 https://zhuanlan.zhihu.com/p/29435868
2 PyMongo usage in Python 3 https://cloud.tencent.com/developer/article/1005552
3 Working with MongoDB from Python for beginners: this one article is enough https://cloud.tencent.com/developer/article/1169645
4 A crash course in basic PyMongo usage https://www.jianshu.com/p/acc57241f9f0

Share the joy, keep the memories. 2019-04-03 21:12:05 --frank
