Today I ran into the following warning in Python:
UserWarning: MongoClient opened before fork. Create MongoClient only after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#is-pymongo-fork-safe
"MongoClient opened before fork. Create MongoClient only "
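After some digging, I found the cause: in a multi-process program, a MongoClient instance should not be passed between processes as an argument. PyMongo is thread-safe, but it is not fork-safe. Taking the demo in this post as an example, the MongoClient instance created in the parent process should not be handed to the child process; each of the two processes should create its own MongoClient instance. Otherwise, unexpected problems such as deadlocks can occur, and the warning above is PyMongo's reminder of exactly that.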
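To make the idea concrete, here is a minimal sketch of how an object can notice that it is being used in a different process from the one that created it. It is only an illustration of the principle: `ForkAwareResource`, `used_after_fork`, and `use` are hypothetical names, and this is not a copy of PyMongo's own check.

```python
import os
import warnings


class ForkAwareResource:
    """Hypothetical stand-in for a client that owns sockets and locks."""

    def __init__(self):
        # Remember which process created this object.
        self._owner_pid = os.getpid()

    def used_after_fork(self):
        """Return True when the calling process is not the creating process."""
        return os.getpid() != self._owner_pid

    def use(self):
        # A forked child gets a copy of this object, but its PID differs
        # from the one recorded at creation time.
        if self.used_after_fork():
            warnings.warn("resource opened before fork", UserWarning)
        return os.getpid()
```

In the parent process `used_after_fork()` returns False; in a forked child that inherited the object it would return True, which is the situation the warning describes. The script that triggered the warning follows.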
from multiprocessing import Process

from pymongo import MongoClient


def insert_data(target_db, data_list):
    target_db.insert_many(data_list)


if __name__ == '__main__':
    # The address of the test database has been replaced with *
    source_db = MongoClient('mongodb://admin:admin@*.*.*.*:*')['test_db']['test_1_13_1']
    target_db = MongoClient('mongodb://admin:admin@*.*.*.*:*')['test_db']['test_1_13_2']
    cursor = source_db.find().limit(100)
    doc_list = list(cursor)
    # Cursor.count() was removed in PyMongo 4, so count the fetched documents instead
    print(len(doc_list))
    # The MongoClient created in the parent process is passed to the child process here,
    # which triggers the warning described above
    p = Process(target=insert_data, args=(target_db, doc_list))
    p.start()
    p.join()
The code can then be changed to:
from multiprocessing import Process

from pymongo import MongoClient


def insert_data_with_new_client(db_url, db_name, col_name, data_list):
    # Create a new MongoClient instance inside the child process
    target_client = MongoClient(db_url)
    try:
        target_col = target_client[db_name][col_name]
        target_col.insert_many(data_list)
    finally:
        # Close the connection promptly; otherwise a large number of child
        # processes can leave a large number of connections open
        target_client.close()


if __name__ == '__main__':
    # The address of the test database has been replaced with *
    source_db = MongoClient('mongodb://admin:admin@*.*.*.*:*')['test_db']['test_1_13_1']
    cursor = source_db.find().limit(100)
    doc_list = list(cursor)
    # len() replaces Cursor.count(), which was removed in PyMongo 4
    print(len(doc_list))
    # Pass only plain values (URL and names) to the child process, never a MongoClient
    url = 'mongodb://admin:admin@*.*.*.*:*'
    db_name = 'tweet_stream'
    col_name = 'test_1_14_0'
    p = Process(target=insert_data_with_new_client, args=(url, db_name, col_name, doc_list))
    p.start()
    p.join()
This resolves the problem described above.
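If several functions in a child process need a client, opening and closing one in each function gets repetitive. One possible alternative, sketched here under my own assumptions, is a small holder that lazily builds one object per process and rebuilds it when the PID changes. `PerProcess` and `get` are hypothetical names, the factory is supplied by the caller, and the holder itself is not protected by a lock, so it assumes a single thread calls `get()` first.

```python
import os


class PerProcess:
    """Lazily build one object per process using a caller-supplied factory."""

    def __init__(self, factory):
        self._factory = factory
        self._instance = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        # Build the object on first use, and build a fresh one whenever we
        # find ourselves in a different process (for example after a fork).
        if self._instance is None or self._pid != pid:
            self._instance = self._factory()
            self._pid = pid
        return self._instance
```

With PyMongo this could be used as `clients = PerProcess(lambda: MongoClient(db_url))` at module level, followed by `clients.get()[db_name][col_name]` inside the worker function, so that the parent and each child end up with their own MongoClient.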