Simplifying MongoDB Join Operations

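MongoDB is a document-oriented NoSQL database built on distributed file storage. Its BSON document structure maps closely to the way we describe the attributes of real-world objects. When storing data in MongoDB, you will sometimes need to query across related collections. Since version 3.2, MongoDB has provided $lookup for such joins, which improved its query capabilities considerably. In real applications, however, the scenarios are complicated, the problems are not easy to solve, and the scripts are not simple to write. With the help of the esProc SPL language, these tasks become much easier.
This article discusses the difficulties MongoDB has with join operations and shows how esProc SPL improves on them, making MongoDB easier to use. The discussion is divided into the following parts: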
1. Joining nested structures, case 1
2. Joining nested structures, case 2
3. Joining nested structures, case 3
4. Joining two collections
5. Joining multiple collections
6. Looking up array elements in a joined collection
7. Calling DFX scripts from a Java application

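1. Joining nested structures, case 1

Two related collections: collection A is joined with information held in the embedded documents of collection B, and the information to be returned sits inside the embedded document. The childs field of the childsgroup collection is a nested array, and the name information to be merged lies beneath it.

Test data: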

history:

_id	id	history	child_id
1	001	today worked	ch001
2	002	Working	ch004
3	003	now working	ch009

childsgroup:

_id	gid	name	childs
1	g001	group1	[{"id":"ch001","info":{"name":"a","mobile":1111}},{"id":"ch002","info":{"name":"b","mobile":2222}}]
2	g002	group1	[{"id":"ch004","info":{"name":"c","mobile":3333}},{"id":"ch009","info":{"name":"d","mobile":4444}}]

The child_id field of the history collection is joined with childs.id in the childsgroup collection. The desired result is:

{
    "_id" : ObjectId("5bab2ae8ab2f1bdb4f434bc3"),
    "id" : "001",
    "history" : "today worked",
    "child_id" : "ch001",
    "childInfo" :
    {
        "name" : "a",
        "mobile" : 1111
    }
    ………………
}

Mongo script

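db.history.aggregate([
    // Join each history document with the matching element of childsgroup.childs
    {$lookup: {
        from: "childsgroup",
        let: {child_id: "$child_id"},
        pipeline: [
            // Keep only the groups whose childs array contains the id
            {$match: {$expr: {$in: ["$$child_id", "$childs.id"]}}},
            // One document per array element
            {$unwind: "$childs"},
            // Keep only the matching element
            {$match: {$expr: {$eq: ["$childs.id", "$$child_id"]}}},
            // Promote the embedded info document to the root
            {$replaceRoot: {newRoot: "$childs.info"}}
        ],
        as: "childInfo"
    }},
    // Turn the one-element childInfo array into an embedded document
    {$unwind: "$childInfo"}
])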

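This script combines several operators: $lookup (with a pipeline), $match, $unwind and $replaceRoot. The average MongoDB user would find a script this complex hard to write. Now let's see how the SPL script does the same job:

SPL script (file name: childsgroup.dfx)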

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"history.find()").fetch()
3	=mongo_shell(A1,"childsgroup.find()").fetch()
4	=A3.conj(childs)
5	=A2.join(child_id,A4:id,info)
6	>A1.close()

Join result:

_id	id	history	child_id	info
1	001	today worked	ch001	[a,1111]
2	002	working	ch004	[c,3333]
3	003	now working	ch009	[d,4444]

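Script notes:
       A1: Connect to the MongoDB database.
       A2: Fetch the data of the history collection.
       A3: Fetch the data of the childsgroup collection.
       A4: Extract the childs data from childsgroup and concatenate it into one table sequence.
       A5: Join child_id in history with id in the childs table sequence, append the info field, and return a table sequence.
       A6: Close the database connection.

Compared with the MongoDB script, the SPL script is far less difficult and its logic is clearer. There is also no need to learn how the relevant Mongo operators work or how to combine them to process the data, which saves a good deal of time.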
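
To make the flatten-then-join idea concrete, here is a minimal JavaScript sketch that runs on in-memory copies of the sample data rather than on MongoDB or esProc. The names historyRows, childsgroupRows, flattenChilds and joinChildInfo are hypothetical, and setting info to null for an unmatched child_id is an assumption of this sketch, not a statement about how SPL's join behaves.

```javascript
// In-memory copies of the sample data from the tables above.
const historyRows = [
  { _id: 1, id: "001", history: "today worked", child_id: "ch001" },
  { _id: 2, id: "002", history: "Working", child_id: "ch004" },
  { _id: 3, id: "003", history: "now working", child_id: "ch009" },
];
const childsgroupRows = [
  { _id: 1, gid: "g001", name: "group1", childs: [
    { id: "ch001", info: { name: "a", mobile: 1111 } },
    { id: "ch002", info: { name: "b", mobile: 2222 } },
  ] },
  { _id: 2, gid: "g002", name: "group1", childs: [
    { id: "ch004", info: { name: "c", mobile: 3333 } },
    { id: "ch009", info: { name: "d", mobile: 4444 } },
  ] },
];

// Hypothetical helper: concatenate every childs array into one flat list (the role of A4).
function flattenChilds(groups) {
  return groups.flatMap((g) => g.childs);
}

// Hypothetical helper: append the matching info to each history row (the role of A5).
// Assumption of this sketch: rows without a match get info = null.
function joinChildInfo(rows, childs) {
  const infoById = new Map(childs.map((c) => [c.id, c.info]));
  return rows.map((r) => ({ ...r, info: infoById.get(r.child_id) ?? null }));
}

const joined = joinChildInfo(historyRows, flattenChilds(childsgroupRows));
console.log(joined);
```

Building the Map once keeps the join a single pass over each input, which is the same shape as the two SPL steps: flatten first, then join by key.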

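2. Joining nested structures, case 2

Two related collections: collection A is joined with information held in the embedded documents of collection B, and the joined information is merged into those embedded documents. The comment field of the txtPost collection is a nested array, and comment_content needs to be merged into each of its elements.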

txtComment:

_id	comment_no	comment_content
1	143	test test
2	140	math

txtPost:

_id	post_no	comment
1	48	[{"comment_no" : 143, "comment_group" : 1}]
2	47	[{"comment_no" : 140, "comment_group" : 2}, {"comment_no" : 143, "comment_group" : 3}]

Expected result:

_id	post_no	comment
1	48	[{"comment_no" : 143, "comment_group" : 1, "comment_content" : "test test"}]
2	47	[{"comment_no" : 140, "comment_group" : 2, "comment_content" : "math"}, {"comment_no" : 143, "comment_group" : 3, "comment_content" : "test test"}]

Mongo script

db.getCollection("txtPost").aggregate([
    // One document per comment element
    {"$unwind": "$comment"},
    // Look up the matching txtComment document
    {"$lookup": {
        "from": "txtComment",
        "localField": "comment.comment_no",
        "foreignField": "comment_no",
        "as": "comment.comment_content"
    }},
    {"$unwind": "$comment.comment_content"},
    // Replace the looked-up document with its comment_content value
    {"$addFields": {"comment.comment_content": "$comment.comment_content.comment_content"}},
    // Reassemble the comment array of each post
    {"$group": {
        "_id": "$_id",
        "post_no": {"$first": "$post_no"},
        "comment": {"$push": "$comment"}
    }}
]).pretty()

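The txtPost collection is unwound by comment into one record per array element and then joined with txtComment. The lookup result lands in an array, which is unwound again, the comment_content value is moved up under comment, and finally the records are grouped and merged back together.

SPL script: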

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"txtPost.find()").fetch()
3	=mongo_shell(A1,"txtComment.find()").fetch()
4	=A2.conj(comment.derive(A2.post_no:pno))
5	=A4.join(comment_no,A3:comment_no,comment_content:Content)
6	=A5.group(pno;~:comment)
7	>A1.close()

Join result:

pno	comment
47	[[140, 2, 47, …], [143, 3, 47, …]]
48	[[143, 1, 48, …]]

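Script notes:
      A1: Connect to the MongoDB database.
      A2: Fetch the data of the txtPost collection.
      A3: Fetch the data of the txtComment collection.
      A4: Combine the comment records under table sequence A2 with post_no into one table sequence, renaming post_no to pno.
      A5: Join table sequence A4 with table sequence A3 on comment_no, append the comment_content field, and rename it Content.
      A6: Group by pno and return a table sequence; ~ stands for the records of the current group.
      A7: Close the database connection.

The Mongo and SPL scripts take a similar approach: both flatten the nested data into rows and columns, then group and merge it back. The SPL implementation, however, is simpler and clearer.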
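
The shared flatten, join, regroup pattern can be sketched in plain JavaScript on in-memory copies of the sample data. This is an illustration only: mergeCommentContent is a hypothetical helper, the output field names follow the expected result above rather than the SPL column names, and a comment with no matching txtComment row gets comment_content = null by assumption.

```javascript
// In-memory copies of the sample data from the tables above.
const txtComment = [
  { _id: 1, comment_no: 143, comment_content: "test test" },
  { _id: 2, comment_no: 140, comment_content: "math" },
];
const txtPost = [
  { _id: 1, post_no: 48, comment: [{ comment_no: 143, comment_group: 1 }] },
  { _id: 2, post_no: 47, comment: [
    { comment_no: 140, comment_group: 2 },
    { comment_no: 143, comment_group: 3 },
  ] },
];

// Hypothetical helper that mirrors the three steps described above.
function mergeCommentContent(posts, comments) {
  const contentByNo = new Map(comments.map((c) => [c.comment_no, c.comment_content]));

  // Step 1: flatten, one row per comment element, carrying post_no along as pno.
  const rows = posts.flatMap((p) => p.comment.map((c) => ({ ...c, pno: p.post_no })));

  // Step 2: join on comment_no (assumption: no match gives null).
  const joinedRows = rows.map((r) => ({
    ...r,
    comment_content: contentByNo.get(r.comment_no) ?? null,
  }));

  // Step 3: group the rows back by pno.
  const groups = new Map();
  for (const { pno, ...comment } of joinedRows) {
    if (!groups.has(pno)) groups.set(pno, []);
    groups.get(pno).push(comment);
  }
  return [...groups].map(([post_no, comment]) => ({ post_no, comment }));
}

const merged = mergeCommentContent(txtPost, txtComment);
console.log(JSON.stringify(merged, null, 2));
```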

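3. Joining nested structures, case 3

Two related collections: collection A is joined with information held in the embedded documents of collection B, and the returned information is placed on the record itself. The product field of collection2 is a nested array, and the information to be returned consists of fields such as isCompleted.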

Test data:
collection1:
{
   _id: '5bc2e44a106342152cd83e97',
   description:
    {
      status: 'Good',
      machine: 'X'
    },
   order: 'A',
   lot: '1'
};

collection2:
{
   _id: '5bc2e44a106342152cd83e80',
   isCompleted: false,
   serialNo: '1',
   batchNo: '2',
   product: [ // note the subdocuments here
        {order: 'A', lot: '1'},
        {order: 'A', lot: '2'}
    ]
}

Expected result:
{
   _id: '5bc2e44a106342152cd83e97',
   description:
       {
         status: 'Good',
         machine: 'X',
       },
   order: 'A',
   lot: '1' ,
   isCompleted: false,
   serialNo: '1',
   batchNo: '2'
}

Mongo script

db.collection1.aggregate([
    {$lookup: {
        from: "collection2",
        let: {order: "$order", lot: "$lot"},
        pipeline: [
            // Match when product contains an element equal to {order, lot}
            {$match: {$expr: {$in: [{order: "$$order", lot: "$$lot"}, "$product"]}}}
        ],
        as: "isCompleted"
    }},
    // Take the first matching collection2 document
    {$addFields: {"isCompleted": {$arrayElemAt: ["$isCompleted", 0]}}},
    {$addFields: { // add the required fields to the top level structure
        "isCompleted": "$isCompleted.isCompleted",
        "serialNo": "$isCompleted.serialNo",
        "batchNo": "$isCompleted.batchNo"
    }}
])

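$lookup joins the two collections, the first $addFields takes the first element of the isCompleted array, and the second $addFields turns it into the required top-level fields.

SPL script: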

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"collection1.find()").fetch()
3	=mongo_shell(A1,"collection2.find()").fetch()
4	=A3.conj(A2.select(order:A3.product.order,lot:A3.product.lot).derive(A3.serialNo:sno,A3.batchNo:bno))
5	>A1.close()

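Script notes:
      A1: Connect to the MongoDB database.
      A2: Fetch the data of the collection1 collection.
      A3: Fetch the data of the collection2 collection.
      A4: Select the records of table sequence A2 that match on order and lot, append the serialNo and batchNo fields from table sequence A3, and return the merged table sequence.
      A5: Close the database connection.

Both the Mongo and SPL scripts produce the expected result. The SPL script clearly expresses the filtering on the embedded structure inside each record and the merging of the matching data into a new table sequence.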
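
As a rough illustration of the matching rule (a record matches when some element of product has the same order and lot), here is a JavaScript sketch on in-memory copies of the sample documents. attachBatchInfo is a hypothetical helper. Taking only the first matching collection2 document mirrors the $arrayElemAt step of the Mongo script, and returning an unmatched record unchanged is an assumption of this sketch.

```javascript
// In-memory copies of the sample documents above.
const collection1 = [
  {
    _id: "5bc2e44a106342152cd83e97",
    description: { status: "Good", machine: "X" },
    order: "A",
    lot: "1",
  },
];
const collection2 = [
  {
    _id: "5bc2e44a106342152cd83e80",
    isCompleted: false,
    serialNo: "1",
    batchNo: "2",
    product: [
      { order: "A", lot: "1" },
      { order: "A", lot: "2" },
    ],
  },
];

// Hypothetical helper: copy isCompleted, serialNo and batchNo from the first
// batch whose product array contains the record's (order, lot) pair.
function attachBatchInfo(records, batches) {
  return records.map((rec) => {
    const hit = batches.find((b) =>
      b.product.some((p) => p.order === rec.order && p.lot === rec.lot)
    );
    if (!hit) return rec; // assumption: unmatched records pass through unchanged
    const { isCompleted, serialNo, batchNo } = hit;
    return { ...rec, isCompleted, serialNo, batchNo };
  });
}

const enriched = attachBatchInfo(collection1, collection2);
console.log(enriched);
```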

4. Joining two collections

Select the required fields from the joined collections and combine them into a new table.

collection1:

user1	user2	income
1	2	0.56
1	3	0.26

collection2:

user1	user2	output
1	2	0.3
1	3	0.4
2	3	0.5

Expected result:

user1	user2	income	output
1	2	0.56	0.3
1	3	0.26	0.4

Mongo script

db.c1.aggregate([
    // Join on user1 first; $lookup with localField/foreignField takes a single key
    {"$lookup": {
        "from": "c2",
        "localField": "user1",
        "foreignField": "user1",
        "as": "collection2_doc"
    }},
    {"$unwind": "$collection2_doc"},
    // Keep only the pairs whose user2 values also match
    {"$redact": {
        "$cond": [
            {"$eq": ["$user2", "$collection2_doc.user2"]},
            "$$KEEP",
            "$$PRUNE"
        ]
    }},
    {"$project": {
        "user1": 1,
        "user2": 1,
        "income": "$income",
        "output": "$collection2_doc.output"
    }}
]).pretty()

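$lookup joins the two collections, $redact walks through the records and keeps or prunes each one according to the condition, and $project selects the fields to display.

SPL script: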

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"c1.find()").fetch()
3	=mongo_shell(A1,"c2.find()").fetch()
4	=A2.join(user1:user2,A3:user1:user2,output)
5	>A1.close()

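Script notes:
      A1: Connect to the MongoDB database.
      A2: Fetch the data of the c1 collection.
      A3: Fetch the data of the c2 collection.
      A4: Join the two tables on the fields user1 and user2, append the output field from table sequence A3, and return a table sequence.
      A5: Close the database connection.

Both the Mongo and SPL scripts produce the expected result. SPL uses join to merge the differing fields of two related tables into a new table, much as you would in a relational database.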
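
A join on two fields can be sketched in JavaScript by building a composite key. The code below works on in-memory copies of the sample data. joinOnUserPair is a hypothetical helper, and dropping left rows without a match (inner-join behaviour, as in the Mongo pipeline with $unwind and $redact) is a choice of this sketch.

```javascript
// In-memory copies of the sample data from the tables above.
const c1 = [
  { user1: 1, user2: 2, income: 0.56 },
  { user1: 1, user2: 3, income: 0.26 },
];
const c2 = [
  { user1: 1, user2: 2, output: 0.3 },
  { user1: 1, user2: 3, output: 0.4 },
  { user1: 2, user2: 3, output: 0.5 },
];

// Composite key built from both join fields.
function pairKey(row) {
  return JSON.stringify([row.user1, row.user2]);
}

// Hypothetical helper: inner join on (user1, user2), appending output from the right side.
function joinOnUserPair(left, right) {
  const outputByKey = new Map(right.map((r) => [pairKey(r), r.output]));
  return left
    .filter((l) => outputByKey.has(pairKey(l)))
    .map((l) => ({ ...l, output: outputByKey.get(pairKey(l)) }));
}

const result = joinOnUserPair(c1, c2);
console.log(result);
```

Serializing the key with JSON.stringify avoids accidental collisions that plain string concatenation could produce when the key fields are strings.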

5. Joining multiple collections

A join across more than two collections, combined into one wide table.

Doc1:

_id	firstName	lastName
U001	shubham	verma

Doc2:

_id	userId	address	mob
2	U001	Gurgaon	9876543200

Doc3:

_id	userId	fbURLs	twitterURLs
3	U001	http://www.facebook.com	http://www.twitter.com

Merged result:
{
    "_id" : ObjectId("5901a4c63541b7d5d3293766"),
    "firstName" : "shubham",
    "lastName" : "verma",
    "address" : {
        "address" : "Gurgaon"
    },
    "social" : {
        "fbURLs" : "http://www.facebook.com",
        "twitterURLs" : "http://www.twitter.com"
    }
}

Mongo script

db.doc1.aggregate([
    {$match: { _id: ObjectId("5901a4c63541b7d5d3293766") } },
    {
        $lookup:
        {
            from: "doc2",
            localField: "_id",
            foreignField: "userId",
            as: "address"
        }
    },
    {
        $unwind: "$address"
    },
    {
        $project: {
            "address._id": 0,
            "address.userId": 0,
            "address.mob": 0
        }
    },
    {
        $lookup:
        {
            from: "doc3",
            localField: "_id",
            foreignField: "userId",
            as: "social"
        }
    },
    {
        $unwind: "$social"
    },
    {
        $project: {
            "social._id": 0,
            "social.userId": 0
        }
    }
]).pretty();

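Because of the way MongoDB structures its data, there are many ways to write this query, and the output can be shaped in just as many ways.

SPL script: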

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"doc1.find()").fetch()
3	=mongo_shell(A1,"doc2.find()").fetch()
4	=mongo_shell(A1,"doc3.find()").fetch()
5	=A2.join(_id,A3:userId,address,mob)
6	=A5.join(_id,A4:userId,fbURLs,twitterURLs)
7	>A1.close()

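Both the Mongo and SPL scripts produce the expected result. This SPL script resembles the previous example with one more joined collection: each join appends new fields, and the results stack up into one wide table.

The conciseness and uniformity of the SPL script are evident.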
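
The idea that each join appends fields and the joins chain into one wide table can be sketched in JavaScript on in-memory copies of the sample data. appendFields is a hypothetical helper loosely modelled on the role of join in cells A5 and A6. Filling missing matches with null is an assumption of this sketch, and the output is a flat row rather than the nested address/social shape produced by the Mongo script.

```javascript
// In-memory copies of the sample data from the tables above.
const doc1 = [{ _id: "U001", firstName: "shubham", lastName: "verma" }];
const doc2 = [{ _id: 2, userId: "U001", address: "Gurgaon", mob: 9876543200 }];
const doc3 = [{
  _id: 3,
  userId: "U001",
  fbURLs: "http://www.facebook.com",
  twitterURLs: "http://www.twitter.com",
}];

// Hypothetical helper: for each left row, find the right row whose rightKey
// equals the left row's leftKey and append the listed fields.
// Assumption of this sketch: fields are null when nothing matches.
function appendFields(left, right, leftKey, rightKey, fields) {
  const byKey = new Map(right.map((r) => [r[rightKey], r]));
  return left.map((l) => {
    const match = byKey.get(l[leftKey]);
    const extra = Object.fromEntries(fields.map((f) => [f, match ? match[f] : null]));
    return { ...l, ...extra };
  });
}

// Each call widens the table by the fields it appends.
const withAddress = appendFields(doc1, doc2, "_id", "userId", ["address", "mob"]);
const wide = appendFields(withAddress, doc3, "_id", "userId", ["fbURLs", "twitterURLs"]);
console.log(wide);
```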

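6. Looking up array elements in a joined collection

Find the records of the related collection that match the elements of an array field, and combine the specified fields into a new table.

Test data: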

users:

_id	Name	workouts
1000	xxx	[2,4,6]
1002	yyy	[1,3,5]

workouts:

_id	Date	Book
1	1/1/2001	Othello
2	2/2/2001	A Midsummer Night's Dream
3	3/3/2001	The Old Man and the Sea
4	4/4/2001	GULLIVER’S TRAVELS
5	5/5/2001	Pickwick Papers
6	6/6/2001	The Red and the Black

Expected result:

Name	_id	Date	Book
xxx	2	2/2/2001	A Midsummer Night's Dream
xxx	4	4/4/2001	GULLIVER’S TRAVELS
xxx	6	6/6/2001	The Red and the Black
yyy	1	1/1/2001	Othello
yyy	3	3/3/2001	The Old Man and the Sea
yyy	5	5/5/2001	Pickwick Papers

Mongo script

db.users.aggregate([
    // Each element of the workouts array is matched against workouts._id
    {"$lookup": {
        "from": "workouts",
        "localField": "workouts",
        "foreignField": "_id",
        "as": "workoutDocumentsArray"
    }},
    {"$project": {"_id": 0, "workouts": 0}},
    {"$unwind": "$workoutDocumentsArray"},
    // Merge the matched workout document into the root
    {"$replaceRoot": {"newRoot": {"$mergeObjects": ["$$ROOT", "$workoutDocumentsArray"]}}},
    {"$project": {"workoutDocumentsArray": 0}}
]).pretty()

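The lookup between users and workouts places the matching documents in an array. The array is then unwound, the sub-documents are promoted to the top level, and the fields that are not needed are removed.

SPL script (users.dfx):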

	A	B
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"users.find()").fetch()
3	=mongo_shell(A1,"workouts.find()").fetch()
4	=A2.conj(A3.select(A2.workouts^~.array(_id)!=[]).derive(A2.name))
5	>A1.close()

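Script notes:
      A1: Connect to the MongoDB database.
      A2: Fetch the data of the users collection.
      A3: Fetch the data of the workouts collection.
      A4: Select the records of table sequence A3 whose _id appears in the workouts array of table sequence A2, and append the name field. Return the merged table sequence.
      A5: Close the database connection.
      Because the condition tests whether the intersection of two sequences is non-empty, _id is converted to a sequence.
      Both the Mongo and SPL scripts produce the expected result. As the implementations show, SPL is highly integrated without losing flexibility, which simplifies the program considerably.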
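
The array lookup can be sketched in JavaScript on in-memory copies of the sample data: for every user, each id in the workouts array is resolved against the workouts collection and the user's Name is prepended. expandWorkouts is a hypothetical helper, and silently skipping ids that have no workout record is an assumption of this sketch.

```javascript
// In-memory copies of the sample data from the tables above.
const users = [
  { _id: 1000, Name: "xxx", workouts: [2, 4, 6] },
  { _id: 1002, Name: "yyy", workouts: [1, 3, 5] },
];
const workouts = [
  { _id: 1, Date: "1/1/2001", Book: "Othello" },
  { _id: 2, Date: "2/2/2001", Book: "A Midsummer Night's Dream" },
  { _id: 3, Date: "3/3/2001", Book: "The Old Man and the Sea" },
  { _id: 4, Date: "4/4/2001", Book: "GULLIVER’S TRAVELS" },
  { _id: 5, Date: "5/5/2001", Book: "Pickwick Papers" },
  { _id: 6, Date: "6/6/2001", Book: "The Red and the Black" },
];

// Hypothetical helper: one output row per (user, workout id) pair that resolves.
function expandWorkouts(userRows, workoutRows) {
  const byId = new Map(workoutRows.map((w) => [w._id, w]));
  return userRows.flatMap((u) =>
    u.workouts
      .filter((id) => byId.has(id)) // assumption: unknown ids are skipped
      .map((id) => ({ Name: u.Name, ...byId.get(id) }))
  );
}

const rows = expandWorkouts(users, workouts);
console.log(rows);
```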

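7. Calling DFX scripts from a Java application

Once an SPL script has performed the join computation on the MongoDB data, the result can easily be consumed by a Java application. esProc provides a JDBC driver, and a script is invoked in the same way as a stored procedure. (For JDBC configuration details, see the "Basic JDBC usage" chapter of the esProc Tutorial.)
   The main steps of the Java call are: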
   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;

   public void testUsers(){
       Connection con = null;
       com.esproc.jdbc.InternalCStatement st;
       try{
         // Establish the connection
         Class.forName("com.esproc.jdbc.InternalDriver");
         con = DriverManager.getConnection("jdbc:esproc:local://");
         // Call the stored procedure; users is the name of the dfx file
         st = (com.esproc.jdbc.InternalCStatement)con.prepareCall("call users()");
         // Execute the stored procedure
         st.execute();
         // Get the result set
         ResultSet rs = st.getResultSet();
         // ...
       }
       catch(Exception e){
         System.out.println(e);
       }
   }
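As the code shows, the call uses standard JDBC methods, so esProc is easy to embed in a Java application. esProc also provides an ODBC driver, so integrating it with other languages that support ODBC is just as easy.

Compared with a relational database, the data structures stored in MongoDB are more complex and more flexible. Its query language is powerful and widely applicable, but it involves many operators that can be combined in countless ways, so mastering it is no small task. The discreteness and ease of use of esProc make up for exactly this shortcoming: they lower the learning cost, complexity and difficulty of using MongoDB while letting its capabilities be used more fully.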
