Translation and notes: “6 Rules of Thumb for MongoDB Schema Design: Part 1”

The original version of these notes is kept at IT老兵驛站.

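Preface

I recently had to start using MongoDB at work. Learning MongoDB cannot stop at mastering a handful of simple commands; just as when I first learned MySQL, I want to understand how to design a database with it. I found a very good article on the subject and reproduce it below, together with partial translation and study notes.

Main text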

By William Zola, Lead Technical Support Engineer at MongoDB

“I have lots of experience with SQL, but I’m just a beginner with MongoDB. How do I model a one-to-N relationship?” This is one of the more common questions I get from users attending MongoDB office hours.

I don’t have a short answer to this question, because there isn’t just one way, there’s a whole rainbow’s worth of ways. MongoDB has a rich and nuanced vocabulary for expressing what, in SQL, gets flattened into the term “One-to-N”. Let me take you on a tour of your choices in modeling One-to-N relationships.

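Note: Newcomers to MongoDB often run into the same question: how should I model a one-to-N relationship? The phrase “there’s a whole rainbow’s worth of ways” is worth pausing on; I read it as saying that there is a whole spectrum of options rather than a single answer.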

There’s so much to talk about here, I’m breaking this up into three parts. In this first part, I’ll talk about the three basic ways to model One-to-N relationships. In the second part I’ll cover more sophisticated schema designs, including denormalization and two-way referencing. And in the final part, I’ll review the entire rainbow of choices, and give you some suggestions for choosing among the thousands (really – thousands) of choices that you may consider when modeling a single One-to-N relationship.

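Note: There is a lot to discuss, so the author splits it into three parts. The first part, this article, covers the three basic ways to model a One-to-N relationship. The second part covers more sophisticated schema designs, including denormalization and two-way referencing. The final part reviews the whole range of choices and offers suggestions for choosing among the thousands of options available when modeling a single One-to-N relationship.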

Many beginners think that the only way to model “One-to-N” in MongoDB is to embed an array of sub-documents into the parent document, but that’s just not true. Just because you can embed a document, doesn’t mean you should embed a document.

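Note: Many beginners believe there is only one way to model “One-to-N” in MongoDB, namely embedding an array of sub-documents, and that is not true. I agree: many of the posts I have read mislead people in exactly this way.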

When designing a MongoDB schema, you need to start with a question that you’d never consider when using SQL: what is the cardinality of the relationship? Put less formally: you need to characterize your “One-to-N” relationship with a bit more nuance: is it “one-to-few”, “one-to-many”, or “one-to-squillions”? Depending on which one it is, you’d use a different format to model the relationship.

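Note: When you start designing a MongoDB schema, you need to consider a question that never comes up with SQL: what is the cardinality of the relationship? Concretely, is it “one-to-few”, “one-to-many”, or “one-to-squillions”? The design differs depending on the cardinality.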

Basics: Modeling One-to-Few

An example of “one-to-few” might be the addresses for a person. This is a good use case for embedding – you’d put the addresses in an array inside of your Person object:

> db.person.findOne()
{
  name: 'Kate Monster',
  ssn: '123-456-7890',
  addresses : [
     { street: '123 Sesame St', city: 'Anytown', cc: 'USA' },
     { street: '123 Avenue Q', city: 'New York', cc: 'USA' }
  ]
}

This design has all of the advantages and disadvantages of embedding. The main advantage is that you don’t have to perform a separate query to get the embedded details; the main disadvantage is that you have no way of accessing the embedded details as stand-alone entities.

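Note: The above is a common One-to-Few example, the relationship between a person and their addresses. The advantage is that you do not need a separate query to fetch the embedded information; the disadvantage is that you cannot access the embedded content as a stand-alone entity. The example is a vivid one, and the MySQL examples book I have been reading also deals with people and addresses. The point is that real-world one-to-many relationships are not one-size-fits-all, and SQL's ability to express the difference is limited, or rather SQL's original design did not give this factor much thought. This should be discussed together with that book; to be completed.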

For example, if you were modeling a task-tracking system, each Person would have a number of Tasks assigned to them. Embedding Tasks inside the Person document would make queries like “Show me all Tasks due tomorrow” much more difficult than they need to be. I will cover a more appropriate design for this use case in the next post.
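Note: To see why embedding hurts in the task-tracking case, here is a minimal sketch in plain JavaScript. It does not talk to MongoDB; an in-memory array stands in for the `person` collection, and the `tasks`, `title` and `due` fields as well as the second person are hypothetical, not from the original article. Because the tasks only exist inside their owners, finding the ones due on a given date means walking every person and unpacking each embedded array.

```javascript
// Hypothetical data: in-memory stand-in for a `person` collection
// in which each person embeds an array of tasks.
const people = [
  {
    name: 'Kate Monster',
    tasks: [
      { title: 'File report', due: '2014-03-29' },
      { title: 'Call vendor', due: '2014-04-02' }
    ]
  },
  {
    name: 'Hypothetical Person',
    tasks: [{ title: 'Review budget', due: '2014-03-29' }]
  }
];

// "Show me all tasks due on a date": every person document has to be
// visited, and each embedded array filtered, because tasks are not
// stand-alone entities.
function tasksDueOn(people, date) {
  return people.flatMap(person =>
    person.tasks
      .filter(task => task.due === date)
      .map(task => ({ owner: person.name, title: task.title }))
  );
}

const dueTomorrow = tasksDueOn(people, '2014-03-29');
console.log(dueTomorrow);
```

If tasks lived in their own collection, the same question would be a single filter on a `due` field, which is the direction the article says it will take in the next post.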

Basics: One-to-Many

An example of “one-to-many” might be parts for a product in a replacement parts ordering system. Each product may have up to several hundred replacement parts, but never more than a couple thousand or so. (All of those different-sized bolts, washers, and gaskets add up.) This is a good use case for referencing – you’d put the ObjectIDs of the parts in an array in product document. (For these examples I’m using 2-byte ObjectIDs because they’re easier to read: real-world code would use 12-byte ObjectIDs.)

Each Part would have its own document:

> db.parts.findOne()
{
    _id : ObjectID('AAAA'),
    partno : '123-aff-456',
    name : '#4 grommet',
    qty: 94,
    cost: 0.94,
    price: 3.99
}

Each Product would have its own document, which would contain an array of ObjectID references to the Parts that make up that Product:

> db.products.findOne()
{
    name : 'left-handed smoke shifter',
    manufacturer : 'Acme Corp',
    catalog_number: 1234,
    parts : [     // array of references to Part documents
        ObjectID('AAAA'),    // reference to the #4 grommet above
        ObjectID('F17C'),    // reference to a different Part
        ObjectID('D2AA'),
        // etc
    ]
}

You would then use an application-level join to retrieve the parts for a particular product:

// Fetch the Product document identified by this catalog number
> product = db.products.findOne({catalog_number: 1234});
// Fetch all the Parts that are linked to this Product
> product_parts = db.parts.find({_id: { $in : product.parts } }).toArray();

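Note: This example is the relationship between a product and its parts, which is One-to-Many. A product can have many parts, so ObjectIDs are used to link them; this is a one-way reference. It is a very common example for describing a One-to-Many relationship.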

For efficient operation, you’d need to have an index on ‘products.catalog_number’. Note that there will always be an index on ‘parts._id’, so that query will always be efficient.

This style of referencing has a complementary set of advantages and disadvantages to embedding. Each Part is a stand-alone document, so it’s easy to search them and update them independently. One trade off for using this schema is having to perform a second query to get details about the Parts for a Product. (But hold that thought until we get to denormalizing in part 2.)

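Note: The advantage is that each part is a stand-alone document, so it is easy to query and update. The trade-off is that a separate query is needed to fetch the part details.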

As an added bonus, this schema lets you have individual Parts used by multiple Products, so your One-to-N schema just became an N-to-N schema without any need for a join table!
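Note: The N-to-N point can be sketched in plain JavaScript. This is an in-memory illustration only, not driver code: arrays stand in for the `parts` and `products` collections, short strings stand in for ObjectIDs, and the washer and the second product are hypothetical. The same array of references answers both "which parts make up this product" and "which products use this part", with no join table.

```javascript
// In-memory stand-ins for the `parts` and `products` collections.
// Short string ids replace ObjectIDs for readability.
const parts = [
  { _id: 'AAAA', name: '#4 grommet' },
  { _id: 'F17C', name: 'hypothetical washer' }
];
const products = [
  { name: 'left-handed smoke shifter', parts: ['AAAA', 'F17C'] },
  { name: 'hypothetical second product', parts: ['AAAA'] }
];

// Product -> Parts: the same two-step lookup as the $in query,
// done here with a Set of referenced ids.
function partsFor(product) {
  const ids = new Set(product.parts);
  return parts.filter(part => ids.has(part._id));
}

// Part -> Products: every product whose reference array contains the id.
function productsUsing(partId) {
  return products.filter(product => product.parts.includes(partId));
}

console.log(partsFor(products[0]).map(p => p.name));
console.log(productsUsing('AAAA').map(p => p.name));
```

In the shell, the reverse lookup would presumably be a query on the array field itself, along the lines of `db.products.find({ parts: somePartId })`, where `somePartId` is a placeholder.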

Basics: One-to-Squillions

An example of “one-to-squillions” might be an event logging system that collects log messages for different machines. Any given host could generate enough messages to overflow the 16 MB document size, even if all you stored in the array was the ObjectID. This is the classic use case for “parent-referencing” – you’d have a document for the host, and then store the ObjectID of the host in the documents for the log messages.

> db.hosts.findOne()
{
    _id : ObjectID('AAAB'),
    name : 'goofy.example.com',
    ipaddr : '127.66.66.66'
}

> db.logmsg.findOne()
{
    time : ISODate("2014-03-28T09:42:41.382Z"),
    message : 'cpu is on fire!',
    host: ObjectID('AAAB')       // Reference to the Host document
}
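Note: The 16 MB claim above can be made concrete with a rough estimate. The sketch below is my own back-of-the-envelope arithmetic, not something from the article. It assumes real 12-byte ObjectIDs, relies on the fact that BSON encodes an array as a document keyed by "0", "1", "2" and so on, and uses a guessed fixed overhead of 32 bytes for the enclosing document.

```javascript
// Rough estimate: how many 12-byte ObjectIDs fit in one array field
// before the document reaches the 16 MB BSON limit?
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes
const OBJECT_ID_BYTES = 12;

// Upper bound that ignores all encoding overhead.
const rawUpperBound = Math.floor(MAX_BSON_BYTES / OBJECT_ID_BYTES);

// Each array element costs: 1 type byte + the index written as decimal
// digits + 1 NUL terminator + 12 bytes for the ObjectID value.
function estimateMaxIds(limitBytes) {
  // Assumption: about 32 bytes for the enclosing document and the array
  // wrapper (length prefixes, terminators, a short field name).
  let used = 32;
  let count = 0;
  while (true) {
    const elementBytes = 1 + String(count).length + 1 + OBJECT_ID_BYTES;
    if (used + elementBytes > limitBytes) return count;
    used += elementBytes;
    count += 1;
  }
}

const estimatedMax = estimateMaxIds(MAX_BSON_BYTES);
console.log({ rawUpperBound, estimatedMax });
```

Under these assumptions the array tops out somewhat below a million references, so a busy host really could outgrow it, which is why the reference is moved onto the log message instead.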

You’d use a (slightly different) application-level join to find the most recent 5,000 messages for a host:

// find the parent 'host' document
> host = db.hosts.findOne({ipaddr : '127.66.66.66'});  // assumes unique index
// find the most recent 5000 log message documents linked to that host
> last_5k_msg = db.logmsg.find({host: host._id}).sort({time : -1}).limit(5000).toArray()

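Note: The host-and-log relationship illustrates One-to-Squillions. The difference is that the reference lives on the child: the child points to its parent.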
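Note: The parent-referencing lookup can be mimicked in plain JavaScript. Again this is an in-memory sketch rather than MongoDB code: arrays stand in for the `hosts` and `logmsg` collections, string ids stand in for ObjectIDs, and every log message except the first is hypothetical. The steps mirror the shell session: find the parent, filter the children by the parent's id, sort newest first, then cap the result.

```javascript
// In-memory stand-ins for the `hosts` and `logmsg` collections.
const hosts = [
  { _id: 'AAAB', name: 'goofy.example.com', ipaddr: '127.66.66.66' }
];
const logmsg = [
  { time: new Date('2014-03-28T09:42:41.382Z'), message: 'cpu is on fire!', host: 'AAAB' },
  { time: new Date('2014-03-28T09:43:10.000Z'), message: 'hypothetical follow-up', host: 'AAAB' },
  { time: new Date('2014-03-28T09:44:00.000Z'), message: 'hypothetical other host', host: 'ZZZZ' }
];

// Return the n most recent messages for the host with this IP address.
function recentMessages(ipaddr, n) {
  // Step 1: find the parent document.
  const host = hosts.find(h => h.ipaddr === ipaddr);
  if (!host) return [];
  // Step 2: children carry the parent's id, so filter on it,
  // sort by time descending, and keep at most n entries.
  return logmsg
    .filter(m => m.host === host._id)
    .sort((a, b) => b.time - a.time)
    .slice(0, n);
}

console.log(recentMessages('127.66.66.66', 5000).map(m => m.message));
```

The host document never grows, however many messages arrive, which is the property that makes this shape suitable for very large N.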

Recap

So, even at this basic level, there is more to think about when designing a MongoDB schema than when designing a comparable relational schema. You need to consider two factors:

  • Will the entities on the “N” side of the One-to-N ever need to stand alone?
  • What is the cardinality of the relationship: is it one-to-few; one-to-many; or one-to-squillions?

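Note:

When designing a relationship, you need to consider two factors:

  • Will the entities on the “N” side of the One-to-N ever need to stand alone?
  • What is the cardinality of the relationship: one-to-few, one-to-many, or one-to-squillions?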

Based on these factors, you can pick one of the three basic One-to-N schema designs:

  • Embed the N side if the cardinality is one-to-few and there is no need to access the embedded object outside the context of the parent object
  • Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects should stand alone for any reasons
  • Use a reference to the One-side in the N-side objects if the cardinality is one-to-squillions

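Note:

Based on these factors, you can choose among three basic schema designs:

  • If the cardinality is one-to-few and there is no need to access the embedded objects outside the context of the parent object, embed the N side.
  • If the cardinality is one-to-many, or the N-side objects need to stand alone for any reason, use an array of references to the N-side objects.
  • If the cardinality is one-to-squillions, store a reference to the One side in each N-side object.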
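Note: The three rules above can be written down as a small decision function. This is only my reading of the recap, sketched in plain JavaScript; the function name and the returned labels are hypothetical. One assumption goes beyond the text: when the cardinality is one-to-squillions, parent-referencing wins even if the N side should stand alone, since an array of references would not fit in one document anyway.

```javascript
const CARDINALITIES = ['one-to-few', 'one-to-many', 'one-to-squillions'];

// Pick one of the three basic One-to-N designs from the two factors
// named in the recap: cardinality and whether the N side stands alone.
function chooseDesign(cardinality, nSideStandsAlone) {
  if (!CARDINALITIES.includes(cardinality)) {
    throw new Error('unknown cardinality: ' + cardinality);
  }
  // Assumption: squillions always means a reference on the N side.
  if (cardinality === 'one-to-squillions') return 'parent-reference';
  // One-to-many, or any N side that must stand alone, gets references.
  if (cardinality === 'one-to-many' || nSideStandsAlone) {
    return 'array-of-references';
  }
  // Remaining case: one-to-few with no stand-alone access needed.
  return 'embed';
}

console.log(chooseDesign('one-to-few', false));        // addresses of a person
console.log(chooseDesign('one-to-few', true));         // tasks of a person
console.log(chooseDesign('one-to-many', false));       // parts of a product
console.log(chooseDesign('one-to-squillions', false)); // log messages of a host
```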

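Summary

Working through this article has made my thinking much clearer. MongoDB puts a lot of design effort into the One-to-N area, which probably reflects how common One-to-N requirements have become and, in my view, how limited SQL's support for them is.

To do: write a companion summary of MySQL design patterns.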
