GridFS

When to use GridFS

  • Lots of files. GridFS tends to handle large numbers (many thousands) of files better than many file systems.
  • User uploaded files. When users upload files, you tend to accumulate a lot of them, and you want them replicated and backed up. GridFS is a good place to store these because you can then manage them the same way you manage the rest of your data. You can also query by user, upload date, etc. directly in the file store, without a layer of indirection.
  • Files that change often. If certain files change a lot, it makes sense to store them in GridFS so you can modify them in one place and all clients will get the updates. This can also be better than keeping them in the source tree, because you don't have to redeploy the app to update the files.

When not to use GridFS

  • Few small static files. If you just have a few small files for a website (JS, CSS, images), it's probably easier to use the file system.
  • Note that if you need to update a binary object atomically, and the object is under the document size limit for your version of MongoDB (16 MB as of 1.8), you might consider storing the object manually within a single document. This can be done with the BSON bindata (binary data) type. Check your driver's docs for details on using this type.
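As a rough illustration of the size check in the last point, here is a minimal Java sketch. The class StorageChoice and its overhead allowance are hypothetical: the 16 MB figure comes from the text above, but reserving 16 KB for _id, field names and other BSON overhead is an assumption, not a documented value.

```java
/** Hypothetical helper: decide whether a binary payload could live in one document. */
final class StorageChoice {
    // 16 MB document size limit cited above for MongoDB 1.8 (16 * 1024 * 1024 bytes).
    static final long MAX_DOCUMENT_BYTES = 16L * 1024 * 1024;

    // ASSUMPTION: rough allowance for _id, field names and other BSON overhead.
    static final long ASSUMED_OVERHEAD_BYTES = 16L * 1024;

    private StorageChoice() {}

    /** True if the payload plus the assumed overhead stays within the document limit. */
    static boolean fitsInSingleDocument(long payloadBytes) {
        return payloadBytes + ASSUMED_OVERHEAD_BYTES <= MAX_DOCUMENT_BYTES;
    }
}
```

Anything that fails this check would need GridFS (or another chunked layout) rather than a single bindata field.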

File Tools

mongofiles is a tool for manipulating GridFS from the command line.



Introduction

GridFS works by splitting a large object into small chunks, usually 256 KB in size, and storing each chunk as a document in a MongoDB collection.
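The splitting step can be sketched in plain Java without a database. ChunkSplitter is a hypothetical helper, not part of any MongoDB driver; it only shows how a byte array maps onto fixed-size chunks, with every chunk except the last holding exactly chunkSize bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a driver API) that splits data the way GridFS chunks a file. */
final class ChunkSplitter {
    // 256 KB, the usual chunk size mentioned in this article.
    static final int DEFAULT_CHUNK_SIZE = 256 * 1024;

    private ChunkSplitter() {}

    /** Splits data into consecutive pieces of at most chunkSize bytes; only the last may be shorter. */
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    /** Number of chunks needed for a file of the given length: ceil(length / chunkSize). */
    static long chunkCount(long length, int chunkSize) {
        return (length + chunkSize - 1) / chunkSize;
    }
}
```

With the 256 KB default, a 600 KB file yields three chunks: two full 256 KB chunks and a final 88 KB chunk.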

Specification

Storage Collections

GridFS uses two collections to store data:

  • files contains the object metadata
  • chunks contains the binary chunks with some additional accounting information

The files and chunks collections are named with a prefix, which acts as a logical file system. By default the prefix is fs, so the collections are fs.files and fs.chunks.
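A tiny Java sketch of how the two collection names follow from the prefix. GridFsNames is a hypothetical helper; drivers typically derive these names internally.

```java
/** Hypothetical helper showing how GridFS collection names derive from a prefix. */
final class GridFsNames {
    // Default prefix described above.
    static final String DEFAULT_PREFIX = "fs";

    private GridFsNames() {}

    /** Collection holding file metadata, e.g. "fs.files". */
    static String filesCollection(String prefix) {
        return prefix + ".files";
    }

    /** Collection holding the binary chunks, e.g. "fs.chunks". */
    static String chunksCollection(String prefix) {
        return prefix + ".chunks";
    }
}
```

So a GridFS rooted at "contracts", as in the Java example, would keep its data in contracts.files and contracts.chunks.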

Here's an example of the standard GridFS interface in Java:

/*
 * default root collection usage - must be supported
 */
GridFS myFS = new GridFS(myDatabase);              // returns a default GridFS (i.e. the "fs" root collection)
myFS.storeFile(new File("/tmp/largething.mpg"));   // saves the file into the "fs" GridFS store

/*
 * specified root collection usage - optional
 */

GridFS myContracts = new GridFS(myDatabase, "contracts");                    // returns a GridFS where "contracts" is root
myContracts.retrieveFile("smithco", new File("/tmp/smithco_20090105.pdf"));  // retrieves object whose filename is "smithco"

files

Documents in the files collection hold the metadata for a single file and require the following fields:

{
  "_id" : <unspecified>,                  // unique ID for this file
  "length" : data_number,                 // size of the file in bytes
  "chunkSize" : data_number,              // size of each of the chunks.  Default is 256k
  "uploadDate" : data_date,               // date when object first stored
  "md5" : data_string                     // result of running the "filemd5" command on this file's chunks
}
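A minimal Java sketch of assembling these fields for an in-memory file, assuming the whole file fits in a byte array. FilesDocument is a hypothetical name, and a plain Map stands in for a BSON document. The spec computes md5 with the server-side filemd5 command; because the chunks concatenated in order are exactly the file's bytes, an MD5 over the whole array should give the same digest.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for a files-collection document; a Map stands in for BSON. */
final class FilesDocument {
    private FilesDocument() {}

    static Map<String, Object> build(Object id, byte[] data, int chunkSize) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);                    // unique ID for this file
        doc.put("length", (long) data.length); // size of the file in bytes
        doc.put("chunkSize", chunkSize);       // size of each chunk
        doc.put("uploadDate", new Date());     // when the object was first stored
        doc.put("md5", md5Hex(data));          // client-side stand-in for filemd5
        return doc;
    }

    /** Lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }
}
```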

chunks

The structure of documents from the chunks collection is as follows:

{
  "_id" : <unspecified>,         // object id of the chunk in the chunks collection
  "files_id" : <unspecified>,    // _id of the corresponding files collection entry
  "n" : chunk_number,            // chunks are numbered in order, starting with 0
  "data" : data_binary           // the chunk's payload as a BSON binary type
}
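These fields are enough to rebuild a file: order the chunks by n and concatenate their data. Below is a minimal in-memory Java sketch, assuming the chunks for one files_id have already been fetched. ChunkDoc and ChunkAssembler are hypothetical names, and the chunk's own _id is omitted because reassembly does not need it.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory form of one chunks document (its own _id omitted). */
final class ChunkDoc {
    final Object filesId; // files_id: _id of the owning files document
    final int n;          // chunk number, starting at 0
    final byte[] data;    // chunk payload

    ChunkDoc(Object filesId, int n, byte[] data) {
        this.filesId = filesId;
        this.n = n;
        this.data = data;
    }
}

final class ChunkAssembler {
    private ChunkAssembler() {}

    /** Rebuilds a file from its chunks; assumes all chunks share one files_id. */
    static byte[] reassemble(List<ChunkDoc> chunks, long expectedLength) {
        List<ChunkDoc> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingInt(c -> c.n));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < sorted.size(); i++) {
            ChunkDoc chunk = sorted.get(i);
            // Chunk numbers must run 0, 1, 2, ... with no gaps or repeats.
            if (chunk.n != i) {
                throw new IllegalStateException("expected chunk n=" + i + " but found n=" + chunk.n);
            }
            out.write(chunk.data, 0, chunk.data.length);
        }
        byte[] file = out.toByteArray();
        // The total must match the length field of the files document.
        if (file.length != expectedLength) {
            throw new IllegalStateException("expected " + expectedLength + " bytes, got " + file.length);
        }
        return file;
    }
}
```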


Indexes

GridFS implementations should create a unique, compound index in the chunks collection for files_id and n. Here's how you'd do that from the shell:

db.fs.chunks.ensureIndex({files_id:1, n:1}, {unique: true});
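To see what the unique compound index provides, here is a hedged in-memory Java sketch: a map keyed by (files_id, n) that refuses duplicate keys and answers point lookups. ChunkKey and InMemoryChunkStore are hypothetical names; the real index lives on the server, and a HashMap models only its uniqueness and lookup behavior, not its ordering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical composite key mirroring the unique {files_id: 1, n: 1} index. */
final class ChunkKey {
    final Object filesId;
    final int n;

    ChunkKey(Object filesId, int n) {
        this.filesId = filesId;
        this.n = n;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey other = (ChunkKey) o;
        return n == other.n && Objects.equals(filesId, other.filesId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(filesId, n);
    }
}

/** In-memory stand-in for the chunks collection; not a MongoDB API. */
final class InMemoryChunkStore {
    private final Map<ChunkKey, byte[]> chunks = new HashMap<>();

    /** Rejects a second chunk with the same (files_id, n), as the unique index would. */
    void insert(Object filesId, int n, byte[] data) {
        Objects.requireNonNull(data, "data");
        if (chunks.putIfAbsent(new ChunkKey(filesId, n), data) != null) {
            throw new IllegalStateException("duplicate key: files_id=" + filesId + ", n=" + n);
        }
    }

    /** Looks up exactly one chunk by (files_id, n), like findOne; returns null if absent. */
    byte[] findOne(Object filesId, int n) {
        return chunks.get(new ChunkKey(filesId, n));
    }
}
```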

This way, a chunk can be retrieved efficiently using its files_id and n values. Note that GridFS implementations should use findOne operations to get chunks individually, and should not leave a cursor open to query for all chunks. So to get the first chunk, we could do:

db.fs.chunks.findOne({files_id: myFileID, n: 0});
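Since every chunk except the last holds exactly chunkSize bytes, a reader can work out which n to request with findOne for any byte offset, without touching the other chunks. A minimal Java sketch of that arithmetic; ChunkLocator is a hypothetical name.

```java
/** Hypothetical helper mapping file byte offsets to GridFS chunk numbers. */
final class ChunkLocator {
    private ChunkLocator() {}

    /** Chunk number n that contains the byte at the given file offset. */
    static long chunkFor(long offset, int chunkSize) {
        return offset / chunkSize;
    }

    /** Position of that byte inside its chunk's data field. */
    static int offsetInChunk(long offset, int chunkSize) {
        return (int) (offset % chunkSize);
    }

    /** Inclusive {first, last} chunk numbers covering the bytes [start, end). */
    static long[] chunkRange(long start, long end, int chunkSize) {
        if (start < 0 || end <= start) {
            throw new IllegalArgumentException("need 0 <= start < end");
        }
        return new long[] {start / chunkSize, (end - 1) / chunkSize};
    }
}
```

For example, with 256 KB chunks, byte offset 307200 (300 KB) lives in chunk n = 1, at position 45056 within that chunk's data.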
