1. http://www.mongovue.com/2010/11/03/yet-another-mongodb-map-reduce-tutorial/
The most important points in this article are:
Reduce takes 2 parameters – 1) Key 2) An array of values (number of values outputted from Map step). Output of Reduce is an object.
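A minimal sketch of that signature, simulated in plain JavaScript rather than run by MongoDB. The `orders` documents, the `customer` and `amount` fields, and the `runMapReduce` helper are all hypothetical, and `emit` is passed in as a parameter here, whereas in MongoDB it is a global and the document is `this`:

```javascript
// Hypothetical input documents; not taken from the tutorial.
const orders = [
  { customer: "ann", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "ann", amount: 7 },
];

// Map step: emit one { count, total } object per document, keyed by customer.
function mapOrder(doc, emit) {
  emit(doc.customer, { count: 1, total: doc.amount });
}

// Reduce step: takes a key and an array of emitted values, and returns an
// object with the same shape as the emitted values.
function reduceTotals(key, values) {
  const result = { count: 0, total: 0 };
  for (const v of values) {
    result.count += v.count;
    result.total += v.total;
  }
  return result;
}

// Plain-JavaScript stand-in for the grouping done between map and reduce.
function runMapReduce(docs) {
  const groups = new Map();
  for (const doc of docs) {
    mapOrder(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const output = {};
  for (const [key, values] of groups) {
    output[key] = reduceTotals(key, values);
  }
  return output;
}

const totals = runMapReduce(orders);
console.log(totals);
```

Because the reducer returns the same `{ count, total }` shape that map emits, its output can safely be passed back in as one of the values.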
It is important to note that Reduce can be called multiple times on a single key!
Yes, you read it correctly. It is not that difficult to think actually – consider a case where your data is huge and it lies on 2 different servers. It would be ideal to perform a Reduce on the given key on first server, and then perform a Reduce for the same key on second server. And then do a Reduce on the results of these two reduced values.
The picture above shows Reduce being called twice. This is just an example. To be frank, we don’t know how MongoDB executes Reduce. We don’t know which key is going to be reduced first and which key last. We also don’t know how many times it is going to call reduce for a key. This optimization is better left with MongoDB itself as it finds the most suitable parallel execution for every MapReduce command.
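The two-server scenario can be sketched in plain JavaScript. The values, the key `"ann"`, and the server split are hypothetical, and this only imitates one possible call order, since the actual order is up to MongoDB:

```javascript
// Sum reducer: the return value is a number, the same type as each emitted value.
function sumReduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical values emitted for the key "ann" on two different servers.
const onServerA = [10, 7];
const onServerB = [3, 5, 1];

// Reduce once per server, then reduce the two partial results.
const partialA = sumReduce("ann", onServerA);
const partialB = sumReduce("ann", onServerB);
const combined = sumReduce("ann", [partialA, partialB]);

// Reduce everything in a single call for comparison.
const singlePass = sumReduce("ann", onServerA.concat(onServerB));

console.log(combined, singlePass);
```

Both paths give the same total, which is exactly what a reducer has to guarantee if it may be called several times for one key.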
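Also worth studying are the second picture in that article's example, which breaks down reduce, and the explanation that goes with it.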
2. http://www.infoq.com/cn/articles/implementing-aggregation-functions-in-mongodb
This blog post is mainly useful for understanding mapReduce through concrete examples and statements.
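These two articles explain how reduce works in some detail. There is also the description of mapReduce in the official documentation:
3. http://docs.mongodb.org/manual/reference/command/mapReduce/#dbcmd.mapReduce
In the official documentation, pay attention to the following parts: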
- the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operation is true:
  reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
- the reduce function must be idempotent. Ensure that the following statement is true:
  reduce( key, [ reduce(key, valuesArray) ] ) == reduce( key, valuesArray )
- the order of the elements in the valuesArray should not affect the output of the reduce function, so that the following statement is true:
  reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )
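These three statements can be checked mechanically. The sketch below is plain JavaScript, not MongoDB code. `checkReduce`, `sumValues`, `naiveAverage`, and the sample values are hypothetical names chosen for illustration, and the `===` comparison only works because both reducers return plain numbers:

```javascript
// Compliant reducer: sums numbers, so partial results can be fed back in.
function sumValues(key, values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Non-compliant reducer: averaging an average loses the element counts.
function naiveAverage(key, values) {
  return values.reduce((acc, v) => acc + v, 0) / values.length;
}

// Evaluates the three documented statements for sample values A, B, C.
function checkReduce(reduceFn, A, B, C) {
  const key = "k"; // hypothetical key; the checks do not depend on it
  return {
    reReducible:
      reduceFn(key, [C, reduceFn(key, [A, B])]) === reduceFn(key, [C, A, B]),
    idempotent:
      reduceFn(key, [reduceFn(key, [A, B, C])]) === reduceFn(key, [A, B, C]),
    orderIndependent:
      reduceFn(key, [A, B]) === reduceFn(key, [B, A]),
  };
}

const sumCheck = checkReduce(sumValues, 1, 2, 6);
const avgCheck = checkReduce(naiveAverage, 1, 2, 6);
console.log(sumCheck, avgCheck);
```

The sum reducer passes all three checks, while the naive average fails the first one. A common way around this is to emit and reduce a `{ count, total }` pair and compute the average only at the end, for example in the `finalize` function that mapReduce accepts.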
Study these carefully; there is still a long way to go.