Hadoop: The Definitive Guide, Third Edition: Release Notes


(Excerpted from http://hadoopbook.com)


Release notes for Hadoop: The Definitive Guide, Third Edition:


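        The third edition is due to be published in May 2012. You can pre-order a copy now, or buy the “Early Release” ebook, which entitles you to the final version at no extra charge. (This is of little use to readers in China, heh!)
        The following outlines the changes in this edition.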
        
        What's new in the third edition?
        
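        The third edition covers the Apache Hadoop 1.x release series (formerly 0.20), as well as the 0.22 and 0.23 series. With a few exceptions, which are noted in the text, all the examples in the book run against these versions. The features of each release series are described in "Hadoop Releases" in Chapter 1.
        Most examples in this edition use the new MapReduce API. Because the old API is still in widespread use, it continues to be discussed alongside the new one, and the equivalent old-API code can be found on the book's website.
        The major change in Hadoop 0.23 is the new MapReduce runtime, MapReduce 2, which is built on a new distributed resource management system called YARN. Chapter 6 covers how it works, and Chapter 9 covers how to run it.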

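        The book includes more MapReduce material: development practices such as packaging MapReduce jobs with Maven, setting the user's Java classpath, and writing tests with MRUnit (all in Chapter 5), plus more depth on features such as output committers and the distributed cache (both in Chapter 8) and task memory monitoring (Chapter 9). Chapter 4 has a new section on writing MapReduce jobs to process Avro data, and Chapter 5 covers running a simple MapReduce workflow in Oozie.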

(Unfortunately, there is no coverage of the coordinator.)

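        Chapter 3, on HDFS, now introduces High Availability, Federation, and the new WebHDFS and HttpFS filesystems.
        The chapters on Pig, Hive, Sqoop, and ZooKeeper have been expanded to cover the new features and changes in their latest releases.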

        Numerous other corrections and improvements have also been made throughout the book.


Original text:

Third Edition

The third edition is due to be published in May 2012. You can pre-order a copy, or buy the “Early Release” ebook today (you will receive the final ebook version when it is available for no extra charge).

The following section is from the book’s preface, and outlines the changes in the third edition.

What’s New in the Third Edition?

The third edition covers the 1.x (formerly 0.20) release series of Apache Hadoop, as well as the newer 0.22 and 0.23 series. With a few exceptions, which are noted in the text, all the examples in this book run against these versions. The features in each release series are described at a high level in "Hadoop Releases" in Chapter 1.

This edition uses the new MapReduce API for most of the examples. Since the old API is still in widespread use, it continues to be discussed in the text alongside the new API, and the equivalent code using the old API can be found on the book’s website.

The major change in Hadoop 0.23 is the new MapReduce runtime, MapReduce 2, which is built on a new distributed resource management system called YARN. This edition includes new sections covering MapReduce on YARN: how it works (Chapter 6) and how to run it (Chapter 9).

There is more MapReduce material too, including development practices like packaging MapReduce jobs with Maven, setting the user’s Java classpath, and writing tests with MRUnit (all in Chapter 5); and more depth on features such as output committers, the distributed cache (both in Chapter 8), and task memory monitoring (Chapter 9). There is a new section on writing MapReduce jobs to process Avro data (Chapter 4), and on running a simple MapReduce workflow in Oozie (Chapter 5).

The chapter on HDFS (Chapter 3) now has introductions to High Availability, Federation, and the new WebHDFS and HttpFS filesystems.

The chapters on Pig, Hive, Sqoop, and ZooKeeper have all been expanded to cover the new features and changes in their latest releases.

In addition, numerous corrections and improvements have been made throughout the book.

