An Analysis of Hadoop MapReduce Programs

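Abstract: A Hadoop MapReduce program consists of three parts: the Mapper, the Reducer, and job execution. This article introduces and analyzes the structure of these three parts.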

Keywords: MapReduce, Mapper, Reducer, job execution

A MapReduce program consists of three parts: the Mapper, the Reducer, and job execution.

Mapper

To serve as a Mapper in the old API (org.apache.hadoop.mapred), a class extends MapReduceBase and implements the Mapper interface.

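The Mapper interface handles the data-processing stage. It is a Java generic of the form Mapper<K1,V1,K2,V2>, where the key classes implement the WritableComparable interface and the value classes implement the Writable interface. The interface declares a single method, map(), which processes one individual key/value pair. The map() method has the following form.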

public void map(K1 key, V1 value, OutputCollector<K2,V2> output, Reporter reporter) throws IOException

or, in the new API (org.apache.hadoop.mapreduce):

public void map(K1 key, V1 value, Context context) throws IOException, InterruptedException

This function processes a given key/value pair (K1, V1) and produces a list of key/value pairs (K2, V2). The list may be empty.
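The "one pair in, a list of pairs out" contract can be illustrated without any Hadoop dependency. The following is only a minimal plain-Java sketch under stated assumptions: the class name MapSketch, the choice of K1 = Long (a line offset), V1 = String (a line of text), K2 = String (a word) and V2 = Integer are all hypothetical, and the returned list stands in for what a real Mapper would emit through OutputCollector or Context.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapSketch {
    // Hypothetical stand-in for map(): takes one (K1, V1) pair and
    // returns a possibly empty list of (K2, V2) pairs.
    public static List<Map.Entry<String, Integer>> map(Long key, String value) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                // Emit (word, 1) for every whitespace-separated token.
                output.add(new SimpleEntry<>(token, 1));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // One input pair yields six output pairs.
        System.out.println(map(0L, "to be or not to be"));
        // A blank line yields an empty list.
        System.out.println(map(19L, "   "));
    }
}
```

In this sketch the input key is ignored, which is common when the key is only a position in the input file.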

Hadoop provides several useful Mapper implementations, including IdentityMapper, InverseMapper, RegexMapper, and TokenCountMapper.

Reducer

To serve as a Reducer in the old API (org.apache.hadoop.mapred), a class extends MapReduceBase and implements the Reducer interface.

The Reducer interface has a reduce() method of the following form.

public void reduce(K2 key, Iterator<V2> values, OutputCollector<K3,V3> output, Reporter reporter) throws IOException

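or, in the new API (org.apache.hadoop.mapreduce), where the values arrive as an Iterable rather than an Iterator: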

public void reduce(K2 key, Iterable<V2> values, Context context) throws IOException, InterruptedException

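When a Reducer task receives the output of the various Mappers, it sorts the incoming data by key and merges the values that share the same key. It then calls reduce(), which iterates over the values associated with the given key and produces a (possibly empty) list of (K3, V3) pairs.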
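The sort, group, and reduce sequence described above can be simulated in plain Java. This is a rough sketch, not Hadoop's actual shuffle implementation: the class ReduceSketch and the method sortGroupAndReduce are hypothetical names, a TreeMap stands in for the framework's sort-by-key step, and the reduce() here simply sums the values for each key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSketch {
    // Hypothetical stand-in for reduce(): iterates over all values
    // associated with one key and returns their sum.
    public static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Simulates the framework side: sort the map output by key, group
    // values that share a key, then call reduce() once per key.
    public static Map<String, Integer> sortGroupAndReduce(
            List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(),
                       reduce(group.getKey(), group.getValue().iterator()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String word : "to be or not to be".split(" ")) {
            mapOutput.add(new SimpleEntry<>(word, 1));
        }
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(sortGroupAndReduce(mapOutput));
    }
}
```

Summing the values per key is roughly what the built-in LongSumReducer does for long values; the sketch uses Integer only to stay short.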

Hadoop also provides several useful Reducer implementations, including IdentityReducer and LongSumReducer.

Job Execution

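Inside the run() method, a MapReduce job is started by passing a configured job to JobClient.runJob(). The run() method must also set the basic parameters for each job, including the input path, the output path, the Mapper class, and the Reducer class.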
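A run() method along these lines might look like the following. This is a sketch under assumptions rather than the original author's code: the class name IdentityJob is hypothetical, the built-in IdentityMapper and IdentityReducer stand in for a real Mapper and Reducer, the input is assumed to be tab-separated key/value text read through KeyValueTextInputFormat, and args[0] and args[1] are assumed to hold the input and output paths. It needs the Hadoop libraries on the classpath and uses the old org.apache.hadoop.mapred API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class; only the job-execution part is shown.
public class IdentityJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Build the job configuration from the Tool's Configuration.
        JobConf job = new JobConf(getConf(), IdentityJob.class);
        job.setJobName("IdentityJob");

        // Basic per-job parameters: input path and output path
        // (assumed to arrive as the first two arguments).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Mapper and Reducer classes; the built-in identity
        // implementations are used here as placeholders.
        job.setMapperClass(IdentityMapper.class);
        job.setReducerClass(IdentityReducer.class);

        // Assumed formats: tab-separated key/value text in, text out.
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Submit the configured job and wait for it to finish.
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic Hadoop options, then calls run().
        int exitCode = ToolRunner.run(new Configuration(), new IdentityJob(), args);
        System.exit(exitCode);
    }
}
```

Packaged into a jar (the name myjob.jar is hypothetical), such a driver would typically be launched with something like: hadoop jar myjob.jar IdentityJob input output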

The basic skeleton of a typical MapReduce program is shown below.

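import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;

public class MyJob extends Configured implements Tool {

    /* The Mapper of the MapReduce program */
    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value,
                        OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {
            // Add the Mapper's processing code here
        }
    }

    /* The Reducer of the MapReduce program */
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output,
                           Reporter reporter) throws IOException {
            // Add the Reducer's processing code here
        }
    }

    /* Job execution of the MapReduce program */
    public int run(String[] args) throws Exception {
        // Add the job-execution code here
        return 0;
    }
}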

References:

1. http://www.wangluqing.com/2014/03/hadoop-mapreduce-program-analyze/

2. 《Hadoop實戰》 (Hadoop in Action), Chapter 4, "Writing Basic MapReduce Programs"
