Notes on Hadoop: The Definitive Guide (3): Developing a MapReduce Application

Original post

2020-05-30 13:53

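Developing a MapReduce program follows a specific process: 1. Write the map and reduce functions and cover them with unit tests; 2. Write a local test program (a driver) that runs the job; 3. Run on the cluster, using IsolationRunner to rerun a task on the same input data on which it failed; 4. Tune the job by profiling its tasks, for which Hadoop provides hooks to assist the analysis.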

1. Unit testing

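import static org.mockito.Mockito.*;	// Mockito creates the mock objects

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.junit.Test;

public class MapperTest {
	@Test
	public void test() throws IOException {
		// "Mapper" stands for your own mapper class (org.apache.hadoop.mapred.Mapper itself is an interface)
		Mapper mapper = new Mapper();
		Text value = new Text("...");
		OutputCollector<Text, IntWritable> output = mock(OutputCollector.class);
		// Context and reporter are not needed by the map logic, so null is passed for both
		mapper.map(null, value, output, null);
		verify(output).collect(new Text(".."), new IntWritable(..));
		// Missing-value test: nothing should be collected
		// verify(output, never()).collect(any(Text.class), any(IntWritable.class));
	}
}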
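The same idea can be sketched without Mockito by handing the map logic a collector that simply records what it emits. The block below is not Hadoop code: `TemperatureMapLogic`, the `year,temperature` line format, and the rule that an empty temperature counts as missing are all assumptions made so the example runs with only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical, JDK-only stand-in for a mapper: parses "year,temperature" lines.
class TemperatureMapLogic {
    // Emits (year, temperature) unless the temperature field is missing.
    static void map(String line, BiConsumer<String, Integer> output) {
        String[] fields = line.split(",", -1);
        if (fields.length != 2 || fields[1].isEmpty()) {
            return; // missing value: emit nothing
        }
        output.accept(fields[0], Integer.parseInt(fields[1].trim()));
    }

    // Runs the map logic on one line and records everything it emits.
    static List<String> collect(String line) {
        List<String> emitted = new ArrayList<>();
        map(line, (year, temp) -> emitted.add(year + "=" + temp));
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(collect("1950,22")); // valid record
        System.out.println(collect("1950,"));   // missing temperature
    }
}
```

The recording collector plays the role of the mocked `OutputCollector`: the first call corresponds to the `verify(...).collect(...)` check, and the second to the missing-value check that nothing is collected.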

2. Local testing

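import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;

public class Driver extends Configured implements Tool {
	@Override
	public int run(String[] args) throws Exception {
		JobConf conf = new JobConf(getConf(), getClass());
		// Configure the JobConf here: input and output paths, map and reduce classes
		JobClient.runJob(conf);
		return 0;
	}
}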

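import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.junit.Test;

public class DriverTest {
	@Test
	public void test() throws Exception {
		JobConf conf = new JobConf();
		conf.set("fs.default.name", "file:///");	// local file system
		conf.set("mapred.job.tracker", "local");	// local job runner
		// "output" is the Path of the job's output directory (not specified in these notes)
		FileSystem fs = FileSystem.getLocal(conf);
		fs.delete(output, true);	// delete old output
		Driver driver = new Driver();
		driver.setConf(conf);
		int res = driver.run(new String[]{...});
		checkOutput(conf, output);	// compare actual output with expected output line by line
	}
}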
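The notes do not show `checkOutput`. A minimal sketch of the line-by-line comparison it performs might look like the following, assuming the expected and actual outputs are plain text files on the local file system. `OutputChecker` and its method names are hypothetical, and the sample records are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical JDK-only helper: compares a job's output with the expected output.
class OutputChecker {
    // Returns -1 if the line lists match, otherwise the 1-based number of the first differing line.
    static int firstDifference(List<String> expected, List<String> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i + 1;
            }
        }
        // One list is a prefix of the other: the first extra line is the difference.
        return expected.size() == actual.size() ? -1 : common + 1;
    }

    // Reads both files as UTF-8 and compares them line by line.
    static int firstDifference(Path expected, Path actual) throws IOException {
        return firstDifference(
                Files.readAllLines(expected, StandardCharsets.UTF_8),
                Files.readAllLines(actual, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path expected = Files.createTempFile("expected", ".txt");
        Path actual = Files.createTempFile("actual", ".txt");
        Files.write(expected, List.of("1949\t111", "1950\t22"));
        Files.write(actual, List.of("1949\t111", "1950\t23"));
        System.out.println("first differing line: " + firstDifference(expected, actual));
    }
}
```

Returning the line number rather than a boolean makes a failing test easier to diagnose, since the assertion message can point at the first mismatching record.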

3. Debugging a job (on the cluster, run it with hadoop jar xx.jar mainClass args)

System.err.println("error");	// written to the task log, viewable through the web UI
reporter.setStatus("...");	// sets the task's status message
reporter.incrCounter(...);	// increments one of the task's counters

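Anything written to standard output or standard error goes straight into the task's log files. (With Streaming, standard output is used for the map or reduce output, so it does not show up in the standard output log.)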
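The combination of logging to standard error and incrementing a counter can be illustrated without a cluster. The sketch below is a JDK-only stand-in, not the Hadoop `Reporter` API: `RecordCounters`, its `Quality` enum, and the rule that a record is valid when it parses as an integer are assumptions made for the example.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop counters: tallies malformed records instead of failing.
class RecordCounters {
    enum Quality { VALID, MALFORMED }

    private final Map<Quality, Long> counts = new EnumMap<>(Quality.class);

    // Mirrors the idea of incrementing a named counter by some amount.
    void increment(Quality counter, long amount) {
        counts.merge(counter, amount, Long::sum);
    }

    long get(Quality counter) {
        return counts.getOrDefault(counter, 0L);
    }

    // Classifies one input line; a suspicious line is logged to stderr and counted.
    void process(String line) {
        try {
            Integer.parseInt(line.trim());
            increment(Quality.VALID, 1);
        } catch (NumberFormatException e) {
            System.err.println("Malformed record: " + line);
            increment(Quality.MALFORMED, 1);
        }
    }

    public static void main(String[] args) {
        RecordCounters counters = new RecordCounters();
        for (String line : new String[] {"22", "abc", "-11"}) {
            counters.process(line);
        }
        System.out.println("malformed = " + counters.get(Quality.MALFORMED));
    }
}
```

Counting bad records rather than throwing keeps the task running, while the log line preserves the offending input for later inspection.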

Remote debugging: use IsolationRunner to rerun a failed task on the same input data.

4. Tuning a job

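Things to check: the number of mappers, the number of reducers, combiners, compression of intermediate (map output) data, custom serialization, and shuffle tuning.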
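Of the items above, the combiner is the easiest to picture: it pre-aggregates one map task's output so that fewer records cross the network during the shuffle. The following JDK-only sketch shows the effect for a "maximum per key" job. `CombinerSketch` and the sample records are hypothetical, and this is not Hadoop's combiner interface. A combiner is safe here because taking the maximum is commutative and associative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical JDK-only illustration of what a combiner does for a "max" job.
class CombinerSketch {
    // Combines one map task's (key, value) pairs into a single max per key.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> maxPerKey = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            maxPerKey.merge(pair.getKey(), pair.getValue(), Integer::max);
        }
        return maxPerKey;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        mapOutput.add(Map.entry("1950", 0));
        mapOutput.add(Map.entry("1950", 20));
        mapOutput.add(Map.entry("1950", 10));
        mapOutput.add(Map.entry("1951", 25));
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println(mapOutput.size() + " records before, "
                + combined.size() + " after: " + combined);
    }
}
```

An operation such as a mean could not be combined this way without changing what is emitted, which is why a combiner has to be chosen per job rather than switched on blindly.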

5. MapReduce workflows

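Splitting one problem into several MapReduce jobs: 1. The work done by a single mapper can be split across several mappers, which Hadoop's bundled ChainMapper library class chains into one mapper, combined with ChainReducer; 2. When running multiple jobs, a linear chain of jobs or a directed acyclic graph (DAG) can be used to control the order in which the jobs run, for example with JobControl.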
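The ordering constraint behind a DAG of jobs can be sketched in a few lines: a job may start only once every job it depends on has finished. The block below is a JDK-only illustration of that rule and not the JobControl API. `JobDagSketch` and the job names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical JDK-only sketch of dependency-ordered job execution (not the JobControl API).
class JobDagSketch {
    // Maps each job name to the names of the jobs it depends on.
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    void addJob(String name, String... dependsOn) {
        dependencies.put(name, List.of(dependsOn));
    }

    // Returns an order in which every job comes after all of its dependencies.
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        while (order.size() < dependencies.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> job : dependencies.entrySet()) {
                if (!order.contains(job.getKey()) && order.containsAll(job.getValue())) {
                    order.add(job.getKey());
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("Cycle or unknown dependency in job graph");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        JobDagSketch dag = new JobDagSketch();
        dag.addJob("join", "cleanA", "cleanB");
        dag.addJob("cleanA");
        dag.addJob("cleanB");
        dag.addJob("report", "join");
        System.out.println(dag.executionOrder());
    }
}
```

A linear chain is the special case in which each job depends only on the one before it, so the same rule covers both workflow shapes mentioned above.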
