MapReduce Programming

I. WordCount Exercise

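To compile and run MapReduce programs in Eclipse, you need to install hadoop-eclipse-plugin; see the Xiamen University (XMU) tutorial for details.
hadoop2x-eclipse-plugin-master installation package
Extraction code: 6l8p
(I uploaded the hadoop2x-eclipse-plugin-master package needed for this experiment to my Baidu Netdisk. If you need it, download it from the link above, put it in the appropriate directory on Ubuntu, and extract it.)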
1. View the list of files in HDFS

2. Create a MapReduce project named WordCount in Eclipse

3. Create a new Class: enter org.apache.hadoop.examples in the Package field and WordCount in the Name field, as shown in the figure

4. The code of WordCount.java is as follows:

    package org.apache.hadoop.examples;

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class WordCount {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Strip generic Hadoop options (such as -D key=value) and keep only the program arguments
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length < 2) {
                System.err.println("Usage: wordcount <in> [<in>...] <out>");
                System.exit(2);
            }

            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            // Summing is associative, so the reducer can also run as a combiner on the map side
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Every argument except the last one is an input path
            for (int i = 0; i < otherArgs.length - 1; ++i) {
                FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
            }
            // The last argument is the output directory, which must not exist yet
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }

        // Reducer (also used as combiner): sums all counts emitted for one word
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Mapper: splits each input line on whitespace and emits (word, 1) per token
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
    }
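
To see what this job computes without a cluster, here is a plain-Java sketch with no Hadoop dependency. It mimics the phases in memory. The mapper step emits (word, 1) for every whitespace-separated token. A shuffle step groups the values by key in sorted order. The reducer step sums each group. The class and method names are hypothetical. The combiner is left out because pre-summing per mapper does not change the totals. This only illustrates the data flow, not how Hadoop actually schedules or executes the work.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the WordCount data flow (not Hadoop code)
class WordCountSimulation {

    // Map phase: like TokenizerMapper, emit (token, 1) for every whitespace-separated token
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    // Shuffle phase: group values by key; TreeMap keeps keys sorted (same order as Hadoop for ASCII keys)
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: like IntSumReducer, sum the values of each group
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> groups) {
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) {
                sum += v;
            }
            result.put(g.getKey(), sum);
        }
        return result;
    }

    // Runs map over every line, then shuffle and reduce
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("echo of the rainbow", "the waiting game")));
    }
}
```

With the two sample lines used in step 13, run(...) returns {echo=1, game=1, of=1, rainbow=1, the=2, waiting=1}.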

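5. Copy Hadoop's configuration files into the project's src directory so the program runs with the correct settings: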

    cp /usr/local/hadoop/etc/hadoop/core-site.xml ~/workspace/WordCount/src
    cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml ~/workspace/WordCount/src
    cp /usr/local/hadoop/etc/hadoop/log4j.properties ~/workspace/WordCount/src
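
Hadoop's Configuration loads core-site.xml from the classpath. The setting that usually matters here is fs.defaultFS. Without it, the default is the local file system (file:///), so input and output would be resolved locally instead of in HDFS.

The sketch below reads that property with the JDK's built-in DOM parser to show what the copied file provides. The class name is hypothetical. The sample value hdfs://localhost:9000 is taken from the error message in Problem 2, so substitute your own configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: look up one <property> value in a Hadoop-style *-site.xml
class SiteXmlReader {

    // Returns the <value> of the property called `name`, or `fallback` if absent.
    // Assumes each <property> has both a <name> and a <value> child.
    static String getProperty(String xml, String name, String fallback) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String propName = prop.getElementsByTagName("name").item(0).getTextContent().trim();
            if (propName.equals(name)) {
                return prop.getElementsByTagName("value").item(0).getTextContent().trim();
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws Exception {
        // Sample content; in practice read ~/workspace/WordCount/src/core-site.xml
        String coreSite = "<configuration>"
                + "<property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>"
                + "</configuration>";
        System.out.println(getProperty(coreSite, "fs.defaultFS", "file:///"));
    }
}
```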

6. After copying, right-click WordCount and select Refresh; the file structure should then look like this:

7. Set the runtime arguments, as shown in the figure

Alternatively, you can set the input arguments directly in the code, as shown in the figure
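
However the arguments are supplied, main() interprets them the same way. Every argument except the last is an input path, and the last one is the output directory.

The plain-Java sketch below mirrors that rule without Hadoop so the convention is easy to check. The class and method names are hypothetical. It skips GenericOptionsParser, which in the real program first removes generic Hadoop options. It also throws an exception where main() calls System.exit(2).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical mirror of the argument handling in WordCount.main()
class WordCountArgs {
    final List<String> inputs;
    final String output;

    WordCountArgs(List<String> inputs, String output) {
        this.inputs = inputs;
        this.output = output;
    }

    // Same rule as main(): at least two arguments, the last one is the output directory
    static WordCountArgs parse(String[] args) {
        if (args.length < 2) {
            // main() prints this usage line and calls System.exit(2) instead
            throw new IllegalArgumentException("Usage: wordcount <in> [<in>...] <out>");
        }
        List<String> inputs = Arrays.asList(Arrays.copyOf(args, args.length - 1));
        return new WordCountArgs(inputs, args[args.length - 1]);
    }

    public static void main(String[] args) {
        // Equivalent to entering "input output" as the program arguments in Eclipse
        WordCountArgs parsed = parse(new String[] {"input", "output"});
        System.out.println("inputs=" + parsed.inputs + " output=" + parsed.output);
    }
}
```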

8. With the arguments set, run the program again; the success message looks like this:

9. At this point, you can conveniently develop MapReduce programs in Eclipse.

II. Compiling and Packaging a Hadoop MapReduce Program

10. Add Hadoop's classpath to the CLASSPATH variable in ~/.bashrc, then run source ~/.bashrc so the change takes effect

    vim ~/.bashrc
    source ~/.bashrc
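
javac can only compile WordCount.java if the Hadoop classes it imports are on the class path. One way to confirm that the CLASSPATH change took effect is to try loading those classes by name.

The sketch below is a hypothetical helper built on Class.forName; it is not part of Hadoop. Run it from the same shell with the same class path, for example java -cp ".:$CLASSPATH" ClasspathCheck.

```java
// Hypothetical helper: reports whether the classes WordCount imports can be loaded
class ClasspathCheck {

    // True if the named class is visible on the current class path
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] needed = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job",
            "org.apache.hadoop.util.GenericOptionsParser"
        };
        for (String name : needed) {
            System.out.println((isLoadable(name) ? "found:   " : "MISSING: ") + name);
        }
    }
}
```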

11. Compile WordCount.java with javac
(Run this in the directory that contains WordCount.java!)

    javac WordCount.java
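
javac writes one .class file per class, and a nested class's file name is the outer class name, then $, then the nested name. For this WordCount.java that means three files:

- WordCount.class
- WordCount$IntSumReducer.class
- WordCount$TokenizerMapper.class

This is why the jar command in the next step uses the wildcard WordCount*.class. The small sketch below prints the binary names that determine those file names. It uses a hypothetical class of the same shape and needs no Hadoop.

```java
// Hypothetical class with the same shape as WordCount: an outer class plus two static nested classes
class WordCountShape {
    static class TokenizerMapper { }
    static class IntSumReducer { }

    // File name javac gives a compiled class: its binary name without the package, plus ".class"
    static String classFileName(Class<?> c) {
        String binaryName = c.getName();   // nested classes use '$', e.g. "WordCountShape$TokenizerMapper"
        return binaryName.substring(binaryName.lastIndexOf('.') + 1) + ".class";
    }

    public static void main(String[] args) {
        System.out.println(classFileName(WordCountShape.class));                 // WordCountShape.class
        System.out.println(classFileName(WordCountShape.TokenizerMapper.class)); // WordCountShape$TokenizerMapper.class
        System.out.println(classFileName(WordCountShape.IntSumReducer.class));   // WordCountShape$IntSumReducer.class
    }
}
```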

12. Package the .class files into a jar
(Also run this in the same directory)

    jar -cvf WordCount.jar ./WordCount*.class
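
A class that declares a package is looked up inside a jar under a path derived from its fully qualified name. For example, org.apache.hadoop.examples.WordCount is expected at org/apache/hadoop/examples/WordCount.class. Running javac without -d leaves the .class files next to the source file, so the jar command above stores them at the jar root.

If jar -tf WordCount.jar (or the sketch below) shows WordCount.class only at the root, the class loader will not find the packaged class in this jar. In that case, compiling with javac -d . WordCount.java creates the matching org/apache/hadoop/examples/ directory tree, which can be packaged instead. The helper below is hypothetical and uses java.util.jar.JarFile.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: does a jar contain a class at the path its package requires?
class JarEntryCheck {

    // "org.apache.hadoop.examples.WordCount" -> "org/apache/hadoop/examples/WordCount.class"
    static String entryFor(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    static boolean containsClass(File jar, String fullyQualifiedName) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(entryFor(fullyQualifiedName)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args.length > 0 ? args[0] : "WordCount.jar");
        String cls = "org.apache.hadoop.examples.WordCount";
        System.out.println(entryFor(cls) + (containsClass(jar, cls) ? " found" : " not found"));
    }
}
```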

As shown in the figure (the generated jar file):

(I copied it to the workspace/WordCount directory for convenience!)

13. With the jar built, create a few input files to run it on:

    mkdir input
    echo "echo of the rainbow" > ./input/file0
    echo "the waiting game" > ./input/file1
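
Before uploading, it can help to know what the job should produce. The sketch below reads the files in the local ./input directory and counts whitespace-separated words the same way TokenizerMapper and IntSumReducer do. It prints word<TAB>count lines in sorted key order, which is the layout the default text output uses in part-r-00000.

This is a local cross-check, assuming ASCII words (then String ordering matches Hadoop's byte-wise Text ordering). The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local cross-check: predict the contents of part-r-00000 for ./input
class ExpectedOutput {

    static TreeMap<String, Integer> countDirectory(Path dir) throws IOException {
        TreeMap<String, Integer> counts = new TreeMap<>();  // sorted like the reducer's keys (ASCII)
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    StringTokenizer itr = new StringTokenizer(line);  // same tokenizer as the mapper
                    while (itr.hasMoreTokens()) {
                        counts.merge(itr.nextToken(), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // One "word<TAB>count" line per key
    static String format(TreeMap<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(format(countDirectory(Paths.get("input"))));
    }
}
```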

14. List the files under input

    ls ./input

15. Upload the local files to pseudo-distributed HDFS

    /usr/local/hadoop/bin/hadoop fs -put ./input input
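
The relative names input and output in these commands are resolved against the user's HDFS home directory, /user/<username>, on the file system named by fs.defaultFS. That is why the results end up under /user/hadoop/output, and why the error in Problem 2 mentions hdfs://localhost:9000/user/hadoop/output.

The sketch below reproduces that resolution with java.net.URI. The user name hadoop and the address hdfs://localhost:9000 come from this setup and are assumptions for any other machine.

```java
import java.net.URI;

// Hypothetical illustration of how a relative HDFS path is qualified
class HdfsPathResolution {

    // defaultFs e.g. "hdfs://localhost:9000", user e.g. "hadoop"
    static URI resolve(String defaultFs, String user, String path) {
        // The trailing slash makes the home directory the base for relative paths
        URI home = URI.create(defaultFs + "/user/" + user + "/");
        return home.resolve(path);
    }

    public static void main(String[] args) {
        System.out.println(resolve("hdfs://localhost:9000", "hadoop", "input"));
        System.out.println(resolve("hdfs://localhost:9000", "hadoop", "output"));
    }
}
```

Absolute paths such as /tmp/x keep their own location and only inherit the scheme and host.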

16. Run the job (the code declares a package name, so the class name must be written in full here!)

    /usr/local/hadoop/bin/hadoop jar WordCount.jar org/apache/hadoop/examples/WordCount input output

Terminal output:

17. Check the results under /user/hadoop/output/ in the web UI at localhost:50070

Check the /user/hadoop/output/ results from the terminal in pseudo-distributed mode:

    cd /usr/local/hadoop
    ./bin/hdfs dfs -ls /user/hadoop/output

18. View the result file /user/hadoop/output/part-r-00000 in pseudo-distributed mode

    ./bin/hdfs dfs -cat /user/hadoop/output/part-r-00000
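
Each reducer writes one file named part-r-NNNNN. The r marks reducer output, and the five-digit number is the reducer's partition index. A job uses a single reducer unless configured otherwise, hence part-r-00000. Each line holds a key and a value separated by a tab.

The hypothetical sketch below builds such a file name and parses the text printed by hdfs dfs -cat back into a map, for example to compare it with an expected result. It assumes every non-empty line contains a tab.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helpers for the reducer output produced by the WordCount job
class PartFileReader {

    // Partition 0 -> "part-r-00000"
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-r-%05d", partition);
    }

    // Parse "word<TAB>count" lines, keeping the order in which they appear
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.lastIndexOf('\t');
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(partFileName(0));
        System.out.println(parse("the\t2\nwaiting\t1\n"));
    }
}
```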

III. Problems Encountered and Solutions

Problem 1: a class-not-found error

Solution: the code declares a package name, so the class name in the command must be written in full as well. The correct command is:

    /usr/local/hadoop/bin/hadoop jar WordCount.jar org/apache/hadoop/examples/WordCount input output

Problem 2: Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/hadoop/output already exists

Solution: in pseudo-distributed mode, delete the output directory created by the previous run (run from /usr/local/hadoop):

    ./bin/hdfs dfs -rm -r /user/hadoop/output
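
The job refuses to start when the output directory already exists, so results from an earlier run are never overwritten silently. The same check-then-delete pattern is sketched below on the local file system with java.nio, purely as an illustration. Against HDFS you would keep using hdfs dfs -rm -r or Hadoop's FileSystem.delete(path, true).

The class and method names are hypothetical. A recursive delete cannot be undone, so point it only at an output directory you no longer need.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical local analogue of the output-directory check and of "dfs -rm -r"
class OutputDirGuard {

    // Fail, as the job does, if the output directory is already there
    static void checkOutputDoesNotExist(Path out) throws FileAlreadyExistsException {
        if (Files.exists(out)) {
            throw new FileAlreadyExistsException(out.toString(), null, "Output directory already exists");
        }
    }

    // Recursive delete: children first (reverse order of the walk), then the directory itself
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            Path[] paths = walk.sorted(Comparator.reverseOrder()).toArray(Path[]::new);
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) {
        // Non-destructive demo: only reports whether the directory is in the way
        Path out = Paths.get(args.length > 0 ? args[0] : "output");
        try {
            checkOutputDoesNotExist(out);
            System.out.println(out + " does not exist yet; safe to run");
        } catch (FileAlreadyExistsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```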

Keep going!

