Writing MapReduce programs with Eclipse

Original

posa88

2020-06-27 13:33

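My own blog looks like it is about to expire, so I am moving the posts that are still useful over here for safekeeping.

First, download the plugin.

Here is another plugin you can take a look at.

Next, put it in the eclipse/plugins directory. I am on Fedora, so I put it in /usr/lib/eclipse/plugins.

Then rename the plugin to hadoop-eclipse-plugin-1.0.0.jar.

My Eclipse version: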

Eclipse Platform

Version: 3.6.1

Build id: M20100909-0800

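I found that it does not work without the rename; I tried many other plugins and none of them worked. As for why this particular name: I noticed that a plugin with this name was detected by my Eclipse, although that plugin itself would not run.

Then restart Eclipse. MapReduce should now appear under Window -> Open Perspective (if it does not, the cause is probably the naming problem described above). After selecting it, a dialog pops up in which you need to set a Location name, such as myhadoop, along with Map/Reduce Master and DFS Master. The Host and Port for these are the address and port you configured in mapred-site.xml and core-site.xml respectively.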
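To make the Host and Port mapping concrete, here is a minimal sketch in plain Java. It assumes a typical Hadoop 1.x setup in which core-site.xml sets fs.default.name to a value of the form hdfs://host:port and mapred-site.xml sets mapred.job.tracker to host:port. The class name LocationFields and the sample addresses are hypothetical; use the values from your own config files.

```java
import java.net.URI;

// Sketch: split a Hadoop address setting into the Host and Port
// fields that the location dialog asks for.
final class LocationFields {
    // Accepts either "hdfs://host:port" or a bare "host:port".
    static String[] hostAndPort(String value) {
        // A bare host:port has no scheme, so add a placeholder one for parsing.
        String withScheme = value.contains("://") ? value : "dummy://" + value;
        URI uri = URI.create(withScheme);
        return new String[] { uri.getHost(), String.valueOf(uri.getPort()) };
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from the original post.
        String[] dfs = hostAndPort("hdfs://localhost:9000");
        String[] mr = hostAndPort("localhost:9001");
        System.out.println("DFS Master: Host " + dfs[0] + ", Port " + dfs[1]);
        System.out.println("Map/Reduce Master: Host " + mr[0] + ", Port " + mr[1]);
    }
}
```

If the two masters in the dialog do not match these settings, the DFS Locations tree will most likely fail to connect.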

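At this point a DFS Locations entry should appear in Project Explorer, where the project list normally is.

Under it there is already a myhadoop entry, the one we just configured; if not, you can configure a new one. Click it and two directories appear: our Hadoop home directory and the user directory.

Now let's write a wordcount.

Create a new MapReduce project, then create a new MapReduce Driver. You can also put the map, reduce, and driver in separate classes; here I keep them together.

Code: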

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    // The same class is also used as the combiner below.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: args[0] is the input directory, args[1] is the output directory.
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
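To see what the Map and Reduce classes compute without a cluster, the following sketch reproduces the same logic in plain Java with no Hadoop dependency. It is only an illustration: the class LocalWordCount and its method names are hypothetical, and the grouping step is a simple in-memory stand-in for the shuffle that Hadoop performs between map and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class LocalWordCount {
    // Map step: one emitted word per whitespace-separated token
    // (each word stands for the pair (word, 1)).
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    // Reduce step: sum all values collected for a single word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Shuffle stand-in: group the emitted 1s by word, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

Because summing is associative, the same reduce logic can safely run as a combiner on partial groups, which is why the job above can pass Reduce.class to setCombinerClass.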

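Next, because the program takes command-line arguments, go to Run -> Run Configurations -> Arguments.

Under Program arguments, enter: input output

input is the input directory in HDFS; it holds the files to be counted, and those files contain words.

output is the output directory in HDFS. Make sure this directory does not already exist: Hadoop refuses to overwrite it, so that useful results from an earlier run are not lost.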
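The two rules above (exactly two arguments, and an output directory that must not exist yet) can be checked before a job is submitted. The sketch below shows the idea in plain Java against the local file system, purely as a stand-in for the HDFS check; it does not use the Hadoop FileSystem API, and the class name JobArgs and its messages are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class JobArgs {
    // Returns an error message, or null when the arguments look usable.
    static String validate(String[] args) {
        if (args.length != 2) {
            return "usage: WordCount <input> <output>";
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.isDirectory(input)) {
            return "input directory not found: " + input;
        }
        // Same rule Hadoop enforces: never write into an existing output directory.
        if (Files.exists(output)) {
            return "output already exists: " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        System.out.println(problem == null ? "arguments look fine" : problem);
    }
}
```

When rerunning the job, the simplest fix is to remove the old output directory first or pass a new output name.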

Then run it. The console prints progress information:

12/03/27 02:46:13 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

12/03/27 02:46:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

12/03/27 02:46:13 INFO mapred.FileInputFormat: Total input paths to process : 3

……

Finally, you can see that an output directory now exists in your Hadoop file system and that it contains the result files.

You can check it with hadoop fs -ls output.
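With TextOutputFormat, each line of a result file holds a key and a value separated by a tab, so for this job a line is a word, a tab, and its count. As a small sketch of how such lines could be read back in plain Java (the class name ResultLines and the sample lines are hypothetical, not output from the run above):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResultLines {
    // Parses "word<TAB>count" lines into a map, keeping file order.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip blank or malformed lines
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines in the word<TAB>count layout.
        List<String> sample = Arrays.asList("hadoop\t1", "hello\t2", "world\t1");
        System.out.println(parse(sample));
    }
}
```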

Done.
