Renmin University of China's online judge for cloud computing programming: solutions to problems 1001-1003

(http://youzitool.com is my new blog; you are welcome to visit.)

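I have been busy looking for an internship these past few days, so the blog fell behind. Time to catch up.

Most people probably know PKU Online Judge. Renmin University of China now offers a similar platform, but unlike the PKU system, this one is dedicated to judging MapReduce programming problems.

Here is the link if you want to try it: http://cloudcomputing.ruc.edu.cn/index.jsp

Before solving any problem, read the "FAQ" page and write your program in the format the system requires; otherwise it will not run. (I got a runtime error three times in a row. - -!)

The platform does not have many problems yet: only 1000-1009, and 1008-1009 have not been published. So we will discuss 1000-1007.

If you would rather try the problems yourself first, skip the rest of this post for now. Once you have solved some of them, come back and compare notes, so we can all improve together.

1000 is simple enough that the examples bundled with hadoop can solve it, so I will not go into it here.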

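Problem 1001:

a+b per line

Description

Sometimes you run into a problem like this: you have a table giving each person's income in December, January and February. The table looks like this: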
name  Dec   Jan($)
CM    200   314
LY    2000  332
QQM   6000  333
ZYM   5000  333
BP    30    12

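You need each person's total income over these three months, which means adding up the income figures in each row of the table. Write a program that solves this problem.

Input

The input consists of a single file containing one table, structured as follows: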
1 200   314
2 2000  332
3 6000  333
4 5000  333
5 30    12
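The first number on each line is the row label.

Output

The output is a text file. On each line, the first number is the row label and the second is the sum of all the numbers on the corresponding input line, excluding the row label. For example: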
1 514
2 2332
3 6333
4 5333
5 42

Sample input

input:
1 200   314
2 2000  332
3 6000  333
4 6000  333
5 5000  333
6 30    12

Sample output:

1 514
2 2332
3 6333
4 6333
5 5333
6 42

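Notes:
1 There is exactly one input file and one output file;
2 The first number on every line of the input and output files is the row label;
3 Every value is a positive integer or zero.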

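Approach to 1001:

Problem 1001 is really quite simple. Split each input line on whitespace; the first field is the row label and becomes the key, and the sum of the second and third fields becomes the value.

The framework sorts the map output by key before handing it to reduce, so we do not have to sort anything ourselves. With a single reducer, the output file therefore comes out in key order. As for how keys are ordered, we will discuss that later.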

Now for the code:
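A minimal sketch of the per-line logic described above, written as plain Java so it runs without a Hadoop cluster. The names `LineSum` and `parse` are made up for illustration. In an actual job the same steps would presumably sit inside `map()`, emitting the row label as the key and the sum as the value. The sketch assumes every line is non-empty and holds only non-negative integers, as the problem statement promises.

```java
// Hypothetical helper, not part of the original post.
class LineSum {
    // Splits one line on runs of whitespace, treats the first field as the
    // row label, and adds up every remaining field.
    // Returns {label, sum}.
    static long[] parse(String line) {
        String[] fields = line.trim().split("\\s+");
        long label = Long.parseLong(fields[0]);
        long sum = 0;
        for (int i = 1; i < fields.length; i++) {
            sum += Long.parseLong(fields[i]);
        }
        return new long[] {label, sum};
    }

    public static void main(String[] args) {
        // Three lines taken from the sample input of problem 1001.
        String[] input = {"1 200   314", "2 2000  332", "6 30    12"};
        for (String line : input) {
            long[] kv = parse(line);
            System.out.println(kv[0] + " " + kv[1]);
        }
    }
}
```

Summing every field after the label, rather than exactly two, keeps the sketch valid if a line ever carries a third month.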

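Problem 1002:

Sort

Description

Your program must read the input data files and output the data sorted in ascending order. Each line of an input file represents one data item.

Input

The input is a set of text files. Each line of a file is one record, given as a string of digits that represents a number to be sorted.

Output

On each line of the output file, the first number is the row label and the second is a value from the original input in sorted position. Note that the order is ascending, from smallest to largest.

Sample input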

input1:

2
32
654
32
15
756
65223

input2:

5956
22
650
92

input3:

26
54
6

Sample output:
1 2
2 6
3 15
4 22
5 26
6 32
7 32
8 54
9 92
10 650
11 654
12 756
13 5956
14 65223
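
Approach to 1002:

As mentioned for the previous problem, the framework sorts the map output by key before reduce sees it. So after reading a line (one record), we parse it as a number and pass it to reduce as the key. The sample output also needs a row label on each line, so we keep a counter outside the reduce method to track the running line count and output it as the key, while the key received from the map phase is output as the value in reduce. This counter only yields correct labels when a single reducer handles all keys.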
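Parsing each line into a number before using it as the key matters because the key type decides the sort order. Hadoop's `Text` keys compare byte by byte, while `LongWritable` keys compare numerically. The plain-Java sketch below imitates the two orderings with `String` and `Long`, which behave the same way for ASCII digit strings. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the original post.
class KeyOrderDemo {
    // Orders the lines as strings, the way a text key would be ordered.
    static List<String> sortAsText(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        Collections.sort(copy);
        return copy;
    }

    // Parses each line first, the way a numeric key would be ordered.
    static List<Long> sortAsNumbers(List<String> lines) {
        List<Long> numbers = new ArrayList<>();
        for (String line : lines) {
            numbers.add(Long.parseLong(line.trim()));
        }
        Collections.sort(numbers);
        return numbers;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("2", "32", "654", "15");
        System.out.println(sortAsText(lines));    // [15, 2, 32, 654]
        System.out.println(sortAsNumbers(lines)); // [2, 15, 32, 654]
    }
}
```

The string ordering puts 15 before 2, which is not what problem 1002 asks for, so the numeric key is the one to use there.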

Here is the code:

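```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {
    public static class wordcountMapper
            extends Mapper<LongWritable, Text, LongWritable, LongWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String one = value.toString().trim();
            if (one.isEmpty()) {
                return; // skip blank lines
            }
            // The number becomes the key so the framework sorts it numerically;
            // the byte offset is passed along only as a placeholder value.
            context.write(new LongWritable(Long.parseLong(one)), key);
        }
    }

    public static class wordcountReduce
            extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
        long sum = 0; // running row label, shared across reduce() calls

        @Override
        public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // One output line per occurrence, so duplicates such as 32 are kept.
            for (LongWritable ignored : values) {
                sum++;
                context.write(new LongWritable(sum), key);
            }
        }
    }

    public static void main(String args[]) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Sort");
        job.setJarByClass(MyMapre.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        // Added for safety: the running counter assumes a single reducer.
        job.setNumReduceTasks(1);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
```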
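To check the numbering logic without a cluster, the flow can be imitated locally. The sketch below is plain Java, not Hadoop code: a `TreeMap` stands in for the sorted, grouped keys that reduce receives, and the class name `SortSimulation` is made up for illustration. The sample output lists 32 twice with two different row labels, so the loop emits one line per occurrence of a key rather than one line per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation, not part of the original post.
class SortSimulation {
    static List<String> run(List<String> lines) {
        // Stand-in for the shuffle: numeric keys in ascending order,
        // each mapped to the number of times it occurred.
        TreeMap<Long, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            grouped.merge(Long.parseLong(trimmed), 1, Integer::sum);
        }
        // Stand-in for reduce: a running counter supplies the row label.
        List<String> out = new ArrayList<>();
        long rowLabel = 0;
        for (Map.Entry<Long, Integer> entry : grouped.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                rowLabel++;
                out.add(rowLabel + " " + entry.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // All three sample input files of problem 1002, concatenated.
        List<String> lines = Arrays.asList(
                "2", "32", "654", "32", "15", "756", "65223",
                "5956", "22", "650", "92",
                "26", "54", "6");
        for (String line : run(lines)) {
            System.out.println(line);
        }
    }
}
```

Run on the sample input, this produces the 14 lines of the sample output, with `6 32` followed by `7 32`.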

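Problem 1003:

Data deduplication

Description

Your program must read the input files and output the result after removing all duplicate data. Each line of an input file is one record.

Input

The input is a set of text files, with one data item per line in each file. Every record is a string.

Output file

Each line of the output file is a data item that appeared in the input files, and no two lines of the output file are the same.

Sample input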

input1:
2006-6-9 a
2006-6-10 b
2006-6-11 c
2006-6-12 d
2006-6-13 a
2006-6-14 b
2006-6-15 c
2006-6-11 c
input2:
2006-6-9 b
2006-6-10 a
2006-6-11 b
2006-6-12 d
2006-6-13 a
2006-6-14 c
2006-6-15 d
2006-6-11 c

Sample output:
2006-6-10 a
2006-6-10 b
2006-6-11 b
2006-6-11 c
2006-6-12 d
2006-6-13 a
2006-6-14 b
2006-6-14 c
2006-6-15 c
2006-6-15 d
2006-6-9 a
2006-6-9 b

Notes:
1 The output is sorted in dictionary order;
2 Each line is one record;
3 Data that is duplicated in the input must still appear once in the output.
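
Approach to 1003:

Again we start by splitting each line. The first field is output as the key of the map phase and the second field as its value.

When reduce receives the key-value pairs, all values that share a key arrive together. The problem requires that the letters among those values do not repeat, so we have to drop the duplicate letters. The result must also be sorted, which we can do with Java's built-in sorting facilities. The values for a key arrive in no guaranteed order, so duplicates are not necessarily adjacent.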

Here is the code:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {
    public static class wordcountMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString()); // split on whitespace
            if (itr.countTokens() < 2) {
                return; // skip blank or incomplete lines
            }
            Text word = new Text(itr.nextToken()); // first field
            Text one = new Text(itr.nextToken());  // second field
            context.write(word, one);
        }
    }

    public static class wordcountReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // A TreeSet drops duplicates and keeps the values sorted, even when
            // equal values do not arrive next to each other.
            TreeSet<String> unique = new TreeSet<String>();
            for (Text str : values) {
                unique.add(str.toString());
            }
            for (String v : unique) {
                context.write(key, new Text(v)); // one line per distinct value
            }
        }
    }

    public static void main(String args[]) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "deduplication");
        job.setJarByClass(MyMapre.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
```
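The deduplication idea can also be tried locally before submitting. The sketch below is plain Java rather than Hadoop code. A `TreeMap` from the first field to a `TreeSet` of second fields stands in for the grouping that reduce sees, and the class name `DedupSimulation` is made up for illustration. `String` ordering is used as a stand-in for dictionary order, which is an assumption that holds for the ASCII data in the sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical local simulation, not part of the original post.
class DedupSimulation {
    static List<String> run(List<String> lines) {
        // Stand-in for map + shuffle: first field -> distinct, sorted second fields.
        TreeMap<String, TreeSet<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            if (itr.countTokens() < 2) {
                continue; // skip blank or incomplete lines
            }
            String date = itr.nextToken();
            String tag = itr.nextToken();
            grouped.computeIfAbsent(date, k -> new TreeSet<>()).add(tag);
        }
        // Stand-in for reduce: one output line per distinct (date, tag) pair.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, TreeSet<String>> entry : grouped.entrySet()) {
            for (String tag : entry.getValue()) {
                out.add(entry.getKey() + " " + tag);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both sample input files of problem 1003, concatenated.
        List<String> lines = Arrays.asList(
                "2006-6-9 a", "2006-6-10 b", "2006-6-11 c", "2006-6-12 d",
                "2006-6-13 a", "2006-6-14 b", "2006-6-15 c", "2006-6-11 c",
                "2006-6-9 b", "2006-6-10 a", "2006-6-11 b", "2006-6-12 d",
                "2006-6-13 a", "2006-6-14 c", "2006-6-15 d", "2006-6-11 c");
        for (String line : run(lines)) {
            System.out.println(line);
        }
    }
}
```

On the sample input this yields the 12 lines of the sample output, starting with `2006-6-10 a` and ending with `2006-6-9 b`, because "2006-6-10" sorts before "2006-6-9" as a string.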
