System Environment

Linux Ubuntu 16.04

jdk-7u75-linux-x64

hadoop-2.6.0-cdh5.4.5

hadoop-2.6.0-eclipse-cdh5.4.5.jar

eclipse-java-juno-SR2-linux-gtk-x86_64

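Task Description

An e-commerce site has a data file of its users' friend relationships, named buyer1. The file has two fields, (buyer_id, friends_id), separated by "\t". Write a MapReduce program that performs a single-table join (a self-join) to find the users' indirect friend relationships. For example, 10001's friend is 10002 and 10002's friend is 10005, so 10001 and 10005 are indirect friends.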

buyer1(buyer_id,friends_id)

10001	10002
10002	10005
10003	10002
10004	10006
10005	10007
10006	10022
10007	10032
10009	10006
10010	10005
10011	10013

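The job should produce the following result, where each row pairs an indirect friend's ID with a user ID:

Friend ID  User ID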
10005	10001
10005	10003
10007	10010
10007	10002
10022	10004
10022	10009
10032	10005
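
To see how this result follows from the input, the self-join can be mimicked in memory with plain Java, without Hadoop. The following is only a minimal sketch: the class name IndirectFriendsSketch and the method indirectFriends are hypothetical and not part of the lab code. It pairs every row with every row whose buyer_id equals the first row's friends_id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the lab code): joins the table with itself in memory.
class IndirectFriendsSketch {

    // Each row is {buyer_id, friends_id}. Returns "friend_id<TAB>user_id" lines
    // for every pair of rows where left.friends_id equals right.buyer_id.
    static List<String> indirectFriends(String[][] rows) {
        List<String> result = new ArrayList<String>();
        for (String[] left : rows) {
            for (String[] right : rows) {
                boolean joins = left[1].equals(right[0]);
                boolean samePerson = left[0].equals(right[1]);
                if (joins && !samePerson) {
                    result.add(right[1] + "\t" + left[0]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"10001", "10002"}, {"10002", "10005"}, {"10003", "10002"},
            {"10004", "10006"}, {"10005", "10007"}, {"10006", "10022"},
            {"10007", "10032"}, {"10009", "10006"}, {"10010", "10005"},
            {"10011", "10013"}
        };
        for (String line : indirectFriends(rows)) {
            System.out.println(line);
        }
    }
}
```

On the sample data this yields the same seven pairs as the table above, although the rows may appear in a different order.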

Task Steps

1. Change to the /apps/hadoop/sbin directory and start Hadoop.

cd /apps/hadoop/sbin  
./start-all.sh

2. Create the /data/mapreduce7 directory on the local Linux file system.

mkdir -p /data/mapreduce7

3. Change to the /data/mapreduce7 directory and use wget to download the text file buyer1 from http://192.168.1.100:60000/allfiles/mapreduce7/buyer1.

cd /data/mapreduce7
wget http://192.168.1.100:60000/allfiles/mapreduce7/buyer1

Then, in the same directory, use wget to download the project's dependency package from http://192.168.1.100:60000/allfiles/mapreduce7/hadoop2lib.tar.gz.

wget http://192.168.1.100:60000/allfiles/mapreduce7/hadoop2lib.tar.gz

Extract hadoop2lib.tar.gz into the current directory.

tar zxvf hadoop2lib.tar.gz

4. Create the /mymapreduce7/in directory on HDFS, then upload the buyer1 file from the local /data/mapreduce7 directory into it.

hadoop fs -mkdir -p /mymapreduce7/in  
hadoop fs -put /data/mapreduce7/buyer1 /mymapreduce7/in

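5. Open Eclipse and create a new Java Project named mapreduce7.

In the mapreduce7 project, create a package named mapreduce, and in that package create a class named DanJoin.

6. Add the jar files the project depends on. Right-click mapreduce7 and create a new folder named hadoop2lib to hold them.

Copy the jars from the hadoop2lib directory under /data/mapreduce7 into the hadoop2lib folder of the mapreduce7 project in Eclipse, then select all of the jars in that folder and add them to the Build Path.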

7. Write the Java code and describe its design.

Map code

public static class Map extends Mapper<Object, Text, Text, Text> {
    // Implements the map function.
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] arr = line.split("\t");  // split the line on the tab character
        if (arr.length < 2) {
            return;                       // skip blank or malformed lines
        }
        String mapkey = arr[0];           // buyer_id
        String mapvalue = arr[1];         // friends_id
        String relationtype = "1";        // tag for the left table
        context.write(new Text(mapkey), new Text(relationtype + "+" + mapvalue));
        relationtype = "2";               // tag for the right table
        context.write(new Text(mapvalue), new Text(relationtype + "+" + mapkey));
    }
}

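The map input is a plain text file. The InputFormat splits the data set into smaller InputSplits, and a RecordReader parses each split into <key, value> pairs that are passed to the map function. The map function splits each line with split("\t") and stores the parts in the array arr[], assigning arr[0] to mapkey and arr[1] to mapvalue. It then calls context.write() twice, so every record is emitted twice: once keyed by mapkey with mapvalue as the value (the left table), and once keyed by mapvalue with mapkey as the value (the right table). The relationtype tag, 1 or 2, is prefixed to each value to mark which copy it belongs to.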
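
As an illustration of the two tagged copies, the sketch below reproduces the emit logic as a plain Java method, so it can be run without a Hadoop cluster. The class MapEmitSketch and its emit method are hypothetical stand-ins for the Mapper and context.write(); only the split and tagging logic is taken from the Map code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Mapper: returns the pairs that map() would write.
class MapEmitSketch {

    // Returns {key, value} pairs for one tab-separated input line.
    static List<String[]> emit(String line) {
        String[] arr = line.split("\t");
        List<String[]> out = new ArrayList<String[]>();
        out.add(new String[] {arr[0], "1+" + arr[1]}); // left-table copy, keyed by buyer_id
        out.add(new String[] {arr[1], "2+" + arr[0]}); // right-table copy, keyed by friends_id
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : emit("10001\t10002")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
        // prints:
        // 10001 -> 1+10002
        // 10002 -> 2+10001
    }
}
```

Because both copies of a row are keyed by an ID, the rows "10001 10002" and "10002 10005" both contribute a value under key 10002, which is what lets the reduce side join them.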

Reduce code

public static class Reduce extends Reducer<Text, Text, Text, Text> {
    // Implements the reduce function.
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Fixed capacity: assumes at most 20 records of each kind per key.
        int buyernum = 0;
        String[] buyer = new String[20];
        int friendsnum = 0;
        String[] friends = new String[20];
        Iterator<Text> ite = values.iterator();
        while (ite.hasNext()) {
            String record = ite.next().toString();
            int len = record.length();
            int i = 2;  // values look like "1+10002", so the ID starts at index 2
            if (0 == len) {
                continue;
            }
            // Read the left/right table tag.
            char relationtype = record.charAt(0);
            // Tag 1: store the ID in buyer.
            if ('1' == relationtype) {
                buyer[buyernum] = record.substring(i);
                buyernum++;
            }
            // Tag 2: store the ID in friends.
            if ('2' == relationtype) {
                friends[friendsnum] = record.substring(i);
                friendsnum++;
            }
        }
        // Cartesian product of the buyer and friends arrays.
        if (0 != buyernum && 0 != friendsnum) {
            for (int m = 0; m < buyernum; m++) {
                for (int n = 0; n < friendsnum; n++) {
                    // Compare contents with equals(); != would only compare references.
                    if (!buyer[m].equals(friends[n])) {
                        // Write the result.
                        context.write(new Text(buyer[m]), new Text(friends[n]));
                    }
                }
            }
        }
    }
}

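Before reduce is called, the framework has already grouped all values that share a key into a single Iterable, values. The reduce function first creates two arrays, buyer[] and friends[], to hold the two kinds of records emitted by the map side. It then walks through values in a while loop using the Iterator's hasNext() and next() methods, assigning each value to record. charAt(0) reads the first character of record into relationtype. If relationtype is 1, substring(2) extracts the part of record starting at index 2 and stores it in buyer[]; if relationtype is 2, the extracted part is stored in friends[]. Finally, two nested for loops iterate over the Cartesian product of the two arrays and write <key, value> pairs with key = buyer[m] and value = friends[n], skipping any pair whose two IDs are equal. For key 10002, for example, buyer[] holds 10005 and friends[] holds 10001 and 10003, so the output is (10005, 10001) and (10005, 10003).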
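
The per-key join can also be tried outside Hadoop. The sketch below is a hedged illustration, not the lab's Reducer: ReduceJoinSketch and join are hypothetical names, the values for one key are passed in as a plain list of strings, and lists are used instead of fixed-size arrays so the number of records per key is not capped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Reducer: joins the tagged values of a single key.
class ReduceJoinSketch {

    // values are tagged strings such as "1+10005" or "2+10001".
    // Returns "buyerEntry<TAB>friendsEntry" lines for the Cartesian product.
    static List<String> join(List<String> values) {
        List<String> buyer = new ArrayList<String>();   // IDs from values tagged 1
        List<String> friends = new ArrayList<String>(); // IDs from values tagged 2
        for (String record : values) {
            if (record.length() < 3) {
                continue; // too short to hold a tag, a separator, and an ID
            }
            char relationtype = record.charAt(0);
            String id = record.substring(2);
            if (relationtype == '1') {
                buyer.add(id);
            } else if (relationtype == '2') {
                friends.add(id);
            }
        }
        List<String> out = new ArrayList<String>();
        for (String b : buyer) {
            for (String f : friends) {
                if (!b.equals(f)) {
                    out.add(b + "\t" + f);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The values that arrive at the reducer for key 10002 in the sample data.
        List<String> values = Arrays.asList("1+10005", "2+10001", "2+10003");
        for (String line : join(values)) {
            System.out.println(line);
        }
    }
}
```

A key that has values of only one kind, such as 10001 in the sample data, produces no output, because one side of the product is empty.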

Complete code

package mapreduce;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DanJoin {

    // Emits every row twice: as a left-table record (tag 1) and a right-table record (tag 2).
    public static class Map extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] arr = line.split("\t");  // split the line on the tab character
            if (arr.length < 2) {
                return;                       // skip blank or malformed lines
            }
            String mapkey = arr[0];           // buyer_id
            String mapvalue = arr[1];         // friends_id
            String relationtype = "1";        // tag for the left table
            context.write(new Text(mapkey), new Text(relationtype + "+" + mapvalue));
            relationtype = "2";               // tag for the right table
            context.write(new Text(mapvalue), new Text(relationtype + "+" + mapkey));
        }
    }

    // Joins the left-table and right-table records that share a key.
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Fixed capacity: assumes at most 20 records of each kind per key.
            int buyernum = 0;
            String[] buyer = new String[20];
            int friendsnum = 0;
            String[] friends = new String[20];
            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                int len = record.length();
                int i = 2;  // values look like "1+10002", so the ID starts at index 2
                if (0 == len) {
                    continue;
                }
                char relationtype = record.charAt(0);
                if ('1' == relationtype) {
                    buyer[buyernum] = record.substring(i);
                    buyernum++;
                }
                if ('2' == relationtype) {
                    friends[friendsnum] = record.substring(i);
                    friendsnum++;
                }
            }
            // Cartesian product of the buyer and friends arrays.
            if (0 != buyernum && 0 != friendsnum) {
                for (int m = 0; m < buyernum; m++) {
                    for (int n = 0; n < friendsnum; n++) {
                        // Compare contents with equals(); != would only compare references.
                        if (!buyer[m].equals(friends[n])) {
                            context.write(new Text(buyer[m]), new Text(friends[n]));
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Input and output paths are hard-coded for this lab environment.
        String[] otherArgs = new String[2];
        otherArgs[0] = "hdfs://localhost:9000/mymapreduce7/in/buyer1";
        otherArgs[1] = "hdfs://localhost:9000/mymapreduce7/out";
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "Table join");
        job.setJarByClass(DanJoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

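8. In the DanJoin class file, right-click and choose Run As => Run on Hadoop to submit the MapReduce job to Hadoop.

9. When the job has finished, open a terminal and view the results on HDFS in the output path specified in the Java code.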

hadoop fs -ls /mymapreduce7/out  
hadoop fs -cat /mymapreduce7/out/part-r-00000

MapReduce Example (6): Single-Table Join
