Hadoop Learning Notes 5: Processing Mobile Internet Access Logs with a Custom Type


1. Test Data: Mobile Internet Access Logs

1.1 About the Log File

  Suppose we have the following log file. Its content comes from a mobile carrier's internet access log; the records have been cleaned up into a regular format, which makes the file convenient for study.

  The file looks like this (only three lines are shown here):

1363157993044 18211575961 94-71-AC-CD-E6-18:CMCC-EASY 120.196.100.99 iface.qiyi.com 視頻網站 15 12 1527 2106 200

1363157995033 15920133257 5C-0E-8B-C7-BA-20:CMCC 120.197.40.4 sug.so.360.cn 信息安全 20 20 3156 2936 200

1363157982040 13502468823 5C-0A-5B-6A-0B-D4:CMCC-EASY 120.196.100.99 y0.ifengimg.com 綜合門戶 57 102 7335 110349 200

  Each line consists of tab-separated fields, and each field has its own meaning. For this exercise the relevant ones are the second field (the phone number) and fields 7 through 10 (0-based indices 6 to 9): the upstream and downstream packet counts, and the upstream and downstream byte totals.

1.2 The Goal

  Given the test data above, the question is: how do we use MapReduce to compute the traffic statistics for each phone number? As just described, fields 6 to 9 (0-based) carry the traffic information, so for every user we need to sum the four fields upPackNum, downPackNum, upPayLoad and downPayLoad, producing output like the following:

13480253104 3 3 180 180

13502468823 57 102 7335 110349

2. Approach: Encapsulating the Mobile Traffic Fields

2.1 The Writable Interface

  From the previous post in this series we know that every data type Hadoop moves around must implement an interface called Writable; only types that implement it support serialization and can be read and written conveniently within Hadoop.

public interface Writable {
  /** 
   * Serialize the fields of this object to <code>out</code>.
   */
  void write(DataOutput out) throws IOException;

  /** 
   * Deserialize the fields of this object from <code>in</code>.  
   */
  void readFields(DataInput in) throws IOException;
}

  As the code above shows, the Writable interface defines just two methods: write and readFields. The former serializes the object's fields into a DataOutput, and the latter deserializes data from a DataInput back into the object's fields (in short: "write out" and "read in").

  Java has eight primitive types: byte, short, int, long, float, double, boolean and char. Except for char, the primitives have corresponding Writable wrappers (LongWritable, IntWritable, and so on). None of these, however, matches the composite value we need here, so we will model a custom data type on the existing Writable implementations for this experiment.

2.2 Encapsulating the KpiWritable Type

  For every user we need to sum the four fields upPackNum, downPackNum, upPayLoad and downPayLoad, and since all four are of type long, we can encapsulate them in the following class:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    /*
     * Custom data type KpiWritable
     */
    public class KpiWritable implements Writable {

        long upPackNum;      // number of upstream packets
        long downPackNum;    // number of downstream packets
        long upPayLoad;      // total upstream traffic, in bytes
        long downPayLoad;    // total downstream traffic, in bytes

        public KpiWritable() {
        }

        public KpiWritable(String upPack, String downPack, String upPay,
                String downPay) {
            upPackNum = Long.parseLong(upPack);
            downPackNum = Long.parseLong(downPack);
            upPayLoad = Long.parseLong(upPay);
            downPayLoad = Long.parseLong(downPay);
        }

        @Override
        public String toString() {
            return upPackNum + "\t" + downPackNum + "\t" + upPayLoad
                    + "\t" + downPayLoad;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Serialize the four counters in a fixed order
            out.writeLong(upPackNum);
            out.writeLong(downPackNum);
            out.writeLong(upPayLoad);
            out.writeLong(downPayLoad);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Deserialize in exactly the same order as write()
            upPackNum = in.readLong();
            downPackNum = in.readLong();
            upPayLoad = in.readLong();
            downPayLoad = in.readLong();
        }

    }

  By implementing the two methods of the Writable interface, the KpiWritable type is complete.
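  To see the two methods working as a pair, a quick round trip through an in-memory stream is enough to check that a KpiWritable comes back intact (the field order must match between write and readFields). The following is only a small verification sketch, not part of the original job; the class name RoundTripCheck and the sample values are chosen for illustration:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

public class RoundTripCheck {
    public static void main(String[] args) throws Exception {
        // Serialize a KpiWritable into an in-memory buffer, just as Hadoop does during the shuffle
        KpiWritable original = new KpiWritable("57", "102", "7335", "110349");
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        // Deserialize it back into an empty instance, as the reduce side would
        KpiWritable copy = new KpiWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        // Prints the four counters separated by tabs: 57 102 7335 110349
        System.out.println(copy);
    }
}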

3. Implementation: Still MapReduce

3.1 The Custom Mapper Class

    /*
     * Custom Mapper class, overriding the map method
     */
    public static class MyMapper extends
            Mapper<LongWritable, Text, Text, KpiWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context)
                throws IOException, InterruptedException {
            String[] spilted = v1.toString().split("\t");   // fields are tab-separated
            String msisdn = spilted[1];                      // extract the phone number
            Text k2 = new Text(msisdn);                      // wrap it as a Hadoop Text to use as k2
            KpiWritable v2 = new KpiWritable(spilted[6], spilted[7],
                    spilted[8], spilted[9]);                 // the four traffic fields become v2
            context.write(k2, v2);
        }
    }

  Here the data from fields 6 to 9 is packed into a KpiWritable, and the phone number together with that KpiWritable is passed to the next stage as <k2, v2>.
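  For example, taking the first sample line from section 1.1 (remember that the real file is tab-separated), the indices the Mapper relies on work out as follows; this is only an illustrative sketch with the line hard-coded for clarity:

// First sample line from section 1.1, with the tab separators written out explicitly
String line = "1363157993044\t18211575961\t94-71-AC-CD-E6-18:CMCC-EASY\t120.196.100.99\t"
        + "iface.qiyi.com\t視頻網站\t15\t12\t1527\t2106\t200";
String[] spilted = line.split("\t");
// spilted[1]               -> "18211575961"                (phone number, used as k2)
// spilted[6] .. spilted[9] -> "15", "12", "1527", "2106"   (packet counts and byte totals, wrapped into v2)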

3.2 The Custom Reducer Class

    /*
     * Custom Reducer class, overriding the reduce method
     */
    public static class MyReducer extends
            Reducer<Text, KpiWritable, Text, KpiWritable> {
        @Override
        protected void reduce(Text k2, Iterable<KpiWritable> v2s, Context context)
                throws IOException, InterruptedException {
            long upPackNum = 0L;
            long downPackNum = 0L;
            long upPayLoad = 0L;
            long downPayLoad = 0L;
            // Sum the four counters over all records for this phone number
            for (KpiWritable kpiWritable : v2s) {
                upPackNum += kpiWritable.upPackNum;
                downPackNum += kpiWritable.downPackNum;
                upPayLoad += kpiWritable.upPayLoad;
                downPayLoad += kpiWritable.downPayLoad;
            }

            KpiWritable v3 = new KpiWritable(upPackNum + "", downPackNum + "",
                    upPayLoad + "", downPayLoad + "");
            context.write(k2, v3);
        }
    }

  Here, for each phone number, the traffic records coming out of the Map stage are summed field by field; a new KpiWritable built from the totals is then emitted together with the phone number as <k3, v3>.
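  A small side note on the reduce code above: the sums are converted back to strings only so that the existing String-based constructor can be reused. If you prefer, KpiWritable could be given an extra constructor that takes the four longs directly (a hypothetical addition, not part of the original class), which makes the last step of the reducer a little cleaner:

// Hypothetical convenience constructor taking the four counters as longs
public KpiWritable(long upPackNum, long downPackNum, long upPayLoad, long downPayLoad) {
    this.upPackNum = upPackNum;
    this.downPackNum = downPackNum;
    this.upPayLoad = upPayLoad;
    this.downPayLoad = downPayLoad;
}

  With that in place, the reducer could simply call context.write(k2, new KpiWritable(upPackNum, downPackNum, upPayLoad, downPayLoad)).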

3.3 The Complete Implementation

  The complete code is shown below:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyKpiJob extends Configured implements Tool {

    /*
     * Custom data type KpiWritable
     */
    public static class KpiWritable implements Writable {

        long upPackNum;      // number of upstream packets
        long downPackNum;    // number of downstream packets
        long upPayLoad;      // total upstream traffic, in bytes
        long downPayLoad;    // total downstream traffic, in bytes

        public KpiWritable() {
        }

        public KpiWritable(String upPack, String downPack, String upPay,
                String downPay) {
            upPackNum = Long.parseLong(upPack);
            downPackNum = Long.parseLong(downPack);
            upPayLoad = Long.parseLong(upPay);
            downPayLoad = Long.parseLong(downPay);
        }

        @Override
        public String toString() {
            return upPackNum + "\t" + downPackNum + "\t" + upPayLoad
                    + "\t" + downPayLoad;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Serialize the four counters in a fixed order
            out.writeLong(upPackNum);
            out.writeLong(downPackNum);
            out.writeLong(upPayLoad);
            out.writeLong(downPayLoad);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Deserialize in exactly the same order as write()
            upPackNum = in.readLong();
            downPackNum = in.readLong();
            upPayLoad = in.readLong();
            downPayLoad = in.readLong();
        }

    }

    /*
     * Custom Mapper class, overriding the map method
     */
    public static class MyMapper extends
            Mapper<LongWritable, Text, Text, KpiWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context)
                throws IOException, InterruptedException {
            String[] spilted = v1.toString().split("\t");   // fields are tab-separated
            String msisdn = spilted[1];                      // extract the phone number
            Text k2 = new Text(msisdn);                      // wrap it as a Hadoop Text to use as k2
            KpiWritable v2 = new KpiWritable(spilted[6], spilted[7],
                    spilted[8], spilted[9]);                 // the four traffic fields become v2
            context.write(k2, v2);
        }
    }

    /*
     * Custom Reducer class, overriding the reduce method
     */
    public static class MyReducer extends
            Reducer<Text, KpiWritable, Text, KpiWritable> {
        @Override
        protected void reduce(Text k2, Iterable<KpiWritable> v2s, Context context)
                throws IOException, InterruptedException {
            long upPackNum = 0L;
            long downPackNum = 0L;
            long upPayLoad = 0L;
            long downPayLoad = 0L;
            // Sum the four counters over all records for this phone number
            for (KpiWritable kpiWritable : v2s) {
                upPackNum += kpiWritable.upPackNum;
                downPackNum += kpiWritable.downPackNum;
                upPayLoad += kpiWritable.upPayLoad;
                downPayLoad += kpiWritable.downPayLoad;
            }

            KpiWritable v3 = new KpiWritable(upPackNum + "", downPackNum + "",
                    upPayLoad + "", downPayLoad + "");
            context.write(k2, v3);
        }
    }

    // Input file path
    public static final String INPUT_PATH = "hdfs://hadoop-master:9000/testdir/input/HTTP_20130313143750.dat";
    // Output directory
    public static final String OUTPUT_PATH = "hdfs://hadoop-master:9000/testdir/output/mobilelog";

    @Override
    public int run(String[] args) throws Exception {
        // First delete any output left over from a previous run
        FileSystem fs = FileSystem.get(new URI(INPUT_PATH), getConf());
        Path outPath = new Path(OUTPUT_PATH);
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }
        // Define the job
        Job job = new Job(getConf(), "MyKpiJob");
        // Set the input path
        FileInputFormat.setInputPaths(job, new Path(INPUT_PATH));
        // Set the custom Mapper class
        job.setMapperClass(MyMapper.class);
        // Specify the <k2,v2> types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(KpiWritable.class);
        // Set the custom Reducer class
        job.setReducerClass(MyReducer.class);
        // Specify the <k3,v3> types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(KpiWritable.class);
        // Set the output path
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        // Submit the job and wait for it to finish
        boolean res = job.waitForCompletion(true);
        if (res) {
            System.out.println("Process success!");
            System.exit(0);
        } else {
            System.out.println("Process failed!");
            System.exit(1);
        }
        return 0;
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            int res = ToolRunner.run(conf, new MyKpiJob(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

3.4 Running the Job
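  Assuming the class above is packaged into a jar (the name kpi.jar below is just an example), the job can be submitted to the cluster with hadoop jar kpi.jar MyKpiJob. Because run() deletes the output directory before the job starts, it can be re-run without manual cleanup. Once the job finishes, the per-phone-number totals can be inspected with hadoop fs -cat /testdir/output/mobilelog/part-r-00000; each output line contains the phone number followed by the four summed fields, matching the target format shown in section 1.2 (for example 13502468823 57 102 7335 110349).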

Attachment Download

  (1) The mobile internet access log used in this post (partial version): http://pan.baidu.com/s/1dDzqHWX
