Cleaning Server Access Logs with Hadoop 2.7.3: Learning and Applying the Partitioner (Part 6)

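Server access logs need cleaning for many reasons: the log format, the log rotation period, and the source of user traffic are all common ones. In my case, the server is accessed from several platforms, so the access logs from the app, web, and H5 clients had to be sorted into three categories, each written to its own log file.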

This is where the partitioner comes in.

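A partitioner decides which reduce task receives each map output key. With a custom condition it can separate keys into groups, and because each reduce task writes its own output file, each group ends up in a separate file. The default partitioner is HashPartitioner, defined as follows: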

/** Partition keys by their {@link Object#hashCode()}. */
public class HashPartitioner<K, V> extends Partitioner<K, V> {
  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }

}

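The default HashPartitioner distributes keys by hash code, so it cannot split the full access log into three per-platform parts. We therefore write our own Partitioner implementation.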
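To make the limitation concrete, here is a minimal sketch of the same arithmetic HashPartitioner uses, without any Hadoop dependency. The class name HashPartitionDemo and the sample URLs are illustrative only, and String.hashCode() is used as a stand-in for the key's hash (Hadoop's Text computes its hash differently), so the exact partition numbers in a real job would differ.

```java
import java.util.Arrays;
import java.util.List;

class HashPartitionDemo {
    // Same arithmetic as HashPartitioner.getPartition, applied to a raw hash code.
    // The mask clears the sign bit so that the result is never negative.
    static int partition(int hashCode, int numReduceTasks) {
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Sample request paths taken from the log excerpt in this post.
        List<String> urls = Arrays.asList(
                "/mobile/api/handle.html",
                "/mobile/css/poposlides.css",
                "/pc/api/handle.html",
                "/pc/js/jquery-1.8.3.min.js",
                "/h5/js/poposlides.js");
        for (String url : urls) {
            // String.hashCode() is only a stand-in for Text.hashCode() here.
            System.out.println(partition(url.hashCode(), 3) + "  " + url);
        }
        // Without the mask, a negative hash would give a negative index:
        System.out.println(-1 % 3);            // prints -1
        System.out.println(partition(-1, 3));  // prints 1
    }
}
```

The partition index depends only on the hash of the whole key, so two /mobile/ paths have no reason to land in the same partition. Grouping by platform needs a partitioner that looks at the path itself.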

Suppose the log file contains the following entries:

183.136.190.40 - - [18/Mar/2017:03:56:58 +0800] "GET /mobile/api/handle.html HTTP/1.1" 200 574 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"
183.39.91.88 - - [18/Mar/2017:11:06:25 +0800] "GET /pc/api/handle.html HTTP/1.1" 200 964 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
183.39.91.88 - - [18/Mar/2017:11:06:25 +0800] "GET /mobile/css/poposlides.css HTTP/1.1" 200 1855 "http://misbike.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
183.39.91.88 - - [18/Mar/2017:11:06:25 +0800] "GET /pc/js/jquery-1.8.3.min.js HTTP/1.1" 200 37522 "http://misbike.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
183.39.91.88 - - [18/Mar/2017:11:06:25 +0800] "GET /h5/js/poposlides.js HTTP/1.1" 200 1544 "http://misbike.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"

In these logs the request path identifies the platform: /mobile/ for the app, /h5/ for H5, and /pc/ for the PC (web) client.
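As a quick illustration of that convention, the sketch below pulls the request path out of one log line and maps it to a platform label. The class PlatformClassifier, its method names, and the labels APP/PC/H5/OTHER are hypothetical and not part of the job; the sample line is the first entry above with the user-agent string shortened.

```java
class PlatformClassifier {
    // In this log format, field 6 of the space-separated line is the request path.
    static String requestPath(String logLine) {
        String[] fields = logLine.split(" ");
        return fields.length > 6 ? fields[6] : "";
    }

    // Map a request path to a platform label using the prefixes described above.
    static String platform(String path) {
        if (path.contains("/mobile/")) return "APP";
        if (path.contains("/pc/")) return "PC";
        if (path.contains("/h5/")) return "H5";
        return "OTHER";
    }

    public static void main(String[] args) {
        String line = "183.136.190.40 - - [18/Mar/2017:03:56:58 +0800] "
                + "\"GET /mobile/api/handle.html HTTP/1.1\" 200 574 \"-\" \"Mozilla/5.0\"";
        String path = requestPath(line);
        System.out.println(path + " -> " + platform(path));
        // prints: /mobile/api/handle.html -> APP
    }
}
```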
The code is as follows:
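import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// KpiBean and StringHandleUtils are project classes that are not shown in this post.
public class Kpi_partitioner {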
	
	/** 
	* @ClassName: RequestUrlMapper 
	* @Description: Emits (request path, KpiBean) for each valid log line 
	* @author [email protected]
	* @date 2017-07-12
	*  
	*/
	public static class RequestUrlMapper extends Mapper<Object, Text, Text, KpiBean> {
		private KpiBean bean = new KpiBean();
		private Text word = new Text();

		@Override
		public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
			String line = value.toString();
			// Skip lines that contain a backslash
			if (line.indexOf("\\") == -1) {
				// Filter out unsuccessful requests
				bean = StringHandleUtils.filterLog(line);

				if (bean.isValid()) {
					String[] fields = line.split(" ");
					// Field 6 of the space-separated line is the request path
					if (fields.length > 6 && !"".equals(fields[6])) {
						String request = fields[6];
						word.set(request);
						// Each occurrence counts as one request
						bean.setRequestCount(request, 1);
						context.write(word, bean);
					}
				}
			}
		}
	}

	
	/** 
	* @ClassName: RequestUrlReducer 
	* @Description: Sums the request counts for each request path 
	* @author [email protected]
	* @date 2017-07-12
	*  
	*/
	public static class RequestUrlReducer extends Reducer<Text, KpiBean, Text, KpiBean> {
		private KpiBean bean = new KpiBean();

		public void reduce(Text key, Iterable<KpiBean> values, Context context)
				throws IOException, InterruptedException {
			int sum = 0;
			for (KpiBean val : values) {
				sum += val.getRequestcount();
			}
			bean.setRequestCount("", sum);
			context.write(key, bean);
		}
	}
	
	/** 
	* @ClassName: KpiPartitioner 
	* @Description: Custom partitioner that groups request paths by platform 
	* @author [email protected]
	* @date 2017-07-12
	*  
	*/
	public static class KpiPartitioner extends Partitioner<Text, KpiBean> {
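        @Override
        public int getPartition(Text key, KpiBean value, int numPartitions) {
            // Route each request path to a partition by its platform prefix.
            // The modulo keeps the index in range if fewer than 3 reducers are configured.
            String str = key.toString();
            if (str.contains("/mobile/"))
                return 0 % numPartitions;   // app
            if (str.contains("/pc/"))
                return 1 % numPartitions;   // PC (web)
            if (str.contains("/h5/"))
                return 2 % numPartitions;   // H5
            // Any other path falls into the PC partition
            return 1 % numPartitions;
        }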
    }
	
	public static void main(String[] args) throws Exception {		
		Configuration conf = new Configuration();
		
		Job job = Job.getInstance(conf, "request url partitioner");
		job.setJarByClass(Kpi_partitioner.class);
		job.setMapperClass(RequestUrlMapper.class);
		job.setCombinerClass(RequestUrlReducer.class);
		job.setReducerClass(RequestUrlReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(KpiBean.class);
		
		// Use the custom partitioner instead of the default HashPartitioner
		job.setPartitionerClass(KpiPartitioner.class);
		// The number of reduce tasks must be set here: one per platform
		job.setNumReduceTasks(3);
		
		FileInputFormat.addInputPath(job, new Path("hdfs://139.199.224.239:9000/user/hadoop/miqilog5Input"));
		FileOutputFormat.setOutputPath(job, new Path("hdfs://139.199.224.239:9000/user/hadoop/miqilog5Output"));
		
			
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
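The routing rules can be checked without a cluster. The following is a minimal sketch that mirrors the logic of KpiPartitioner on plain strings; the class PartitionRoutingDemo and the helper outputFile are hypothetical names introduced for illustration. It assumes the fallback is written as 1 % numPartitions so the index stays in range with a single reducer, and that the job uses the default output naming, where reduce task n writes a file named part-r- followed by n padded to five digits.

```java
class PartitionRoutingDemo {
    // Mirrors KpiPartitioner.getPartition, using String instead of Text.
    static int getPartition(String key, int numPartitions) {
        if (key.contains("/mobile/")) return 0 % numPartitions;
        if (key.contains("/pc/")) return 1 % numPartitions;
        if (key.contains("/h5/")) return 2 % numPartitions;
        // Assumption of this sketch: unknown paths go to the PC partition,
        // with a modulo so the index is valid for any reducer count.
        return 1 % numPartitions;
    }

    // Default file name written by the reduce task with the given index.
    static String outputFile(int partition) {
        return String.format("part-r-%05d", partition);
    }

    public static void main(String[] args) {
        String[] keys = {
            "/mobile/api/handle.html",
            "/pc/api/handle.html",
            "/h5/js/poposlides.js"
        };
        for (String key : keys) {
            int p = getPartition(key, 3);
            System.out.println(key + " -> " + outputFile(p));
        }
        // prints:
        // /mobile/api/handle.html -> part-r-00000
        // /pc/api/handle.html -> part-r-00001
        // /h5/js/poposlides.js -> part-r-00002
    }
}
```

With three reduce tasks, each platform therefore gets its own output file. With fewer reducers the modulo folds several platforms into the same file, which is why the reducer count is set to 3 in the job.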


Result:

