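The production log format is shown below.
Every request records the client IP, which appears in the x-forwarded-for field (marked in red).
Let's take one line and see how to parse it: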
2017-12-01 09:57:11.970 [http-nio-8082-exec-2] INFO - com.fullshare.common.aop.ControllerAop [ 144] - 請求head:{content-type=application/json, platform=ios, requestsign=f8ea2ff2af562ac5665ada231317a66b, accept-language=zh-Hans-CN;q=1, en-GB;q=0.9, host=tapi.fshtop.com, x-forwarded-for=192.168.132.167, accept=application/json, appid=123456, appversion=2.5, user-agent=FullShareTop/2.5 (iPhone; iOS 10.3.2; Scale/2.00), authorization=072a2431f2bd6cf8108eb3231488cb6dfcc6e11eead3d04283f67762313b2259b937d07358140ef1acf6c6963f8ad42bb088f3223638244e, osversion=10.3.2, mode=iPhone7,2, deviceid=88793C63-994E-44FF-A8BA-506B3897C963, clienttime=1512093518629, content-length=67, brand=iphone, channel=appstore, idfa=CC2E3934-6C3E-4E64-9894-02603E7CED3A}
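Before involving Spark, the parsing step can be tried out in plain Java. The following is only a minimal sketch: the class name `ForwardedForParser`, the regular expression, and the `"unknown"` fallback are illustrative assumptions, and the sample string is an abbreviated version of the header portion of the line above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original job.
public class ForwardedForParser {
    // Captures the first value after "x-forwarded-for=", stopping at a comma,
    // whitespace, or the closing brace of the header map.
    private static final Pattern XFF = Pattern.compile("x-forwarded-for=([^,}\\s]+)");

    /** Returns the first x-forwarded-for value, or "unknown" if the header is absent. */
    public static String extractIp(String line) {
        Matcher m = XFF.matcher(line);
        return m.find() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) {
        // Abbreviated header portion of the sample log line.
        String sample = "head:{content-type=application/json, platform=ios, "
                + "x-forwarded-for=192.168.132.167, accept=application/json}";
        System.out.println(extractIp(sample)); // prints 192.168.132.167
    }
}
```

A regular expression is used here instead of splitting on commas so that the value is still extracted cleanly when x-forwarded-for happens to be the last header before the closing brace.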
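Now we can write the code.
I won't cover installing Hadoop and Spark here, since there are plenty of guides online; adding the required jars is covered in another article of mine.
The code is as follows: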
package test.spark;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

/**
 * Counts requests per client IP from the application log.
 *
 * @author huangjiangnan
 */
public class FilterLine {

    @SuppressWarnings("resource")
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://192.168.7.202:7077")
                .setAppName(FilterLine.class.getName());
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> inputRDD = sc.textFile("hdfs://192.168.7.202:900/test/nohup.out");

        // Java lambda expressions (JDK 8 or later) save a lot of code.
        // Transformation: keep only the lines that log the request headers.
        // The marker "請求head" ("request head") must match the log text exactly.
        JavaRDD<String> reqRDD = inputRDD.filter(x -> x.contains("請求head"));

        // JavaPairRDD holds key-value pairs: map each line to (ip, 1),
        // then sum the counts for each IP.
        JavaPairRDD<String, Integer> pairRDD = reqRDD.mapToPair(x -> {
            String ip = "未知ip"; // fallback key ("unknown ip") when the header is missing
            for (String st : x.split(",")) {
                if (st.contains("x-forwarded-for")) {
                    String[] ipStr = st.split("=");
                    if (ipStr.length > 1) {
                        ip = ipStr[1];
                        break;
                    }
                }
            }
            return new Tuple2<String, Integer>(ip, 1);
        }).reduceByKey((num1, num2) -> num1 + num2);

        pairRDD.saveAsTextFile("hdfs://192.168.7.202:900/test/FilterLine-spark");
    }
}
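To sanity-check the counting logic without a cluster, the same filter, key extraction, and per-key sum can be mirrored with plain Java streams. This is only a rough local equivalent, not the Spark job itself: the class `LocalIpCount`, the `"unknown"` fallback, the added `trim()`, and the sample lines with their IP addresses are all invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical local stand-in for the Spark pipeline above.
public class LocalIpCount {
    // Same rule as the Spark job: split on commas, take the value after "x-forwarded-for=".
    public static String extractIp(String line) {
        for (String field : line.split(",")) {
            if (field.contains("x-forwarded-for")) {
                String[] kv = field.split("=");
                if (kv.length > 1) {
                    return kv[1].trim();
                }
            }
        }
        return "unknown";
    }

    // filter -> key extraction -> count per key, mirroring filter / mapToPair / reduceByKey.
    public static Map<String, Long> countByIp(List<String> lines, String marker) {
        return lines.stream()
                .filter(line -> line.contains(marker))
                .collect(Collectors.groupingBy(LocalIpCount::extractIp, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Invented sample lines; only the header portion is shown.
        List<String> lines = Arrays.asList(
                "head:{platform=ios, x-forwarded-for=10.0.0.1, accept=application/json}",
                "head:{platform=android, x-forwarded-for=10.0.0.2, accept=application/json}",
                "head:{platform=ios, x-forwarded-for=10.0.0.1, accept=application/json}",
                "some other log line");
        System.out.println(countByIp(lines, "head:")); // prints {10.0.0.1=2, 10.0.0.2=1}
    }
}
```

One limitation shared with the comma-splitting approach in the Spark job: if x-forwarded-for is the last header in the map, the extracted value keeps the trailing `}`.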
Package the program and submit it to the cluster for execution.
The final result is as follows: