These diagrams can be opened with Typora.
Filter by condition
graph TD
A[MySQL<br>Hive]-->|sqlContext|B[actionRDD<br>JavaRDD<Row>]
B-->|mapToPair|C[session2ActionRDD<br>JavaPairRDD<String,Row>]
C-->|mapToPair|D[userid2PartAggrInfoRDD<br>JavaPairRDD<Long,String>]
A-->|sqlContext|E[userInfoRDD<br>JavaRDD<Row>]
E-->|mapToPair|F[userid2InfoRDD<br>JavaPairRDD<Long,Row>]
D-->|join|G[useridFullInfoRDD<br>JavaPairRDD<Long,Tuple2<String,Row>>]
F-->|join|G
G-->|mapToPair|H[sessionid2FullAggrInfoRDD<br>JavaPairRDD<String,String>]
H-->|filter|I[filteredSessionid2AggrInfoRDD<br>JavaPairRDD<String,String>]
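The join, re-key, and filter steps at the end of this diagram can be sketched in plain Java collections, so the example runs without Spark. This is only an illustration under stated assumptions: the `key=value|key=value` layout of the aggregate string, the `age` field, and the names `SessionFilterSketch`, `field`, `joinAndRekey`, and `filterByAge` are hypothetical and do not come from the diagram.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SessionFilterSketch {

    /** Reads one value out of a "k1=v1|k2=v2" string (assumed layout); null if absent. */
    static String field(String aggrInfo, String name) {
        for (String pair : aggrInfo.split("\\|")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals(name)) {
                return kv[1];
            }
        }
        return null;
    }

    /**
     * Inner-joins <userid, partAggrInfo> with <userid, age> and re-keys the
     * result by sessionid, mirroring the join + mapToPair steps.
     * The user Row is reduced to a single age value for brevity (assumption).
     */
    static Map<String, String> joinAndRekey(List<Map.Entry<Long, String>> userid2PartAggrInfo,
                                            Map<Long, Integer> userid2Age) {
        Map<String, String> sessionid2FullAggrInfo = new LinkedHashMap<>();
        for (Map.Entry<Long, String> e : userid2PartAggrInfo) {
            Integer age = userid2Age.get(e.getKey());
            if (age == null) {
                continue; // inner join: drop sessions whose user is unknown
            }
            String full = e.getValue() + "|age=" + age;
            sessionid2FullAggrInfo.put(field(full, "sessionid"), full);
        }
        return sessionid2FullAggrInfo;
    }

    /** Keeps only sessions whose age lies in [minAge, maxAge], like the filter step. */
    static Map<String, String> filterByAge(Map<String, String> sessionid2FullAggrInfo,
                                           int minAge, int maxAge) {
        Map<String, String> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sessionid2FullAggrInfo.entrySet()) {
            int age = Integer.parseInt(field(e.getValue(), "age"));
            if (age >= minAge && age <= maxAge) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }
}
```

In a real Spark job the same logic would sit inside the `join`, `mapToPair`, and `filter` calls named on the diagram edges.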
Extract by proportion
graph TD
A[filteredSessionid2AggrInfoRDD<br>JavaPairRDD<String,String>]-->|join&mapToPair|B[session2DetailRDD<br>JavaPairRDD<String,Row>]
C[sessionid2actionRDD<br>JavaPairRDD<String,Row>]-->|join&mapToPair|B
A-->|mapToPair<br><yyyy-MM-dd_HH,aggrInfo>|D[time2SessionRDD<br>JavaPairRDD<String,String>]
D-->|countByKey|E[countMap<br>Map<String,Long>]
E-->F[dateHourCountMap<br>Map<yyyy-MM-dd,<HH,count>>]
F-->|Long:sessionId|H[dateHourExtractMap<br>Map<String,Map<String,List<Long>>>]
H-->|broadcast|I[dateHourExtractMapBroadcast<br>Map<String,Map<String,List<Long>>>]
D-->|groupByKey|J[time2SessionIdRDD<br>JavaPairRDD<String,Iterable<String>>]
J-->|flatMapToPair|K[extractSessionIdsRDD<br>JavaPairRDD<String,String>]
I-->|broadcast.values|K
K-->|join&sessionid2actionRDD|M[extractSessionDetailRDD<br>JavaPairRDD<String,Tuple2<String,Row>>]
M-->|foreach.insert|L[MySQL&Hive]
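The step from `countMap` to `dateHourExtractMap` is where the proportional quota is computed, and a small plain-Java sketch may make it easier to follow. It assumes that the total number of sessions to extract is split evenly across days, that each hour then receives a share proportional to its session count, and that the `List<Long>` values hold randomly chosen positions within that hour's sessions. None of these details are stated in the diagram, and the names `HourlyExtractSketch`, `toDateHourCountMap`, and `toDateHourExtractMap` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

class HourlyExtractSketch {

    /** Regroups <yyyy-MM-dd_HH, count> into <yyyy-MM-dd, <HH, count>>. */
    static Map<String, Map<String, Long>> toDateHourCountMap(Map<String, Long> countMap) {
        Map<String, Map<String, Long>> dateHourCountMap = new TreeMap<>();
        for (Map.Entry<String, Long> e : countMap.entrySet()) {
            String[] dateAndHour = e.getKey().split("_");
            dateHourCountMap
                    .computeIfAbsent(dateAndHour[0], d -> new TreeMap<>())
                    .put(dateAndHour[1], e.getValue());
        }
        return dateHourCountMap;
    }

    /** Picks, per date and hour, a proportional number of distinct random positions. */
    static Map<String, Map<String, List<Long>>> toDateHourExtractMap(
            Map<String, Map<String, Long>> dateHourCountMap, int totalExtract, Random random) {
        Map<String, Map<String, List<Long>>> extractMap = new TreeMap<>();
        if (dateHourCountMap.isEmpty()) {
            return extractMap;
        }
        // Assumption: the overall quota is divided evenly between days.
        int perDay = totalExtract / dateHourCountMap.size();
        for (Map.Entry<String, Map<String, Long>> day : dateHourCountMap.entrySet()) {
            long dayCount = 0;
            for (long c : day.getValue().values()) {
                dayCount += c;
            }
            Map<String, List<Long>> hourMap = new TreeMap<>();
            for (Map.Entry<String, Long> hour : day.getValue().entrySet()) {
                long hourCount = hour.getValue();
                // Hour share = hourCount / dayCount, never more than the hour holds.
                long quota = dayCount == 0 ? 0 : (long) ((double) hourCount / dayCount * perDay);
                quota = Math.min(quota, hourCount);
                Set<Long> picked = new LinkedHashSet<>();
                while (picked.size() < quota) {
                    picked.add((long) random.nextInt((int) hourCount));
                }
                hourMap.put(hour.getKey(), new ArrayList<>(picked));
            }
            extractMap.put(day.getKey(), hourMap);
        }
        return extractMap;
    }
}
```

The resulting map is small compared with the session data, which fits the diagram's choice to broadcast it rather than join against it.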
Top categories
graph TD
A[sessionid2detailRDD<br>JavaPairRDD<String,Row>]-->|flatMapToPair|B[categoryidRDD<br>JavaPairRDD<Long,Long><br><categoryId,categoryId>]
B-->|distinct|C[categoryIdsRDD]
A-->|filter|D[clickActionRDD<br>]
D-->|mapToPair|E[clickCategoryIdRDD<br>JavaPairRDD<Long,Long>]
E-->|reduceByKey&v1+v2|F[clickCategoryId2CountRDD<br>JavaPairRDD<Long,Long>]
C-->|leftOuterJoin|H[tmpJoinRDD<br>JavaPairRDD<Long,Tuple2<Long,Optional<Long>>>]
F-->|leftOuterJoin|H
H-->|mapToPair|I[tmpMapRDD<br>JavaPairRDD<Long,String><br><categoryId,clickCount>]
I-->|join&orderCategoryIdCountRDD&payCategoryId2CountRDD|J[categoryid2countRDD<br>JavaPairRDD<Long,String>]
J-->|sortByKey|K[sortedCategoryCountRDD<br>JavaPairRDD<CategorySortKey,String>]
K-->|take|M[top10CategoryList<br>List<Tuple2<CategorySortKey,String>>]
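The `sortByKey` and `take` steps depend on how `CategorySortKey` orders categories. The sketch below shows one plausible ordering in plain Java: compare click count first, then order count, then pay count, and take the largest entries. That comparison order is an assumption (the diagram only names the three counts), and `CategorySortSketch` and `topCategories` are hypothetical names. For brevity the sketch maps category id to key, whereas the diagram uses the key as the RDD key.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CategorySortSketch {

    /** Composite sort key; assumed to compare click, then order, then pay counts. */
    static final class CategorySortKey implements Comparable<CategorySortKey> {
        final long clickCount;
        final long orderCount;
        final long payCount;

        CategorySortKey(long clickCount, long orderCount, long payCount) {
            this.clickCount = clickCount;
            this.orderCount = orderCount;
            this.payCount = payCount;
        }

        @Override
        public int compareTo(CategorySortKey other) {
            int c = Long.compare(clickCount, other.clickCount);
            if (c != 0) {
                return c;
            }
            c = Long.compare(orderCount, other.orderCount);
            if (c != 0) {
                return c;
            }
            return Long.compare(payCount, other.payCount);
        }
    }

    /** Returns the ids of the n categories with the largest sort keys, best first. */
    static List<Long> topCategories(Map<Long, CategorySortKey> categoryCounts, int n) {
        return categoryCounts.entrySet().stream()
                .sorted(Map.Entry.<Long, CategorySortKey>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With Spark, a descending sort would be requested through `sortByKey(false)` before calling `take(10)`.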
Active sessions for top categories
graph TD
A[top10CategoryList<br>List<Tuple2<CategorySortKey,String>>]-->|parallelizePairs|B[top10CategoryIdRDD<br>JavaPairRDD<Long,Long>]
C[sessionid2detailRDD<br>JavaPairRDD<String,Row>]-->|flatMapToPair|D[flatMapToPair<br>JavaPairRDD<Long,String><br><categoryid,sessionid&count>]
B-->|join&mapToPair|E[top10CategorySessionCountRDD<br>JavaPairRDD<Long,String>]
D-->|join&mapToPair|E
E-->|groupByKey|F[top10CategorySessionCountsRDD<br>JavaPairRDD<Long,Iterable<String>>]
F-->|flatMapToPair|J[top10SessionRDD<br>JavaPairRDD<String,String>]
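The final `groupByKey` and `flatMapToPair` steps amount to a top-N selection per category. The following plain-Java sketch shows that idea, assuming each value is a `sessionid,count` string (the diagram writes `sessionid&count` without fixing a separator). The names `TopSessionsSketch` and `topSessionsPerCategory` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopSessionsSketch {

    /**
     * Groups <categoryid, "sessionid,count"> pairs by category and keeps, for
     * each category, the n entries with the highest count (highest first).
     */
    static Map<Long, List<String>> topSessionsPerCategory(
            List<Map.Entry<Long, String>> categorySessionCounts, int n) {
        // Equivalent of groupByKey: collect all session counts per category.
        Map<Long, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<Long, String> e : categorySessionCounts) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Equivalent of the flatMapToPair step: sort each group and keep the top n.
        Map<Long, List<String>> result = new TreeMap<>();
        for (Map.Entry<Long, List<String>> group : grouped.entrySet()) {
            List<String> sorted = new ArrayList<>(group.getValue());
            sorted.sort(Comparator
                    .comparingLong((String s) -> Long.parseLong(s.split(",")[1]))
                    .reversed());
            result.put(group.getKey(),
                    new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size()))));
        }
        return result;
    }
}
```

Sorting the whole group is the simplest approach; for very large groups a bounded priority queue of size n would avoid holding a fully sorted copy.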