spark - Data Skew

Original

良人與我

2019-02-09 13:41

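Can Spark suffer from data skew?
Yes. Take word count as an example: if one word appears far more often than the others, all of its records are sent to the same node when they are aggregated, so that node receives a disproportionately large share of the data. This is data skew.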
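To make the skew concrete, here is a minimal plain-Java sketch (no Spark needed) that routes each record to a partition by the hash of its key, the same idea Spark's default hash partitioning uses. The class name `SkewDemo`, the partition count of 4, and the sample data (10,000 copies of "spark" plus 100 rare words) are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SkewDemo {
    // Hash partitioning idea: non-negative hashCode modulo the number of partitions.
    static int partitionOf(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts how many records land in each partition when every record is routed by its key.
    static long[] partitionLoads(List<String> words, int numPartitions) {
        long[] loads = new long[numPartitions];
        for (String w : words) {
            loads[partitionOf(w, numPartitions)]++;
        }
        return loads;
    }

    // Hypothetical input: one "hot" word repeated many times plus 100 rare words.
    static List<String> sampleWords() {
        List<String> words = new ArrayList<>(Collections.nCopies(10_000, "spark"));
        for (int i = 0; i < 100; i++) {
            words.add("word" + i);
        }
        return words;
    }

    public static void main(String[] args) {
        long[] loads = partitionLoads(sampleWords(), 4);
        for (int p = 0; p < loads.length; p++) {
            System.out.println("partition " + p + ": " + loads[p] + " records");
        }
    }
}
```

Whichever partition "spark" hashes to receives at least 10,000 records, while the rare words contribute at most 100 records in total, so the task handling that one partition does nearly all the work.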

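Solution
Append a suffix _x to each word (where x is a random number).
During the shuffle, occurrences of the same word then carry different suffixes and are distributed to different nodes. After this first aggregation, strip the suffix and aggregate again to merge the partial counts into the final total for each word.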
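The two-step idea (salt, aggregate, strip the salt, aggregate again) can be checked in plain Java without a cluster. This is a sketch under assumptions: `SaltingDemo`, the fixed random seed, and the sample input are hypothetical, and `HashMap.merge` stands in for `reduceByKey`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SaltingDemo {
    static final int SALT_RANGE = 100; // same 0..99 suffix range as the Spark code in this post

    // Stage 1: append "_x" with a random x, then count per salted key (like the first reduceByKey).
    static Map<String, Integer> saltedCounts(List<String> words, Random rnd) {
        Map<String, Integer> partial = new HashMap<>();
        for (String w : words) {
            String salted = w + "_" + rnd.nextInt(SALT_RANGE);
            partial.merge(salted, 1, Integer::sum);
        }
        return partial;
    }

    // Stage 2: strip the suffix at the last '_' and sum again (like the second reduceByKey).
    static Map<String, Integer> unsaltedCounts(Map<String, Integer> partial) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : partial.entrySet()) {
            String key = e.getKey();
            String word = key.substring(0, key.lastIndexOf('_'));
            result.merge(word, e.getValue(), Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Collections.nCopies(10_000, "spark"));
        words.add("hello_world");
        Map<String, Integer> partial = saltedCounts(words, new Random(42));
        System.out.println("distinct salted keys: " + partial.size());
        System.out.println("final counts: " + unsaltedCounts(partial));
    }
}
```

Each salted key such as "spark_17" now holds about 100 records on average instead of a single key holding 10,000, yet the final counts after the second aggregation are unchanged.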

The implementation is as follows:

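import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.commons.lang3.StringUtils;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;
// Stage 1 (sc: JavaSparkContext, filePath: input path): salt each word with "_0".."_99", pre-aggregate.
JavaPairRDD<String, Integer> rdd1 = sc.textFile(filePath)
        .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
        .filter(StringUtils::isNotBlank)
        .mapToPair(s -> new Tuple2<>(s + "_" + ThreadLocalRandom.current().nextInt(100), 1))
        .reduceByKey(Integer::sum)
        // Stage 2: cut at the LAST '_' (the one the salt added; words may contain '_'), aggregate again.
        .mapToPair(t -> new Tuple2<>(t._1.substring(0, t._1.lastIndexOf('_')), t._2))
        .reduceByKey(Integer::sum);
// collect() pulls the results back to the driver, so the thread printed here is the driver's thread.
rdd1.collect().forEach(t -> System.out.println(t + " " + Thread.currentThread()));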
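One detail worth checking when removing the salt: `indexOf('_')` cuts at the first underscore, so a word that already contains `_` gets truncated, while `lastIndexOf('_')` cuts at the underscore the salt added. A small sketch with a hypothetical class `SaltStripDemo` and a hypothetical salted key `user_id_37`:

```java
public class SaltStripDemo {
    // Removes the "_x" salt by cutting at the LAST underscore, which is the one the salt added.
    static String stripSalt(String salted) {
        return salted.substring(0, salted.lastIndexOf('_'));
    }

    // Cutting at the FIRST underscore instead truncates words that already contain '_'.
    static String stripSaltFirstUnderscore(String salted) {
        return salted.substring(0, salted.indexOf('_'));
    }

    public static void main(String[] args) {
        String salted = "user_id" + "_" + 37; // hypothetical salted key "user_id_37"
        System.out.println(stripSaltFirstUnderscore(salted)); // prints user
        System.out.println(stripSalt(salted));                // prints user_id
    }
}
```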
