[Spark Program] Exploring a Single-Node Cache Approach

While working on a distributed job recently, I ran into the following requirement:

A user (member) can log in from multiple IPs, and each login IP should be added to that member's trusted IP set. The trusted set is capped at 100 entries; once it would exceed 100, the existing entries are replaced by LRU (least recently used) eviction.

 

Design approach:

Use Spark Streaming to consume the user login event stream and write every login IP into an HBase table (limit_control). The table uses member as the row key, ip as the column family, the concrete IP value as the column qualifier, and the user's login time as the cell timestamp. This deduplicates the data perfectly.
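
As a rough sketch, a write into that table with the standard HBase client API might look like the following (the table name limit_control and the column family ip come from the design above; the class and method names here are illustrative):

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TrustIpWriter {

    private final Connection connection; // HBase connection, created elsewhere

    public TrustIpWriter(Connection connection) {
        this.connection = connection;
    }

    // rowkey = member, column family = "ip", qualifier = the IP string,
    // cell timestamp = login time, so the same member + IP simply overwrites
    // the previous cell and duplicates collapse on their own
    public void writeLoginIp(String memberSrl, String ip, long loginTime) throws Exception {
        try (Table table = connection.getTable(TableName.valueOf("limit_control"))) {
            Put put = new Put(Bytes.toBytes(memberSrl));
            put.addColumn(Bytes.toBytes("ip"), Bytes.toBytes(ip), loginTime, Bytes.toBytes(ip));
            table.put(put);
        }
    }
}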

But how do we enforce the cap of 100?

The initial idea was to shuffle the members to different executors with groupByKey, and then on each executor use a local cache covering that executor's share of the keys to count IPs and enforce the limit.

 

The core of the implementation is shown below.

Processing code in the streaming job:

stream.foreachRDD(rdd -> {
    try {
        // extract all activity events
        rdd.flatMap(record -> {
            String topic = record.topic();
            TopicHandler handler = HandlerFactory.getHandler(topic);
            return handler.handle(record.value());
        }).groupBy(act -> act.getMemberSrl()) // group by member: one member's activities land in one partition

            // process all activities
            .foreachPartition(itr -> {
                while (itr.hasNext()) {
                    Iterable<Activity> acts = itr.next()._2();
                    Processor.process(acts);
                }
            });

    } catch (Exception e) {
        log.error("consumer rdd error", e);
    }
});

Core code in the Processor:

// trustMap: cache key -> trust values newly seen in this partition (value = last active time)
// delList:  collects the activities whose trust values must be deleted to stay within the limit
String cacheKey = null;
try {
    Iterator<String> itr = trustMap.keySet().iterator();
    while (itr.hasNext()) {
        cacheKey = itr.next();

        // map entry => (trust value (uuid/pcid/ip value), last active time)
        Map<String, Long> trustSet = getCache().getTrustSet(cacheKey);

        // new trust set in current partition
        Map<String, Long> partiTrust = trustMap.get(cacheKey);

        String trustType = getCache().getTrustType(cacheKey);
        Integer limit = DurationLimit.getLimit(trustType);

        // over the limit: pick LRU items to remove from the trust set
        if (partiTrust.size() + trustSet.size() > limit) {
            List<Activity> partDelList = getDeleteList(partiTrust, cacheKey, limit);
            delList.addAll(partDelList);
        } else {
            // combine trust set, UPDATE CACHE TRUST SET
            trustSet.putAll(partiTrust);
        }
    }
} catch (Exception e) {
    log.error("get trust set from cache error, cacheKey: {}", cacheKey, e);
}
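
getDeleteList is not shown above; its job is to pick the least-recently-active entries to drop once the merged set would exceed the limit. A minimal sketch of that selection step (the method name and the simplified String return type are mine; the real version maps the chosen values back to Activity records):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Pick which trust values to evict: sort by last-active timestamp (the map
// value) and take only as many of the oldest entries as exceed the limit.
public static List<String> selectLruEvictions(Map<String, Long> mergedTrustSet, int limit) {
    List<String> evict = new ArrayList<>();
    int overflow = mergedTrustSet.size() - limit;
    if (overflow <= 0) {
        return evict;
    }
    mergedTrustSet.entrySet().stream()
            .sorted(Map.Entry.comparingByValue())   // oldest last-active time first
            .limit(overflow)
            .forEach(e -> evict.add(e.getKey()));
    return evict;
}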

Every function had passed its unit tests and I was feeling confident. Reality was harsher: plenty of members still ended up over the 100 limit. Why?

 

Root cause:

By tracing how tasks (partitions) are routed to executors, I found that the partition with index = 0 is not always run by the same executor x. The assignment is driven by data locality, so from our perspective it is effectively random. As a result, the single-JVM cache design ends up with a copy of the cache on several nodes, and none of them can enforce the limit reliably.

For example:

In the first batch, the 7 partition tasks 0-6 were assigned to executorIds 2, 6, 1, 4, 3, 7, 5.

In the second batch, the same 7 partition tasks 0-6 were assigned to executorIds 6, 7, 1, 4, 2, 3, 5.

Clearly, the partition (task) with the same index is not executed on the same executor each time.
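
That mapping can be observed by logging the partition index together with the executor id from inside each task, for example with a small diagnostic like this (a sketch; the helper name is mine):

import org.apache.spark.SparkEnv;
import org.apache.spark.TaskContext;
import org.apache.spark.api.java.JavaRDD;

// Runs on the executors: each task prints which partition index it holds and
// which executor it landed on, so the mapping can be compared across batches.
public static <T> void logPartitionPlacement(JavaRDD<T> rdd) {
    rdd.foreachPartition(itr -> {
        int partitionId = TaskContext.get().partitionId();
        String executorId = SparkEnv.get().executorId();
        System.out.println("partition " + partitionId + " -> executor " + executorId);
    });
}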
 

References:

https://blog.csdn.net/adorechen/article/details/106317955

https://stackoverflow.com/questions/44414292/spark-is-it-possible-to-control-placement-of-partitions-to-nodes

In Spark, custom Partitioners can be supplied for RDD's. Normally, the produced partitions are randomly distributed to set of workers. For example if we have 20 partitions and 4 workers, each worker will (approximately) get 5 partitions. However the placement of partitions to workers (nodes) seems random like in the table below.

          trial 1    trial 2
worker 1: [10-14]    [15-19]
worker 2: [5-9]      [5-9]  
worker 3: [0-4]      [10-14]
worker 4: [15-19]    [0-4]  

This is fine for operations on a single RDD, but when you are using join() or cogroup() operations that span multiple RDD's, the communication between those nodes becomes a bottleneck. I would use the same partitioner for multiple RDDs and want to be sure they will end up on the same node so the subsequent join() would not be costly. Is it possible to control the placement of partitions to workers (nodes)?

          desired
worker 1: [0-4]
worker 2: [5-9]
worker 3: [10-14]
worker 4: [15-19]

 

Conclusion

A HashPartitioner can shuffle data into a partition with a fixed index, but a fixed-index partition is not always routed to the executor with the same id. So a single-node cache scheme does not work in a Spark program.
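
For reference, pinning a key to a partition index is easy (pairs is a hypothetical JavaPairRDD keyed by member, with 7 partitions as in the example above); it is the partition-to-executor step that cannot be pinned:

import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;

// hash(member) % 7 fixes the partition index for a given member,
// but Spark still decides which executor runs that partition.
JavaPairRDD<String, Activity> byMember = pairs.partitionBy(new HashPartitioner(7));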

 

Final solution:

Keep writing the data on every executor, but send each member's IP information back to the driver through an accumulator, and enforce the 100 limit with a cache on the driver (a single node, so the cache is always consistent).
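
One way to wire this up is sketched below: executors write to HBase as before and only report what they saw through a CollectionAccumulator, while the single driver-side cache (driverCache here, applying the LRU-at-100 rule) makes every eviction decision. jsc, driverCache, and Activity.getIp() are illustrative names, not part of the original code.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.util.CollectionAccumulator;
import scala.Tuple2;

// Registered once on the driver before the streaming job starts.
CollectionAccumulator<Tuple2<String, String>> seenIps =
        jsc.sc().collectionAccumulator("seenIps");

stream.foreachRDD(rdd -> {
    // Same flatMap as in the streaming code above, producing Activity records.
    JavaRDD<Activity> activities = rdd.flatMap(record ->
            HandlerFactory.getHandler(record.topic()).handle(record.value()));

    // Executor side: write to HBase as before, then only report what was seen.
    activities.foreachPartition(itr -> {
        while (itr.hasNext()) {
            Activity act = itr.next();
            // ... HBase write goes here ...
            seenIps.add(new Tuple2<>(act.getMemberSrl(), act.getIp()));
        }
    });

    // Driver side: one JVM, so this cache is trivially consistent and can
    // enforce the 100-entry limit (issuing HBase deletes for evicted IPs).
    for (Tuple2<String, String> pair : seenIps.value()) {
        driverCache.recordIp(pair._1(), pair._2());
    }
    seenIps.reset();
});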
