Using groupingBy in Java 8 streams to group by multiple fields and sum

Java 8's groupingBy collector groups the elements of a collection, much like GROUP BY in MySQL. Note that the result is a Map.

 

Grouping a collection by a single property, counting per group, and sorting

import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

List<String> items =
        Arrays.asList("apple", "apple", "banana",
                "apple", "orange", "banana", "papaya");

// Grouping
Map<String, List<String>> result1 = items.stream().collect(
        Collectors.groupingBy(
                Function.identity()
        )
);
//{papaya=[papaya], orange=[orange], banana=[banana, banana], apple=[apple, apple, apple]}
System.out.println(result1);


// Grouping and counting
Map<String, Long> result2 = items.stream().collect(
        Collectors.groupingBy(
                Function.identity(), Collectors.counting()
        )
);
// {papaya=1, orange=1, banana=2, apple=3}
System.out.println(result2);
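
// Grouping, counting, and sorting by count (descending).
// A LinkedHashMap keeps the insertion order, so the sorted order is preserved.
Map<String, Long> finalMap = new LinkedHashMap<>();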
result2.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .forEachOrdered(e -> finalMap.put(e.getKey(), e.getValue()));
// {apple=3, banana=2, papaya=1, orange=1}
System.out.println(finalMap);
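
The sort-then-put step above can also be written as a single collector. The following is a minimal sketch, assuming the same items list; the class and method names (SortedCountExample, countAndSort) are made up for illustration and are not part of the original post.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class SortedCountExample {

    // Counts occurrences of each item, then returns them ordered by count, highest first.
    static Map<String, Long> countAndSort(List<String> items) {
        Map<String, Long> counts = items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,            // merge function; not called here because keys are unique
                        LinkedHashMap::new));   // keeps the sorted encounter order
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("apple", "apple", "banana",
                "apple", "orange", "banana", "papaya");
        System.out.println(countAndSort(items));
    }
}
```

Entries with equal counts (papaya and orange here) keep whatever relative order the intermediate HashMap happens to produce, so only the ordering by count should be relied on.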

 

Grouping a collection by multiple properties

1. Concatenate several properties into one composite key

public static void main(String[] args) {
    User user1 = new User("zhangsan", "beijing", 10);
    User user2 = new User("zhangsan", "beijing", 20);
    User user3 = new User("lisi", "shanghai", 30);
    List<User> list = new ArrayList<User>();
    list.add(user1);
    list.add(user2);
    list.add(user3);
    Map<String, List<User>> collect = list.stream().collect(Collectors.groupingBy(e -> fetchGroupKey(e)));
    //{zhangsan#beijing=[User{age=10, name='zhangsan', address='beijing'}, User{age=20, name='zhangsan', address='beijing'}], 
    // lisi#shanghai=[User{age=30, name='lisi', address='shanghai'}]}
    System.out.println(collect);
}


private static String fetchGroupKey(User user){
    return user.getName() +"#"+ user.getAddress();
}
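
The same composite key can feed a summing collector instead of collecting lists. This is a rough sketch only: the User class below is a hypothetical stand-in (the original definition is not shown in the post), and the names CompositeKeySumExample and sumAgeByNameAndAddress are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CompositeKeySumExample {

    // Hypothetical stand-in for the post's User class, whose definition is not shown.
    static class User {
        private final String name;
        private final String address;
        private final int age;

        User(String name, String address, int age) {
            this.name = name;
            this.address = address;
            this.age = age;
        }

        String getName() { return name; }
        String getAddress() { return address; }
        int getAge() { return age; }
    }

    // Builds the same "name#address" key as fetchGroupKey.
    // Assumption: neither field contains '#'; otherwise distinct pairs could collide.
    static String groupKey(User user) {
        return user.getName() + "#" + user.getAddress();
    }

    // Sums the ages of all users that share the same name and address.
    static Map<String, Integer> sumAgeByNameAndAddress(List<User> users) {
        return users.stream().collect(
                Collectors.groupingBy(CompositeKeySumExample::groupKey,
                        Collectors.summingInt(User::getAge)));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("zhangsan", "beijing", 10),
                new User("zhangsan", "beijing", 20),
                new User("lisi", "shanghai", 30));
        System.out.println(sumAgeByNameAndAddress(users));
    }
}
```

String concatenation is the quickest way to get a composite key, but it only works if the separator can never appear in the data.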

2. Nest calls to groupingBy

User user1 = new User("zhangsan", "beijing", 10);
User user2 = new User("zhangsan", "beijing", 20);
User user3 = new User("lisi", "shanghai", 30);
List<User> list = new ArrayList<User>();
list.add(user1);
list.add(user2);
list.add(user3);
Map<String, Map<String, List<User>>> collect
        = list.stream().collect(
                Collectors.groupingBy(
                        User::getAddress, Collectors.groupingBy(User::getName)
                )
);
System.out.println(collect);
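
The innermost collector of a nested groupingBy does not have to produce a list. As a sketch under stated assumptions (the User class is again a hypothetical stand-in, and NestedGroupingSumExample is an invented name), the inner level can sum a field instead:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedGroupingSumExample {

    // Hypothetical stand-in for the post's User class, whose definition is not shown.
    static class User {
        private final String name;
        private final String address;
        private final int age;

        User(String name, String address, int age) {
            this.name = name;
            this.address = address;
            this.age = age;
        }

        String getName() { return name; }
        String getAddress() { return address; }
        int getAge() { return age; }
    }

    // Outer map is keyed by address, inner map by name; the value is the summed age.
    static Map<String, Map<String, Integer>> sumAgeByAddressThenName(List<User> users) {
        return users.stream().collect(
                Collectors.groupingBy(User::getAddress,
                        Collectors.groupingBy(User::getName,
                                Collectors.summingInt(User::getAge))));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("zhangsan", "beijing", 10),
                new User("zhangsan", "beijing", 20),
                new User("lisi", "shanghai", 30));
        Map<String, Map<String, Integer>> totals = sumAgeByAddressThenName(users);
        System.out.println(totals.get("beijing").get("zhangsan")); // 30
    }
}
```

Each extra grouping field adds one more level of Map nesting, which is manageable for two fields but gets awkward quickly.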

3. Use Arrays.asList as the key

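I have a list of domain objects related to web access records. These domain objects can number in the thousands.
I have neither the resources nor the need to store them in a database in their raw form, so I want to pre-compute aggregates and store the aggregated data in the database instead.
I need to aggregate the total bytes transferred in 5-minute windows, as in the following SQL query: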

select 
  round(request_timestamp, '5') as window, --round timestamp to the nearest 5 minute
  cdn, 
  isp, 
  http_result_code, 
  transaction_time, 
  sum(bytes_transferred)
from web_records
group by 
    round(request_timestamp, '5'), 
    cdn, 
    isp, 
    http_result_code, 
    transaction_time


In Java 8, my current first attempt looks like this. I am aware that this solution is similar to the one in "Group by multiple field names in java 8".

Map<Date, Map<String, Map<String, Map<String, Map<String, Integer>>>>> aggregatedData =
webRecords
    .stream()
    .collect(Collectors.groupingBy(WebRecord::getFiveMinuteWindow,
               Collectors.groupingBy(WebRecord::getCdn,
                 Collectors.groupingBy(WebRecord::getIsp,
                   Collectors.groupingBy(WebRecord::getResultCode,
                       Collectors.groupingBy(WebRecord::getTxnTime,
                         Collectors.reducing(0,
                                             WebRecord::getReqBytes,
                                             Integer::sum)))))));


This works, but it is ugly, and all those nested maps are a nightmare! To "flatten" or "unroll" the map into rows, I have to do this:

for (Date window : aggregatedData.keySet()) {
  for (String cdn : aggregatedData.get(window).keySet()) {
    for (String isp : aggregatedData.get(window).get(cdn).keySet()) {
      for (String resultCode : aggregatedData.get(window).get(cdn).get(isp).keySet()) {
        for (String txnTime : aggregatedData.get(window).get(cdn).get(isp).get(resultCode).keySet()) {

           Integer bytesTransferred = aggregatedData.get(window).get(cdn).get(isp).get(resultCode).get(txnTime);
           AggregatedRow row = new AggregatedRow(window, cdn, distId...


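As you can see, this is quite messy and hard to maintain.
Does anyone know a better way? Any help would be greatly appreciated.
I am wondering whether there is a better way to unroll the nested maps, or whether there is a library that lets you do a GROUP BY on a collection.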

 

Best answer

You should create a custom key for the map. The simplest way is to use Arrays.asList:

Function<WebRecord, List<Object>> keyExtractor = wr ->
    Arrays.<Object>asList(wr.getFiveMinuteWindow(), wr.getCdn(), wr.getIsp(),
             wr.getResultCode(), wr.getTxnTime());

Map<List<Object>, Integer> aggregatedData = webRecords.stream().collect(
      Collectors.groupingBy(keyExtractor, Collectors.summingInt(WebRecord::getReqBytes)));


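In this case, each key is a list of 5 elements in a fixed order. It is not very object-oriented, but it is simple. Alternatively, you can define your own type to represent the custom key and give it appropriate hashCode/equals implementations.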
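
The custom key type mentioned above could look roughly like the following. This is only a sketch under assumptions: WebRecord is reduced to three hypothetical fields (cdn, isp, reqBytes) to keep the example short, and GroupKey, GroupKey.of and sumBytes are invented names, not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class CustomKeyGroupingExample {

    // Hypothetical, reduced WebRecord: only cdn, isp and reqBytes are modelled.
    static class WebRecord {
        private final String cdn;
        private final String isp;
        private final int reqBytes;

        WebRecord(String cdn, String isp, int reqBytes) {
            this.cdn = cdn;
            this.isp = isp;
            this.reqBytes = reqBytes;
        }

        String getCdn() { return cdn; }
        String getIsp() { return isp; }
        int getReqBytes() { return reqBytes; }
    }

    // Custom key: two records fall into the same group when all key fields are equal.
    static final class GroupKey {
        private final String cdn;
        private final String isp;

        GroupKey(String cdn, String isp) {
            this.cdn = cdn;
            this.isp = isp;
        }

        static GroupKey of(WebRecord record) {
            return new GroupKey(record.getCdn(), record.getIsp());
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GroupKey)) return false;
            GroupKey other = (GroupKey) o;
            return Objects.equals(cdn, other.cdn) && Objects.equals(isp, other.isp);
        }

        @Override
        public int hashCode() {
            return Objects.hash(cdn, isp);
        }

        @Override
        public String toString() {
            return "GroupKey{cdn=" + cdn + ", isp=" + isp + "}";
        }
    }

    // Sums the bytes of all records that share the same key.
    static Map<GroupKey, Integer> sumBytes(List<WebRecord> records) {
        return records.stream().collect(
                Collectors.groupingBy(GroupKey::of,
                        Collectors.summingInt(WebRecord::getReqBytes)));
    }

    public static void main(String[] args) {
        List<WebRecord> records = Arrays.asList(
                new WebRecord("cdnA", "ispX", 100),
                new WebRecord("cdnA", "ispX", 50),
                new WebRecord("cdnB", "ispX", 7));
        // One flat loop replaces the nested loops over nested maps.
        sumBytes(records).forEach((key, bytes) -> System.out.println(key + " -> " + bytes));
    }
}
```

Compared with a List<Object> key, a dedicated type gives named, typed fields at the cost of writing equals and hashCode by hand.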
