Thread-safe, efficient counting in Java: how LongAdder works

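I was once asked to write a thread-safe, efficient counter. That one sentence was the whole problem statement, and this post grew out of thinking it through.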

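I. First thoughts

Guarding the update with synchronized

Locking blocks threads, and a blocked thread has to be woken up again. That means thread state changes and context switches, so this is a fairly heavyweight approach. -- Works, but inefficient.

Marking the counter variable volatile

volatile only guarantees that a write is visible to other threads (and limits reordering around the access). It does not make a compound operation such as count++ atomic, and a counter needs both visibility and atomicity. -- Not usable on its own.

The Atomic* family of variables

Every concurrent update is done with a CAS operation. These classes use the low-level API provided by sun.misc.Unsafe, which lets a high-level language like Java work directly with CPU instructions and perform the compare-and-set in hardware. Very efficient.

LongAdder, available since Java 8 (extends the abstract class Striped64)

Like the Atomic* family, it does every concurrent update with CAS through the low-level API of sun.misc.Unsafe, but it performs better when many threads update at once. The source code analysis below shows why.

The rest of this post mainly compares the last two.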
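
To make the four options concrete, here is a minimal sketch that runs all of them side by side. The class name CounterOptionsSketch and its methods are made up for illustration and are not part of the original test. The volatile counter is expected to lose updates under contention, which is why it is only checked as an upper bound.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class: one counter per option discussed above.
class CounterOptionsSketch {
    private long syncCount;                 // guarded by "this"
    private volatile long volatileCount;    // visible to all threads, but ++ is not atomic
    private final AtomicLong atomicCount = new AtomicLong();
    private final LongAdder adderCount = new LongAdder();

    synchronized void incrementSync() { syncCount++; }
    synchronized long syncCount() { return syncCount; }
    long volatileCount() { return volatileCount; }
    long atomicCount() { return atomicCount.get(); }
    long adderCount() { return adderCount.sum(); }

    /** Starts `threads` threads; each increments every counter `perThread` times. */
    static CounterOptionsSketch run(int threads, int perThread) {
        CounterOptionsSketch c = new CounterOptionsSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.incrementSync();
                    c.volatileCount++;              // read, add, write: updates can be lost
                    c.atomicCount.incrementAndGet();
                    c.adderCount.increment();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c;
    }

    public static void main(String[] args) {
        CounterOptionsSketch c = run(8, 100_000);
        System.out.println("synchronized: " + c.syncCount());
        System.out.println("volatile:     " + c.volatileCount());
        System.out.println("AtomicLong:   " + c.atomicCount());
        System.out.println("LongAdder:    " + c.adderCount());
    }
}
```

The synchronized, AtomicLong and LongAdder counters should all end at threads x perThread. The volatile counter may end lower, and by how much varies from run to run.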

II. Writing the test

package com.su.demo;

import org.springframework.util.StopWatch;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/**
 * Author:   susq
 * Date:     2019-08-17 11:07
 */
public class CountApp {

    // Thread count and task count; the runs below used 1, 10, 100 and 1000 in turn
    private static final int THREADS = 100;
    private static final int INCREMENTS_PER_THREAD = 1000000;
    private static final ExecutorService executorService = Executors.newFixedThreadPool(THREADS);

    public static void main(String[] args) {
        StopWatch stopWatch = new StopWatch("100 threads counting almost simultaneously, 1,000,000 increments per thread, using LongAdder\n");
        stopWatch.start("using LongAdder");
        // Swap in atomicAddTest() (and adjust the labels) to measure AtomicLong
        System.out.println(longAdderTest());
        stopWatch.stop();
        System.out.println(stopWatch.prettyPrint());
        executorService.shutdown();
    }

    private static long atomicAddTest() {
        AtomicLong num = new AtomicLong(0);
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < THREADS; i++) {
            futures.add(CompletableFuture.runAsync(() -> {
                for (int j = 0; j < INCREMENTS_PER_THREAD; j++) {
                    num.incrementAndGet();
                }
            }, executorService));
        }
        // Wait for every task to finish before reading the total
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return num.get();
    }

    private static long longAdderTest() {
        LongAdder num = new LongAdder();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < THREADS; i++) {
            futures.add(CompletableFuture.runAsync(() -> {
                for (int j = 0; j < INCREMENTS_PER_THREAD; j++) {
                    num.increment();
                }
            }, executorService));
        }
        // Wait for every task to finish before summing base and the cells
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return num.longValue();
    }
}
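
The test above depends on Spring's StopWatch. For readers without Spring on the classpath, the following is a rough, dependency-free sketch of the same measurement using System.nanoTime. The class CounterTimingSketch and the method timeNanos are hypothetical names, and this is not a rigorous benchmark (no warm-up, single run), so treat the numbers it prints as indicative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical timing helper; not the harness used for the results in this post.
class CounterTimingSketch {

    /** Runs `increment` perThread times on each of `threads` threads; returns elapsed nanoseconds. */
    static long timeNanos(int threads, int perThread, Runnable increment) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await();              // line up all workers before counting
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    increment.run();
                }
            });
            workers[t].start();
        }
        long begin = System.nanoTime();
        start.countDown();                      // release all workers at once
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        int threads = 10;
        int perThread = 1_000_000;
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();
        long atomicNs = timeNanos(threads, perThread, atomic::incrementAndGet);
        long adderNs = timeNanos(threads, perThread, adder::increment);
        System.out.println("AtomicLong: " + atomic.get() + " in " + atomicNs / 1_000_000 + " ms");
        System.out.println("LongAdder:  " + adder.sum() + " in " + adderNs / 1_000_000 + " ms");
    }
}
```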

III. Test results

1. Single thread

AtomicLong

1000000
StopWatch '1 thread counting almost simultaneously, 1,000,000 increments per thread, using AtomicLong
': running time (millis) = 47
-----------------------------------------
ms     %     Task name
-----------------------------------------
00047  100%  using AtomicLong

LongAdder

1000000
StopWatch '1 thread counting almost simultaneously, 1,000,000 increments per thread, using LongAdder
': running time (millis) = 51
-----------------------------------------
ms     %     Task name
-----------------------------------------
00051  100%  using LongAdder

2. 10 threads

AtomicLong

10000000
StopWatch '10 threads counting almost simultaneously, 1,000,000 increments per thread, using AtomicLong
': running time (millis) = 251
-----------------------------------------
ms     %     Task name
-----------------------------------------
00251  100%  using AtomicLong

LongAdder

10000000
StopWatch '10 threads counting almost simultaneously, 1,000,000 increments per thread, using LongAdder
': running time (millis) = 68
-----------------------------------------
ms     %     Task name
-----------------------------------------
00068  100%  using LongAdder

3. 100 threads

AtomicLong

100000000
StopWatch '100 threads counting almost simultaneously, 1,000,000 increments per thread, using AtomicLong
': running time (millis) = 2063
-----------------------------------------
ms     %     Task name
-----------------------------------------
02063  100%  using AtomicLong

LongAdder

StopWatch '100 threads counting almost simultaneously, 1,000,000 increments per thread, using LongAdder
': running time (millis) = 194
-----------------------------------------
ms     %     Task name
-----------------------------------------
00194  100%  using LongAdder

4. 1000 threads

AtomicLong

1000000000
StopWatch '1000 threads counting almost simultaneously, 1,000,000 increments per thread, using AtomicLong
': running time (millis) = 19883
-----------------------------------------
ms     %     Task name
-----------------------------------------
19883  100%  using AtomicLong

LongAdder

StopWatch '1000 threads counting almost simultaneously, 1,000,000 increments per thread, using LongAdder
': running time (millis) = 1065
-----------------------------------------
ms     %     Task name
-----------------------------------------
01065  100%  using LongAdder

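Conclusion:

As the number of concurrent threads grows, the gap between the two widens steadily. With a single thread they are about even (47 ms vs 51 ms); LongAdder is roughly 3.7x faster at 10 threads (251 ms vs 68 ms), 10.6x at 100 threads (2063 ms vs 194 ms) and 18.7x at 1000 threads (19883 ms vs 1065 ms). (This is write performance only; read performance is covered later.)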

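IV. Why

LongAdder extends Striped64, which extends Number. On an update it first tries a CAS; if that succeeds, the flow is much like AtomicLong. If it fails, things differ. Instead of spinning on the same CAS until it succeeds, it spreads the threads across separate slots, which cuts down the large number of failed CASes caused by too many threads and in effect spreads out the heat of a single counter value. Those slots are the Cell array.

Cell is an inner class of Striped64 that wraps a long value and updates it with CAS. It carries the @sun.misc.Contended annotation to avoid false sharing. Caches store data in units of cache lines; when several threads modify independent variables that happen to share a cache line, they unintentionally hurt each other's performance. That is false sharing. In Java 8 this annotation avoids it by adding 128 bytes of padding before and after value, twice the cache line size of most hardware, so that adjacent-sector prefetch cannot cause false sharing either.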

@sun.misc.Contended static final class Cell {
    volatile long value;
    Cell(long x) { value = x; }
    final boolean cas(long cmp, long val) {
        return UNSAFE.compareAndSwapLong(this, valueOffset, cmp, val);
    }

    // Unsafe mechanics
    private static final sun.misc.Unsafe UNSAFE;
    private static final long valueOffset;
    static {
        try {
            UNSAFE = sun.misc.Unsafe.getUnsafe();
            Class<?> ak = Cell.class;
            valueOffset = UNSAFE.objectFieldOffset
                (ak.getDeclaredField("value"));
        } catch (Exception e) {
            throw new Error(e);
        }
    }
}
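
The striping idea described above can be sketched in a few lines. This is a deliberately simplified illustration, not the JDK's implementation: StripedCounterSketch is a made-up class, the cell array has a fixed size instead of growing on contention, there is no rehashing, and the cells of an AtomicLongArray sit next to each other in memory, so unlike @Contended cells they can still suffer from false sharing.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical, simplified striped counter: base plus a fixed array of cells.
class StripedCounterSketch {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells;    // fixed size here; Striped64 grows its array lazily
    private final int mask;

    StripedCounterSketch(int cellCount) {
        if (cellCount <= 0 || Integer.bitCount(cellCount) != 1) {
            throw new IllegalArgumentException("cellCount must be a power of two");
        }
        this.cells = new AtomicLongArray(cellCount);
        this.mask = cellCount - 1;
    }

    void add(long x) {
        long b = base.get();
        if (base.compareAndSet(b, b + x)) {
            return;                         // uncontended: base absorbed the update
        }
        // Lost the race on base: move to a per-thread cell instead of retrying on base
        int index = Long.hashCode(Thread.currentThread().getId()) & mask;
        cells.addAndGet(index, x);
    }

    long sum() {
        long total = base.get();
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }
}
```

Every update lands either on base or on exactly one cell, so sum() returns the full total once all writers have finished.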

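The core methods are add and longAccumulate. add first tries a CAS on base. If that fails, it checks whether it can run the CAS on a slot of the Cell array instead; once the Cell array exists, add skips base and goes straight to the cells. If the array is not initialized yet, or it is initialized but the slot is empty or the CAS on the located slot fails, add calls longAccumulate.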

public class LongAdder extends Striped64 implements Serializable {
  
  public void increment() {
    add(1L);
  }

  public void add(long x) {
    Cell[] as; long b, v; int m; Cell a;
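    /* On the first call the cells array is always null, so casBase runs:
     * final boolean casBase(long cmp, long val) {
     *   return UNSAFE.compareAndSwapLong(this, BASE, cmp, val);
     * }
     * If this CAS on base succeeds we return right away and nothing more
     * is needed. With many threads, though, the CAS fails often, and more
     * often as threads are added, which is also why AtomicLong slows down
     * under high contention.
     */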
    if ((as = cells) != null || !casBase(b = base, b + x)) {
      boolean uncontended = true; 
      /* Call longAccumulate if cells is not initialized, or the slot this
       * thread hashes to is still empty, or the CAS on that slot fails.
       * If the CAS on the slot succeeds, we are done.
       */
      if (as == null || (m = as.length - 1) < 0 ||
          (a = as[getProbe() & m]) == null ||
          !(uncontended = a.cas(v = a.value, v + x)))
        longAccumulate(x, null, uncontended);
    }
  }
  
  final void longAccumulate(long x, LongBinaryOperator fn,
                              boolean wasUncontended) {
    int h;
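    /* PROBE is the offset of the threadLocalRandomProbe field in Thread,
     * set up in a static initializer. getProbe() reads that field for the
     * current thread. Different threads generally get different values,
     * which makes it a good hash.
     */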
    if ((h = getProbe()) == 0) {  
      ThreadLocalRandom.current(); // force initialization
      h = getProbe();
      wasUncontended = true;
    }
    boolean collide = false;                // True if last slot nonempty
    for (;;) {
      Cell[] as; Cell a; int n; long v;
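      // If the cells array is already initialized
      if ((as = cells) != null && (n = as.length) > 0) {
        // If the slot at (thread hash mod array length) is still empty
        // (& works as a fast modulo because the length is a power of two)
        if ((a = as[(n - 1) & h]) == null) {
          // Check the cellsBusy lock flag; if it is free (0 means unlocked), create a new Cell wrapping x
          if (cellsBusy == 0) {
            Cell r = new Cell(x);
            // Check the lock again and CAS to acquire it; once acquired, store the new cell in the array
            if (cellsBusy == 0 && casCellsBusy()) {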
              boolean created = false;
              try {               // Recheck under lock
                Cell[] rs; int m, j;
                if ((rs = cells) != null &&
                    (m = rs.length) > 0 &&
                    rs[j = (m - 1) & h] == null) {
                  rs[j] = r;
                  created = true;
                }
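              // Once the store attempt is over, release the lock whether or not it succeeded
              } finally {
                cellsBusy = 0;
              }
              // If the store succeeded, created is true and we are done;
              // otherwise continue and run the loop body again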
              if (created)
                break;
              continue;           // Slot is now non-empty
            }
          }
          collide = false;
        }
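        /* wasUncontended is false when the caller's CAS on this cell in add()
         * has already failed. The slot exists but is contended, so skip the
         * CAS this time, set the flag back to true, and fall through to
         * h = advanceProbe(h) at the bottom to rehash before looping again.
         */
        else if (!wasUncontended)       // CAS already known to fail
          wasUncontended = true;      // Continue after rehash
        // The slot exists and the hash is fresh, so try the CAS; exit on success
        else if (a.cas(v = a.value, ((fn == null) ? v + x :
                                     fn.applyAsLong(v, x))))
          break;
        // If the array length has reached the number of CPU cores (or cells was
        // replaced), reset collide to false. Once at max size this branch is always
        // taken before the expansion branch, so the array stops growing
        else if (n >= NCPU || cells != as)
          collide = false;            // At max size or stale
        // The array is still below the CPU core count: set collide to true
        // so that the next failed CAS reaches the expansion branch
        else if (!collide)
          collide = true;
        // The array can still grow and the CAS on the hashed cell failed again:
        // expand by doubling (left shift), then continue with the next iteration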
        else if (cellsBusy == 0 && casCellsBusy()) {
          try {
            if (cells == as) {      // Expand table unless stale
              Cell[] rs = new Cell[n << 1];
              for (int i = 0; i < n; ++i)
                rs[i] = as[i];
              cells = rs;
            }
          } finally {
            cellsBusy = 0;
          }
          collide = false;
          continue;                   // Retry with expanded table
        }
        h = advanceProbe(h);
      }
      // If cells is not initialized, try to take the cellsBusy lock; on success
      // create the cells array with size 2 and store the current thread's value
      else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
        boolean init = false;
        try {                           // Initialize table
          if (cells == as) {
            Cell[] rs = new Cell[2];
            rs[h & 1] = new Cell(x);
            cells = rs;
            init = true;
          }
        } finally {
          cellsBusy = 0;
        }
        if (init)
          break;
      }
      // If cells could not be initialized here (another thread holds the lock),
      // try to CAS the value onto base; exit on success, otherwise loop again
      else if (casBase(v = base, ((fn == null) ? v + x :
                                  fn.applyAsLong(v, x))))
        break;                          // Fall back on using base
    }
  }
}
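
Two small pieces of longAccumulate are easy to try in isolation: the slot lookup (n - 1) & h, and the rehash done by advanceProbe. The sketch below is an illustration only. ProbeSketch and slot are made-up names, the starting probe value is arbitrary, and advanceProbe here repeats the xorshift steps of Striped64.advanceProbe without writing the result back to the Thread field.

```java
// Hypothetical helper class illustrating slot selection and rehashing.
class ProbeSketch {

    /** Slot for hash h in a table whose length n is a power of two. */
    static int slot(int h, int n) {
        return (n - 1) & h;
    }

    /** Xorshift step used to pick a different slot after a collision. */
    static int advanceProbe(int probe) {
        probe ^= probe << 13;
        probe ^= probe >>> 17;
        probe ^= probe << 5;
        return probe;
    }

    public static void main(String[] args) {
        int h = 0x9E3779B9;                 // arbitrary non-zero starting probe (assumption)
        for (int n = 2; n <= 8; n <<= 1) {
            // For power-of-two n the mask gives the same slot as a floor modulo
            System.out.println("n=" + n + " slot=" + slot(h, n)
                    + " floorMod=" + Math.floorMod(h, n));
        }
        System.out.println("after rehash: " + advanceProbe(h));
    }
}
```

Because the table length is always a power of two, the mask is equivalent to a modulo but costs a single AND. The xorshift never maps a non-zero probe to zero, so a rehashed thread does not fall back into the "probe not initialized" branch.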

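Now for the read path. During updates, some threads' values were spread across the cells, while others were accumulated on base. Reading therefore adds base and the values of all cells together and returns the total. There is no locking of any kind on the read path, but the summation makes it slightly slower than AtomicLong. The sum is also not an atomic snapshot: updates that arrive while it is being computed may or may not be included.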

public long longValue() {
  return sum();
}

public long sum() {
  Cell[] as = cells; Cell a;
  long sum = base;
  if (as != null) {
    for (int i = 0; i < as.length; ++i) {
      if ((a = as[i]) != null)
        sum += a.value;
    }
  }
  return sum;
}

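V. When to use which

For counters that are updated frequently from many threads, LongAdder is clearly the better choice. Its read performance, however, is worse than AtomicLong's, because a read has to combine base and all the cells.
AtomicLong's counting methods return the value after the update directly, which saves a separate read and is more convenient in some scenarios.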
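
As a small illustration of that split, the sketch below uses AtomicLong where each caller needs the value produced by its own update, and LongAdder for a statistic that is written constantly but read only now and then. The class CounterChoiceSketch and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example: one counter of each kind, chosen by how it is read.
class CounterChoiceSketch {
    // Each caller needs the result of its own update, so use AtomicLong
    private final AtomicLong nextId = new AtomicLong();
    // Incremented on every request, summed only occasionally, so use LongAdder
    private final LongAdder requests = new LongAdder();

    /** Records one request and returns an id that no other call receives. */
    long handleRequest() {
        requests.increment();
        return nextId.incrementAndGet();
    }

    /** Total requests so far; not an atomic snapshot while writers are active. */
    long requestCount() {
        return requests.sum();
    }
}
```

LongAdder.increment() returns nothing, so it cannot hand out ids by itself; calling sum() after every increment would give up the write-side advantage and still not yield unique values.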
