Learn Java 8's New Feature in One Article: Streams

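Java 8's new feature: Stream.

Let's start with a problem:

Define an array or collection (source) containing the ten numbers 0 to 9, select the numbers greater than 5, and return them as a new array or collection (result).

You confidently open your editor and hammer away at the keyboard: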

List<Integer> result = new ArrayList<>();
for(Integer i : source){
	if(i > 5)
		result.add(i);
}
return result;

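You feel a little smug. Easy: five or six lines of code, done in ten seconds.

Then you glance to your left. The Python programmer is giving you a look of disdain, and on his huge screen the function contains a single lonely line of code:

source[source > 5]  # source is a NumPy array; np.where(source > 5) would return indices, not values

What??? That's it?

On reflection, it's no surprise. Python has always been known for its simple syntax, but this is a bit too simple... You sulk ╮(╯﹏╰）╭

You glance to your right and meet the contemptuous gaze of the C# programmer. Suddenly uneasy (surely he hasn't also...?), you sneak a look at his screen. One line too!!! You freeze …（⊙＿⊙）…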

source.Where(t => t > 5);

Then he tells you there is another syntax that also does the job in one line:

from i in source where i > 5 select i;

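Good grief, do you feel like you're about to break down?

What... what kind of syntax is this? SQL-style data processing in a single line of code. Looked down on by two peers within one minute. Java, how can you be so low?

In fact, you've misjudged Java.

Is it low? No, not at all!
Does it lack a similar syntax? No, you just don't know it yet! Time to hit the books, my friend!!!

Java 8's new feature: Stream.

For the problem above, we can likewise avoid the verbose for loop and the tedious if check, and solve it easily in one line of code.

source.stream().filter(i -> i > 5).collect(Collectors.toList());

In Java 8, a collection only needs to call the stream method to be converted into a stream, after which you can process the data in a variety of declarative ways.

What is the declarative style?

**Declarative style:** you only state what you want done, without caring about the concrete steps that do it. In the example above, we only state our requirement (filter the collection, with i > 5 as the condition) and not the concrete operations (traverse the collection, and if i > 5, add the element to a new collection ...).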
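To make the contrast concrete, here is a minimal sketch that puts the two styles side by side on the opening problem. The class name `FilterStyles` and its method names are made up for illustration; only the stream calls come from the article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterStyles {
    // Imperative: spell out how to traverse, test, and accumulate.
    static List<Integer> imperative(List<Integer> source) {
        List<Integer> result = new ArrayList<>();
        for (Integer i : source) {
            if (i > 5) {
                result.add(i);
            }
        }
        return result;
    }

    // Declarative: state the condition and let the stream handle the traversal.
    static List<Integer> declarative(List<Integer> source) {
        return source.stream().filter(i -> i > 5).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(imperative(source));   // [6, 7, 8, 9]
        System.out.println(declarative(source));  // [6, 7, 8, 9]
    }
}
```

Both methods return the same list; the difference is only in who writes the loop.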

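Introduction to streams and their characteristics

A stream is a sequence of elements generated from a data source. The biggest difference between a stream and a collection is that a stream exists to express computation. Put simply, a collection is generally used to store data, while a stream is used to process it.

Chained operations

Most stream operations return another stream, so multiple operations can be linked into a pipeline. This is also known as chaining.

Order preservation

When a data source is converted to a stream, if the source is ordered, the stream keeps that original order. A stream is conceptually a fixed data structure, so you cannot add or remove elements from it.

Data processing

Data processing is the core purpose of streams, which support both sequential and parallel operations. This SQL-like declarative style makes processing collection data much more convenient.

Internal iteration

Like explicit iteration with an iterator, a stream can be traversed only once, but a stream iterates behind the scenes. Because the library manages the traversal, you no longer hand-write the loop (and its mistakes), and the library is free to optimize the traversal, for example by running it in parallel.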
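The characteristics above can be checked with a small sketch. The class `StreamTraits` and its method names are hypothetical; the example assumes nothing beyond the standard `java.util.stream` API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamTraits {
    // A chained pipeline: the source list is left untouched and encounter order is kept.
    static List<Integer> doubledEvens(List<Integer> source) {
        return source.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    // Returns true if traversing the same stream a second time is rejected.
    static boolean secondTraversalFails(List<Integer> source) {
        Stream<Integer> stream = source.stream();
        stream.forEach(n -> { });       // first traversal consumes the stream
        try {
            stream.forEach(n -> { });   // second traversal of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(4, 1, 2, 3);
        System.out.println(doubledEvens(source));         // [8, 4]
        System.out.println(source);                       // [4, 1, 2, 3]
        System.out.println(secondTraversalFails(source)); // true
    }
}
```

To process the same data again, call `stream()` on the source once more; it is the stream that is single-use, not the collection.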

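Basic stream operations

Working with a stream involves three main steps: creation, intermediate operations, and a terminal operation.

Creation
- stream
  - Method 1: the stream method provided by Collection converts the data source into a sequential stream
```
List<String> list = new ArrayList<>();
Stream<String> stream = list.stream();
```
  - Method 2: the parallelStream method provided by Collection converts the data source into a parallel stream
```
List<String> list = new ArrayList<>();
Stream<String> parallelStream = list.parallelStream();
```
  - Method 3: a stream from an array, using the static stream method of Arrays
```
int[] array = new int[] {};
IntStream intStream = Arrays.stream(array);
IntStream intStream1 = Arrays.stream(array, 0, array.length);

/* As with the functional interfaces that Java 8 provides, Stream also has primitive specializations such as IntStream and DoubleStream, to avoid the unnecessary overhead of frequent boxing and unboxing */
```
  - Method 4: a stream from values, using the static of method of Stream
```
Stream<String> stream = Stream.of("hello", "every one", "good", "morning");
```
  - Method 5: a stream from a file, using the lines method of Files
```
Stream<String> lines = Files.lines(Paths.get("123.txt"), Charset.defaultCharset());
```
  - Method 6: an empty stream, using the empty method of Stream
```
Stream<Object> empty = Stream.empty();
```
  - Method 7: a stream from a function (infinite stream)
    1. Iterate function
      
      This stream takes an initial value and a UnaryOperator lambda that is applied successively to each new value. Such a stream has no end and can keep computing forever, so it needs limit to truncate it
```
Stream<Integer> stream = Stream.iterate(0, n -> n + 2).limit(10);
```
    2. Generate function
      Unlike the iterate function, the generate function does not apply a function to each newly produced value. It takes a Supplier lambda to provide new values, but it is likewise infinite, so it also needs to be truncated
```
Stream<Double> stream = Stream.generate(Math::random).limit(10);
```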
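As a rough, runnable companion to the list above, the sketch below exercises three of the creation methods and collects their results so they can be inspected. The class `StreamSources` and its method names are assumptions for illustration; `IntStream.range` and `boxed()` are standard API calls not used elsewhere in the article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamSources {
    // Stream.iterate is infinite, so limit(count) is what makes the pipeline terminate.
    static List<Integer> firstEvens(int count) {
        return Stream.iterate(0, n -> n + 2).limit(count).collect(Collectors.toList());
    }

    // Arrays.stream(array, from, to) covers the half-open index range [from, to).
    static int sumOfRange(int[] array, int from, int to) {
        return Arrays.stream(array, from, to).sum();
    }

    // boxed() turns the primitive IntStream into a Stream<Integer>.
    static List<Integer> boxedRange(int startInclusive, int endExclusive) {
        return IntStream.range(startInclusive, endExclusive).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstEvens(5));                              // [0, 2, 4, 6, 8]
        System.out.println(sumOfRange(new int[] {1, 2, 3, 4, 5}, 1, 3)); // 5
        System.out.println(boxedRange(0, 3));                           // [0, 1, 2]
    }
}
```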

Intermediate operations

filter

Purpose: filtering
Parameter type: Predicate<T>
Return type: Stream<T>

Example:

List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 0);
nums.stream().filter(t -> t % 2 == 0);
//nums.stream().filter(t -> t % 2 == 0).forEach(System.out::println); // forEach prints the result to the console
// Result: 2 4 6 8 0

map

Purpose: mapping
Parameter type: Function<T, R>
Return type: Stream<R>

Example:

List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 0);
nums.stream().map(t -> t * 2);
// Result: 2 4 6 8 10 12 14 16 18 0

sorted

Purpose: sorting
Parameter type: Comparator<T>
Return type: Stream<T>

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
nums.stream().sorted();// natural order	4  5  7  7  7  8  9  10  11  12
nums.stream().sorted(Comparator.reverseOrder());// reverse order  12  11  10  9  8  7  7  7  5  4
nums.stream().sorted(Comparator.comparing(i -> i % 5));  // custom order (by remainder when divided by 5)  10  5  11  12  7  7  7  8  9  4

distinct

Purpose: removing duplicates
Return type: Stream<T>

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
nums.stream().distinct();  // 10  12  9  5  7  4  8  11

limit

Purpose: truncating to the first n elements
Parameter type: long
Return type: Stream<T>

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
nums.stream().limit(3);  // 10  12  9

skip

Purpose: skipping the first n elements
Parameter type: long
Return type: Stream<T>

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
nums.stream().skip(3);  // 5  7  4  7  8  11  7

flatMap

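Purpose: flattening (when each value in a stream needs to be replaced by another stream, all of those streams are concatenated into a single stream)
Parameter type: Function<T, Stream<R>>
Return type: Stream<R>

Example:

List<Integer> num1 = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
List<Integer> num2 = Arrays.asList(3, 4, 5);
// i and j are Integer objects, so compare them with equals rather than != (which compares references)
Stream<int[]> stream = num1.stream().flatMap(i -> 
							num2.stream().filter(j -> !i.equals(j) && i % j == 0).map(j -> new int[]{i, j}));

List<int[]> list = stream.collect(Collectors.toList());
//list:  [[6, 3], [8, 4], [9, 3]]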
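A simpler way to see what "flattening" means is to merge nested data into one stream. The following is only an illustrative sketch; the class `FlatMapDemo` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapDemo {
    // Each inner list becomes a stream; flatMap concatenates them into one stream.
    static List<Integer> flatten(List<List<Integer>> nested) {
        return nested.stream().flatMap(List::stream).collect(Collectors.toList());
    }

    // map alone would yield one String[] per sentence; flatMap yields one stream of words.
    static List<String> words(List<String> sentences) {
        return sentences.stream()
                .flatMap(s -> Arrays.stream(s.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(flatten(Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.<Integer>asList())));  // [1, 2, 3]
        System.out.println(words(Arrays.asList("hello every one", "good morning")));
        // [hello, every, one, good, morning]
    }
}
```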

Terminal operations

collect

Purpose: reduction (collecting the elements into a result)
Extension: you can also define custom collectors for more complex reductions. Here we use the predefined collectors to show the two most common cases.
Parameter type: Collector<T, A, R>
Return type: R

Example:

Collecting into a collection

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
List<Integer> result = nums.stream().skip(3).collect(Collectors.toList());

Collecting into groups

List<User> source = new ArrayList<>();
source.add(new User("張", "三", "male", 18));
source.add(new User("李", "四", "female", 14));
source.add(new User("王", "五", "female", 24));
source.add(new User("趙", "六", "male", 16));

Map<String, List<User>> groupByGender = source.stream().collect(Collectors.groupingBy(User::getGender));
/* Group by gender; returns a Map */
/* The result has roughly this structure: {"male" : ["張三", "趙六"], "female" : ["李四", "王五"]} */

/* For multi-level grouping, groupingBy also accepts a second argument, an inner groupingBy.
   Syntax: groupingBy(level-1 classifier, groupingBy(level-2 classifier, groupingBy(level-3 classifier))) */
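The multi-level syntax can be sketched as follows. The article never shows its User class, so the nested `User` here is a hypothetical stand-in with just the fields the example needs, and the "adult" rule (age >= 18) is an assumed second-level classifier chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {
    // Hypothetical stand-in for the article's User class, which is never shown.
    static class User {
        private final String firstName;
        private final String lastName;
        private final String gender;
        private final int age;

        User(String firstName, String lastName, String gender, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.gender = gender;
            this.age = age;
        }

        String getGender() { return gender; }
        int getAge() { return age; }
        String getFullName() { return firstName + lastName; }
    }

    static List<User> sampleUsers() {
        return Arrays.asList(
                new User("張", "三", "male", 18),
                new User("李", "四", "female", 14),
                new User("王", "五", "female", 24),
                new User("趙", "六", "male", 16));
    }

    // Level 1 groups by gender; level 2 groups by whether the user is an adult (age >= 18).
    // The innermost collector keeps only the full names.
    static Map<String, Map<Boolean, List<String>>> byGenderThenAdult(List<User> users) {
        return users.stream().collect(
                Collectors.groupingBy(User::getGender,
                        Collectors.groupingBy((User u) -> u.getAge() >= 18,
                                Collectors.mapping(User::getFullName, Collectors.toList()))));
    }

    public static void main(String[] args) {
        System.out.println(byGenderThenAdult(sampleUsers()));
    }
}
```

With the sample data, each gender ends up with one adult and one minor, so every inner list holds a single name.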

count

Purpose: returns the number of elements in the stream
Return type: long

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
long count = nums.stream().count();

forEach

Purpose: consumes each element of the stream with the given expression
Parameter type: Consumer<T>
Return type: void

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
nums.stream().skip(3).forEach(System.out::print);

reduce

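Purpose: reduction
Parameter type: BinaryOperator<T>
Return type: Optional<T> (or T when an initial value is supplied)

Example:

In reduce, the lambda takes the result of the previous computation and combines it with the next element to produce a new result, which is then combined with the following element, and so on until the stream ends.

No initial value

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
Optional<Integer> sum = nums.stream().reduce((a, b) -> a + b);
// No initial value: the first element, 10, is the starting result (there is no default of 0)
// The previous result and the next element of the stream are passed to the lambda: 10 + 12  returns  22
// The previous result and the next element of the stream are passed to the lambda: 22 + 9   returns  31
// ......
// sum: Optional[80]

With an initial value

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
Integer sum = nums.stream().reduce(10, Integer::sum); // Integer::sum is equivalent here to (a, b) -> a + b
// Initial value: 10
// 10 + 10   returns  20
// The previous result and the next element of the stream are passed to the lambda: 20 + 12  returns  32
// The previous result and the next element of the stream are passed to the lambda: 32 + 9   returns  41
// ......
// sum: 90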
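The practical difference between the two forms shows up on an empty stream, which the following sketch tries to illustrate. The class `ReduceDemo` and its method names are made up for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceDemo {
    // No initial value: the first element seeds the result,
    // so an empty stream has nothing to return and yields Optional.empty().
    static Optional<Integer> sum(List<Integer> nums) {
        return nums.stream().reduce(Integer::sum);
    }

    // With an initial value: it seeds the result and is returned as-is for an empty stream.
    static int sumFrom(int initial, List<Integer> nums) {
        return nums.stream().reduce(initial, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
        List<Integer> none = Collections.emptyList();
        System.out.println(sum(nums));          // Optional[80]
        System.out.println(sum(none));          // Optional.empty
        System.out.println(sumFrom(10, nums));  // 90
        System.out.println(sumFrom(10, none));  // 10
    }
}
```

One caveat worth keeping in mind: the first argument of the two-argument reduce is documented as an identity for the operation (0 for addition). A non-identity starting value such as 10 behaves as a plain starting value only on a sequential stream; on a parallel stream it may be applied once per chunk.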

allMatch

Purpose: checks whether the predicate matches all elements
Parameter type: Predicate<T>
Return type: boolean

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
boolean isAllOdd = nums.stream().allMatch(i -> i % 2 == 1);
// false

anyMatch

Purpose: checks whether the predicate matches at least one element
Parameter type: Predicate<T>
Return type: boolean

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
boolean hasOdd = nums.stream().anyMatch(i -> i % 2 == 1);
// true

noneMatch

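Purpose: checks whether the predicate matches no elements (the counterpart of allMatch)
Parameter type: Predicate<T>
Return type: boolean

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
boolean isAllEven = nums.stream().noneMatch(i -> i % 2 == 1);
// false (the list contains odd numbers, so the predicate does match some elements)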
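To see how the three match operations relate, the sketch below runs them with the same predicate on two lists. `MatchDemo`, `isOdd`, and `oddChecks` are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;

class MatchDemo {
    static boolean isOdd(int n) {
        return n % 2 == 1;
    }

    // Returns { allMatch, anyMatch, noneMatch } for the "is odd" predicate.
    static boolean[] oddChecks(List<Integer> nums) {
        return new boolean[] {
            nums.stream().allMatch(MatchDemo::isOdd),
            nums.stream().anyMatch(MatchDemo::isOdd),
            nums.stream().noneMatch(MatchDemo::isOdd)
        };
    }

    public static void main(String[] args) {
        // Mixed list: not all odd, some odd, so "none odd" is false.
        System.out.println(Arrays.toString(
                oddChecks(Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7))));  // [false, true, false]
        // All-even list: no odd element, so noneMatch is true.
        System.out.println(Arrays.toString(
                oddChecks(Arrays.asList(2, 4, 6))));                          // [false, false, true]
    }
}
```

For any stream and predicate, noneMatch gives the opposite answer to anyMatch.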

findAny

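Purpose: returns any element of the stream; it short-circuits, finishing as soon as a result is found
Return type: Optional<T>

Example:

List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
Optional<Integer> anyOdd = nums.stream().filter(i -> i % 2 == 1).findAny();
// typically 9 on a sequential stream, but this is not guaranteed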

findFirst
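- Purpose: returns the first element of the stream
- Return type: Optional<T>
- Example:
```
List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
Optional<Integer> firstOdd = nums.stream().filter(i -> i % 2 == 1).findFirst();
// 9
```
  At first glance, findAny and findFirst seem to be no different.
  
  In fact, the result of findAny is not deterministic. In the example above, on a parallel stream (especially with a larger data set), repeated calls may return 9, but may also return 5, 7, or 11.
  
  findFirst, on the other hand, always returns the first element.
  
  So if you care about efficiency and any element that satisfies the condition will do, prefer findAny, because it places fewer constraints on a parallel search;
  
  if you care about which element you get and need the first one that satisfies the condition, use findFirst.
- Two more examples:
```
List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
while (true)
	System.out.println(nums.parallelStream().filter(i -> i % 2 == 1).findFirst().get());

//  The output is always 9
```
```
List<Integer> nums = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);
while (true)
	System.out.println(nums.parallelStream().filter(i -> i % 2 == 1).findAny().get());

//  The output varies: 7  9  5  11
```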
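The two loops above never terminate, so here is a bounded variant that records which answers each method gives over a fixed number of parallel runs. This is only a sketch: `FindDemo` and its method names are hypothetical, and which values findAny actually produces depends on the machine and the run.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FindDemo {
    static final List<Integer> NUMS = Arrays.asList(10, 12, 9, 5, 7, 4, 7, 8, 11, 7);

    // Distinct answers findFirst gives over several parallel runs.
    static Set<Integer> findFirstResults(int runs) {
        Set<Integer> seen = new HashSet<>();
        for (int r = 0; r < runs; r++) {
            seen.add(NUMS.parallelStream().filter(i -> i % 2 == 1).findFirst().get());
        }
        return seen;
    }

    // Distinct answers findAny gives over several parallel runs.
    static Set<Integer> findAnyResults(int runs) {
        Set<Integer> seen = new HashSet<>();
        for (int r = 0; r < runs; r++) {
            seen.add(NUMS.parallelStream().filter(i -> i % 2 == 1).findAny().get());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(findFirstResults(1000)); // always [9]
        System.out.println(findAnyResults(1000));   // some subset of the odd values 9, 5, 7, 11
    }
}
```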

Finally, here is a complete set of basic usage examples:

// Named StreamTest so that the class name does not shadow the @Test annotation
public class StreamTest {
	// Initialize some data first
    private List<User> source = new ArrayList<>();
    {
        source.add(new User("張", "三", "male", 18));
        source.add(new User("李", "四", "female", 14));
        source.add(new User("王", "五", "female", 24));
        source.add(new User("趙", "六", "male", 16));
    }

	/* The three basic steps of a stream: creation, intermediate operations, terminal operation */
    @Test
    public void test1() {
		// Creation
		Stream<User> stream =  source.stream();
		// Intermediate operations
		Stream<User> men = stream.filter(t -> "male".equals(t.getGender()));
        Stream<String> menNamesStream = men.map(t -> t.getFirstName() + t.getLastName());
		// Terminal operation
        List<String> menNames = menNamesStream.collect(Collectors.toList());// ["張三", "趙六"]

		/**  A stream can be consumed only once; using it again throws an exception  **/
		List<User> women = stream.filter(t -> "female".equals(t.getGender())).collect(Collectors.toList());
		/**  java.lang.IllegalStateException: stream has already been operated upon or closed  **/
    }

	/* Chained stream operations */
	@Test
	public void test2(){
        List<String> names = source.stream()
			.filter(p -> p.getAge() >= 18)			// people aged 18 or older
			.map(p-> p.getFirstName())				// get the family name
			.limit(3)								// take the first 3
			.collect(Collectors.toList());			// terminal operation: return a list

	}
}

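Stream performance

When creating a stream, we can convert a collection to a sequential stream with the stream method, or directly to a parallel stream with the parallelStream method.

Of course, we can also call parallel in the middle of a pipeline to make the stream parallel, and sequential to make it sequential.

But remember: do not expect to use these two methods to switch the stream's mode on the fly and control each intermediate operation individually, because the last call determines the mode of the entire pipeline.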
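The "last call wins" rule can be observed with the standard `isParallel()` method, as in this small sketch. The class `ParallelFlagDemo` and its method names are assumptions for illustration.

```java
import java.util.stream.IntStream;

class ParallelFlagDemo {
    // parallel() is the last mode switch, so the whole pipeline is parallel,
    // including the map stage that was added while the stream was "sequential".
    static boolean endsParallel() {
        return IntStream.range(0, 10).sequential().map(n -> n * 2).parallel().isParallel();
    }

    // sequential() is the last mode switch, so the whole pipeline is sequential.
    static boolean endsSequential() {
        return IntStream.range(0, 10).parallel().map(n -> n * 2).sequential().isParallel();
    }

    public static void main(String[] args) {
        System.out.println(endsParallel());   // true
        System.out.println(endsSequential()); // false
    }
}
```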

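On the question of performance, we instinctively assume that a parallel stream must outperform a sequential one. It runs in parallel, after all. But is that really the case?

Let's run a set of tests.

First, we define a method that measures the performance of a summing function: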

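/* Sum the first n natural numbers and return the execution time in milliseconds */
public long sumPerfTesting(Function<Long, Long> sumFun, long n){
	long fastest = Long.MAX_VALUE;

	/* 
		Run 5 times and return the fastest time.
		Why? Because the JVM needs to warm up: the first run is often slower (the JIT compiler has not yet optimized the hot code), and later runs settle to normal speed.
		So performance tests generally discard the first run, since its error is large and it has little reference value.
		Running several times and taking the average (or sometimes the minimum or maximum, depending on the situation) is also a general principle of performance testing.
	 */
	for(int i = 0; i < 5; i++){
		long start = System.nanoTime();
		long sum = sumFun.apply(n);
		long duration = (System.nanoTime() - start) / 1_000_000;
		System.out.println("Result: " + sum);
		fastest = fastest > duration ? duration : fastest;
	}
	return fastest;
}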

Next, we define three summing methods in a DoSum class (a traditional loop, a sequential stream, and a parallel stream) to test their performance.

public class DoSum{
    /* Iterative sum */
    public static long iteratorSum(long n) {
        long result = 0;
        for (long i = 1L; i <= n; i++) {
            result += i;
        }
        return result;
    }

    /* Sequential stream sum */
    public static long sequentSum(long n) {
        return Stream.iterate(1L, i -> i + 1).limit(n).reduce(0L, Long::sum);
    }

    /* Parallel stream sum */
    public static long parallelSum(long n) {
        return Stream.iterate(1L, i -> i + 1).limit(n).parallel().reduce(0L, Long::sum);
    }
}

Finally, print the execution time of each of the three summing methods:

System.out.println("Iterator Sum : " + sumPerfTesting(DoSum::iteratorSum, 10_000_000));
System.out.println("Sequent Sum : " + sumPerfTesting(DoSum::sequentSum, 10_000_000));
System.out.println("Parallel Sum : " + sumPerfTesting(DoSum::parallelSum, 10_000_000));

The output (in milliseconds) is as follows:

Iterator Sum : 6
Sequent Sum : 119
Parallel Sum : 500

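Isn't that hard to believe?

It is understandable that the iterative version is fastest, since a plain loop is the lowest-level operation. But the parallel version takes more than 4 times as long as the sequential version, and over 80 times as long as the iterative version.

Why is that?

There are two main reasons:

Frequent boxing and unboxing takes too much time.

Stream.iterate produces boxed objects, and each one must be unboxed before it can be added. Tens of millions of boxing and unboxing operations are very costly.

Data dependencies keep parallelism from paying off.

Stream.iterate generates each element by applying the function to the previous one, so every step depends on the result of the step before it. Such a stream is hard to split into independent chunks, so the advantage of parallelism cannot show.

In summary, this parallel stream is effectively still executing sequentially, while also paying for frequent boxing and unboxing.

Having identified these two causes, let's make two bold guesses:

Guess 1: if we use the primitive stream LongStream mentioned earlier to avoid boxing and unboxing, the final time should be about the same as sequential execution.

Guess 2: if we use LongStream to avoid boxing and also use LongStream.rangeClosed to generate the range of numbers, the advantage of parallelism will show and the time will drop sharply.

Next, let's verify these guesses:

Verification 1:
Modify the parallelSum method to generate the stream with LongStream, which yields primitive long values and avoids boxing and unboxing.

/* Parallel stream sum */
public static long parallelSum(long n) {
    return LongStream.iterate(1L, i -> i + 1).limit(n).parallel().reduce(0L, Long::sum);
}

The test results are as follows: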

Iterator Sum : 6
Sequent Sum : 140
Parallel Sum : 167

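As we can see, the parallel time of 167 ms differs from the sequential time of 140 ms by only 27 ms, which supports our guess.

Verification 2:
Modify the parallelSum method to use LongStream's rangeClosed method, which produces a range of independent numbers that is easy to split and therefore well suited to parallel execution.

/* Parallel stream sum */
public static long parallelSum(long n) {
    return LongStream.rangeClosed(1L, n).parallel().reduce(0L, Long::sum);
}

The test results are as follows: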

Iterator Sum : 6
Sequent Sum : 114
Parallel Sum : 6

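At last, the parallel stream is far faster than the sequential stream, and sometimes even faster than the plain loop.

As the example above shows, the performance of parallel streams can be excellent. But you must take care to use them correctly.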
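Timings vary from machine to machine, so before comparing speeds it can help to confirm that every variant computes the same value. The sketch below does only that, checking each approach against the closed form n(n+1)/2; the class `SumVariants` and its method names are hypothetical, and no timing is measured.

```java
import java.util.stream.LongStream;
import java.util.stream.Stream;

class SumVariants {
    // Plain loop.
    static long iterativeSum(long n) {
        long result = 0;
        for (long i = 1L; i <= n; i++) {
            result += i;
        }
        return result;
    }

    // Boxed, sequential: Stream.iterate produces Long objects one after another.
    static long boxedSequentialSum(long n) {
        return Stream.iterate(1L, i -> i + 1).limit(n).reduce(0L, Long::sum);
    }

    // Primitive, parallel: rangeClosed is easy to split into independent chunks.
    static long rangeParallelSum(long n) {
        return LongStream.rangeClosed(1L, n).parallel().reduce(0L, Long::sum);
    }

    // Closed form for 1 + 2 + ... + n.
    static long closedForm(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        long n = 100_000;
        System.out.println(iterativeSum(n) == closedForm(n));
        System.out.println(boxedSequentialSum(n) == closedForm(n));
        System.out.println(rangeParallelSum(n) == closedForm(n));
    }
}
```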

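If a parallel stream is used in the wrong scenario, at best it hurts performance and takes longer (as above); at worst it corrupts data: it is not only slower but also returns wrong results.

Taking summation as the example again, this time we use a shared accumulator: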

class Accumulator{
    public long total = 0;
    public void add(long value) {total += value;}
}

Create the summing method in DoSum:

public static long sideEffectSum(long n){
    Accumulator accumulator = new Accumulator();
    LongStream.rangeClosed(1, n).forEach(accumulator::add);
    return accumulator.total;
}

Call:

System.out.println("sideEffect Sum : " + sumPerfTesting(DoSum::sideEffectSum, 10_000_000));

Result:

Result: 50000005000000
Result: 50000005000000
Result: 50000005000000
Result: 50000005000000
Result: 50000005000000
sideEffect Sum : 7

Looks fine. Now let's change it to run in parallel and test again:

public static long sideEffectSum(long n){
    Accumulator accumulator = new Accumulator();
    LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add);
    return accumulator.total;
}

Call:

System.out.println("sideEffect Sum : " + sumPerfTesting(DoSum::sideEffectSum, 10_000_000));

Result:

Result: 27850481869918
Result: 25948841007030
Result: 7263318432858
Result: 23731343106789
Result: 17107187037081
sideEffect Sum : 2

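Surprised? It is certainly fast, but it returns a pile of meaningless wrong numbers. This is because multiple threads access the accumulator at the same time, and total += value is not an atomic operation. Be sure to avoid this situation.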
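Two possible ways around the race are sketched below; neither is from the original article. The first drops the shared state and lets the stream combine partial sums itself, and the second keeps a shared accumulator but swaps in the thread-safe `java.util.concurrent.atomic.LongAdder`. The class `SafeParallelSum` and its method names are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.LongStream;

class SafeParallelSum {
    // Usually preferable: no shared mutable state; each thread sums its own chunk
    // and the stream merges the partial results.
    static long reductionSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // If a shared accumulator is unavoidable, use one designed for concurrent updates.
    static long adderSum(long n) {
        LongAdder adder = new LongAdder();
        LongStream.rangeClosed(1, n).parallel().forEach(adder::add);
        return adder.sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(reductionSum(n)); // 500000500000
        System.out.println(adderSum(n));     // 500000500000
    }
}
```

The reduction form is generally the more idiomatic choice for streams, since it avoids side effects altogether.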
