Item 45: Use streams judiciously

The streams API was added in Java 8 to ease the task of performing bulk operations, sequentially or in parallel. This API provides two key abstractions: the stream, which represents a finite or infinite sequence of data elements, and the stream pipeline, which represents a multistage computation on these elements. The elements in a stream can come from anywhere. Common sources include collections, arrays, files, regular expression pattern matchers, pseudorandom number generators, and other streams. The data elements in a stream can be object references or primitive values. Three primitive types are supported: int, long, and double.

A stream pipeline consists of a source stream followed by zero or more intermediate operations and one terminal operation. Each intermediate operation transforms the stream in some way, such as mapping each element to a function of that element or filtering out all elements that do not satisfy some condition. Intermediate operations all transform one stream into another, whose element type may be the same as the input stream or different from it. The terminal operation performs a final computation on the stream resulting from the last intermediate operation, such as storing its elements into a collection, returning a certain element, or printing all of its elements.
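
As a minimal sketch of the three parts of a pipeline described above, the hypothetical class below (its name, method, and sample words are illustrative assumptions, not part of the original item) builds a pipeline with one source, two intermediate operations, and one terminal operation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineStages {
    // Returns the upper-cased form of every word longer than three letters.
    static List<String> longWordsUpperCased(List<String> words) {
        return words.stream()                    // source stream
                .filter(w -> w.length() > 3)     // intermediate: drop short words
                .map(String::toUpperCase)        // intermediate: transform each element
                .collect(Collectors.toList());   // terminal: store elements in a List
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("staple", "tea", "petals", "eat");
        System.out.println(longWordsUpperCased(words)); // prints [STAPLE, PETALS]
    }
}
```

Both intermediate operations here keep the element type as String; a map to String::length would instead produce a stream of a different element type.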

Stream pipelines are evaluated lazily: evaluation doesn’t start until the terminal operation is invoked, and data elements that aren’t required in order to complete the terminal operation are never computed. This lazy evaluation is what makes it possible to work with infinite streams. Note that a stream pipeline without a terminal operation is a silent no-op, so don’t forget to include one.
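
The following sketch tries to make the laziness visible. It is an illustration under assumptions: the class and method names are hypothetical, and peek is used only to record which elements the pipeline actually pulls from an infinite source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyEvaluation {
    // Takes the first n even numbers from an infinite stream, recording in
    // 'visited' every element that the pipeline actually pulls from the source.
    static List<Integer> firstEvens(int n, List<Integer> visited) {
        return Stream.iterate(1, i -> i + 1)   // infinite source: 1, 2, 3, ...
                .peek(visited::add)            // note each element as it is pulled
                .filter(i -> i % 2 == 0)
                .limit(n)                      // short-circuits the infinite stream
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> visited = new ArrayList<>();
        System.out.println(firstEvens(3, visited)); // prints [2, 4, 6]
        System.out.println(visited);                // prints [1, 2, 3, 4, 5, 6]

        // No terminal operation: the pipeline never runs, so nothing is recorded.
        List<Integer> untouched = new ArrayList<>();
        Stream.iterate(1, i -> i + 1).peek(untouched::add).filter(i -> i % 2 == 0);
        System.out.println(untouched.isEmpty());    // prints true
    }
}
```

Only six elements are ever generated, even though the source is unbounded, because nothing past the third even number is needed to complete the terminal operation.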

The streams API is fluent: it is designed to allow all of the calls that comprise a pipeline to be chained into a single expression. In fact, multiple pipelines can be chained together into a single expression.

By default, stream pipelines run sequentially. Making a pipeline execute in parallel is as simple as invoking the parallel method on any stream in the pipeline, but it is seldom appropriate to do so (Item 48).

The streams API is sufficiently versatile that practically any computation can be performed using streams, but just because you can doesn’t mean you should. When used appropriately, streams can make programs shorter and clearer; when used inappropriately, they can make programs difficult to read and maintain. There are no hard and fast rules for when to use streams, but there are heuristics.

Consider the following program, which reads the words from a dictionary file and prints all the anagram groups whose size meets a user-specified minimum. Recall that two words are anagrams if they consist of the same letters in a different order. The program reads each word from a user-specified dictionary file and places the words into a map. The map key is the word with its letters alphabetized, so the key for "staple" is "aelpst", and the key for "petals" is also "aelpst": the two words are anagrams, and all anagrams share the same alphabetized form (or alphagram, as it is sometimes known). The map value is a collection (in this program, a sorted set) containing all of the words that share an alphabetized form. After the dictionary has been processed, each collection is a complete anagram group. The program then iterates through the map’s values() view and prints each group whose size meets the threshold:

// Prints all large anagram groups in a dictionary iteratively
import java.io.File;
import java.io.IOException;
import java.util.*;

public class Anagrams {
    public static void main(String[] args) throws IOException {
        File dictionary = new File(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);
        Map<String, Set<String>> groups = new HashMap<>();
        try (Scanner s = new Scanner(dictionary)) {
            while (s.hasNext()) {
                String word = s.next();
                // Create the group the first time an alphagram is seen, then add the word
                groups.computeIfAbsent(alphabetize(word),
                        (unused) -> new TreeSet<>()).add(word);
            }
        }
        for (Set<String> group : groups.values())
            if (group.size() >= minGroupSize)
                System.out.println(group.size() + ": " + group);
    }

    // Returns the letters of s sorted into alphabetical order
    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}

One step in this program is worthy of note. The insertion of each word into the map, in the body of the while loop, uses the computeIfAbsent method, which was added in Java 8. This method looks up a key in the map: If the key is present, the method simply returns the value associated with it. If not, the method computes a value by applying the given function object to the key, associates this value with the key, and returns the computed value. The computeIfAbsent method simplifies the implementation of maps that associate multiple values with each key.
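
To make the behavior of computeIfAbsent concrete, here is a small sketch that places it next to the longhand logic it replaces. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ComputeIfAbsentDemo {
    // Adds value to the set stored under key, creating the set on first use.
    static void addToGroup(Map<String, Set<String>> groups, String key, String value) {
        groups.computeIfAbsent(key, unused -> new TreeSet<>()).add(value);
    }

    // The same logic written out without computeIfAbsent.
    static void addToGroupLonghand(Map<String, Set<String>> groups, String key, String value) {
        Set<String> group = groups.get(key);
        if (group == null) {
            group = new TreeSet<>();
            groups.put(key, group);
        }
        group.add(value);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groups = new HashMap<>();
        addToGroup(groups, "aelpst", "staple");
        addToGroup(groups, "aelpst", "petals");
        addToGroupLonghand(groups, "aet", "tea");
        System.out.println(groups.get("aelpst")); // prints [petals, staple]
        System.out.println(groups.get("aet"));    // prints [tea]
    }
}
```

The group prints in alphabetical order because each value is a TreeSet, as in the program above.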

Now consider the following program, which solves the same problem, but makes heavy use of streams. Note that the entire program, with the exception of the code that opens the dictionary file, is contained in a single expression. The only reason the dictionary is opened in a separate expression is to allow the use of the try-with-resources statement, which ensures that the dictionary file is closed:

// Overuse of streams - don't do this!
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

import static java.util.stream.Collectors.groupingBy;

public class Anagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);
        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(
                    groupingBy(word -> word.chars().sorted()
                            .collect(StringBuilder::new,
                                    (sb, c) -> sb.append((char) c),
                                    StringBuilder::append).toString()))
                    .values().stream()
                    .filter(group -> group.size() >= minGroupSize)
                    .map(group -> group.size() + ": " + group)
                    .forEach(System.out::println);
        }
    }
}

If you find this code hard to read, don’t worry; you’re not alone. It is shorter, but it is also less readable, especially to programmers who are not experts in the use of streams. Overusing streams makes programs hard to read and maintain. Luckily, there is a happy medium. The following program solves the same problem, using streams without overusing them. The result is a program that’s both shorter and clearer than the original:

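// Tasteful use of streams enhances clarity and conciseness
import java.io.IOException;
import java.nio.file.*;
import java.util.Arrays;
import java.util.stream.Stream;
import static java.util.stream.Collectors.groupingBy;

public class Anagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);
        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(groupingBy(word -> alphabetize(word)))
                    .values().stream()
                    .filter(group -> group.size() >= minGroupSize)
                    .forEach(g -> System.out.println(g.size() + ": " + g));
        }
    }

    // alphabetize method is the same as in original version
    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}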

Even if you have little previous exposure to streams, this program is not hard to understand. It opens the dictionary file in a try-with-resources block, obtaining a stream consisting of all the lines in the file. The stream variable is named words to suggest that each element in the stream is a word. The pipeline on this stream has no intermediate operations; its terminal operation collects all the words into a map that groups the words by their alphabetized form (Item 46). This is essentially the same map that was constructed in both previous versions of the program (the iterative version stored each group in a sorted set rather than a list). Then a new Stream<List<String>> is opened on the values() view of the map. The elements in this stream are, of course, the anagram groups. The stream is filtered so that all of the groups whose size is less than minGroupSize are ignored, and finally, the remaining groups are printed by the terminal operation forEach.
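
To see the grouping step in isolation, the sketch below runs the same kind of collector over a small in-memory list instead of a dictionary file. The class name, the anagramGroups method, and the sample words are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {
    // Returns the letters of s sorted into alphabetical order.
    static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }

    // Groups words by alphabetized form; each map value is one anagram group.
    static Map<String, List<String>> anagramGroups(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(GroupingSketch::alphabetize));
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups =
                anagramGroups(Arrays.asList("staple", "petals", "tea", "eat", "ate", "java"));
        System.out.println(groups.get("aelpst")); // prints [staple, petals]
        System.out.println(groups.get("aet"));    // prints [tea, eat, ate]
    }
}
```

With a sequential stream, each group lists its words in the order in which they were encountered.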

Note that the lambda parameter names were chosen carefully. The parameter g should really be named group, but the resulting line of code would be too wide for the book. In the absence of explicit types, careful naming of lambda parameters is essential to the readability of stream pipelines.

Note also that word alphabetization is done in a separate alphabetize method. This enhances readability by providing a name for the operation and keeping implementation details out of the main program. Using helper methods is even more important for readability in stream pipelines than in iterative code because pipelines lack explicit type information and named temporary variables.

The alphabetize method could have been reimplemented to use streams, but a stream-based alphabetize method would have been less clear, more difficult to write correctly, and probably slower. These deficiencies result from Java’s lack of support for primitive char streams (which is not to imply that Java should have supported char streams; it would have been infeasible to do so). To demonstrate the hazards of processing char values with streams, consider the following code:

"Hello world!".chars().forEach(System.out::print);

You might expect it to print Hello world!, but if you run it, you’ll find that it prints 721011081081113211911111410810033. This happens because the elements of the stream returned by "Hello world!".chars() are not char values but int values, so the int overloading of print is invoked. It is admittedly confusing that a method named chars returns a stream of int values. You could fix the program by using a cast to force the invocation of the correct overloading:

"Hello world!".chars().forEach(x -> System.out.print((char) x));

but ideally you should refrain from using streams to process char values. When you start using streams, you may feel the urge to convert all your loops into streams, but resist the urge. While it may be possible, it will likely harm the readability and maintainability of your code base. As a rule, even moderately complex tasks are best accomplished using some combination of streams and iteration, as illustrated by the Anagrams programs above. So refactor existing code to use streams and use them in new code only where it makes sense to do so.

As shown in the programs in this item, stream pipelines express repeated computation using function objects (typically lambdas or method references), while iterative code expresses repeated computation using code blocks. There are some things you can do from code blocks that you can’t do from function objects:

  • From a code block, you can read or modify any local variable in scope; from a lambda, you can only read final or effectively final variables [JLS 4.12.4], and you can’t modify any local variables.

  • From a code block, you can return from the enclosing method, break or continue an enclosing loop, or throw any checked exception that this method is declared to throw; from a lambda you can do none of these things.
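
The two restrictions listed above can be sketched as follows. The class and its methods are hypothetical examples; the commented-out lines show code that the compiler would reject.

```java
import java.util.Arrays;
import java.util.List;

class BlockVersusLambda {
    // A loop body can modify a local variable and return from the enclosing method.
    static int sumUntilNegative(List<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {
            if (n < 0)
                return sum;   // early return from the enclosing method
            sum += n;         // modification of a local variable
        }
        return sum;
    }

    // A lambda may read threshold because it is effectively final,
    // but it may not modify a local variable such as a counter.
    static long countAbove(List<Integer> numbers, int threshold) {
        // int count = 0;
        // numbers.forEach(n -> { if (n > threshold) count++; }); // does not compile
        return numbers.stream().filter(n -> n > threshold).count();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 4, -1, 10);
        System.out.println(sumUntilNegative(numbers)); // prints 7
        System.out.println(countAbove(numbers, 3));    // prints 2
    }
}
```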

If a computation is best expressed using these techniques, then it’s probably not a good match for streams. Conversely, streams make it very easy to do some things:

  • Uniformly transform sequences of elements

  • Filter sequences of elements

  • Combine sequences of elements using a single operation (for example to add them, concatenate them, or compute their minimum)

  • Accumulate sequences of elements into a collection, perhaps grouping them by some common attribute

  • Search a sequence of elements for an element satisfying some criterion

If a computation is best expressed using these techniques, then it is a good candidate for streams.
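
Each of the five tasks listed above maps onto a short pipeline. The sketch below shows one possible pipeline per task over a fixed word list; the class, the method names, and the sample data are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class StreamStrengths {
    static final List<String> WORDS =
            Arrays.asList("staple", "tea", "petals", "eat", "java");

    // Uniformly transform: each word becomes its length.
    static List<Integer> lengths() {
        return WORDS.stream().map(String::length).collect(Collectors.toList());
    }

    // Filter: keep only the three-letter words.
    static List<String> threeLetterWords() {
        return WORDS.stream().filter(w -> w.length() == 3).collect(Collectors.toList());
    }

    // Combine with a single operation: add up all the lengths.
    static int totalLength() {
        return WORDS.stream().mapToInt(String::length).sum();
    }

    // Accumulate into a collection, grouping by a common attribute.
    static Map<Integer, List<String>> byLength() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }

    // Search for the first element that satisfies a criterion.
    static Optional<String> firstContaining(String fragment) {
        return WORDS.stream().filter(w -> w.contains(fragment)).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(lengths());              // prints [6, 3, 6, 3, 4]
        System.out.println(threeLetterWords());     // prints [tea, eat]
        System.out.println(totalLength());          // prints 22
        System.out.println(byLength().get(6));      // prints [staple, petals]
        System.out.println(firstContaining("av"));  // prints Optional[java]
    }
}
```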

One thing that is hard to do with streams is to access corresponding elements from multiple stages of a pipeline simultaneously: once you map a value to some other value, the original value is lost. One workaround is to map each value to a pair object containing the original value and the new value, but this is not a satisfying solution, especially if the pair objects are required for multiple stages of a pipeline. The resulting code is messy and verbose, which defeats a primary purpose of streams. When it is applicable, a better workaround is to invert the mapping when you need access to the earlier-stage value.
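
The pair-object workaround might look something like the sketch below. It is only an illustration: the class and method are hypothetical, and AbstractMap.SimpleImmutableEntry stands in for a dedicated pair type, which the text does not specify.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PairWorkaround {
    // Pairs each word with its length so that both values reach the final stage.
    static List<String> describeLongWords(List<String> words, int minLength) {
        return words.stream()
                .map(w -> new SimpleImmutableEntry<>(w, w.length())) // (original, derived)
                .filter(e -> e.getValue() >= minLength)              // uses the derived value
                .map(e -> e.getKey() + " (" + e.getValue() + ")")    // uses both values
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(describeLongWords(Arrays.asList("staple", "tea", "petals"), 4));
        // prints [staple (6), petals (6)]
    }
}
```

Even in a case this small, the getKey and getValue plumbing obscures the intent, which is why inverting the mapping is preferable whenever it is possible.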

For example, let’s write a program to print the first twenty Mersenne primes. To refresh your memory, a Mersenne number is a number of the form 2^p − 1. If p is prime, the corresponding Mersenne number may be prime; if so, it’s a Mersenne prime. As the initial stream in our pipeline, we want all the prime numbers. Here’s a method to return that (infinite) stream. We assume a static import has been used for easy access to the static members of BigInteger:

static Stream<BigInteger> primes() {
    return Stream.iterate(TWO, BigInteger::nextProbablePrime);
}

The name of the method (primes) is a plural noun describing the elements of the stream. This naming convention is highly recommended for all methods that return streams because it enhances the readability of stream pipelines. The method uses the static factory Stream.iterate, which takes two parameters: the first element in the stream, and a function to generate the next element in the stream from the previous one. Here is the program to print the first twenty Mersenne primes:
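
As a quick check of how Stream.iterate behaves, the sketch below truncates the infinite stream of primes with limit. The class name and the firstPrimes helper are hypothetical, and TWO is defined locally here on the assumption that the code may need to compile on Java 8, where BigInteger does not expose a public TWO constant (it was added in Java 9).

```java
import java.math.BigInteger;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PrimesSketch {
    // Local constant; an assumption made so the sketch also compiles on Java 8.
    static final BigInteger TWO = BigInteger.valueOf(2);

    // Infinite stream: seed 2, then each element is derived from the previous one.
    static Stream<BigInteger> primes() {
        return Stream.iterate(TWO, BigInteger::nextProbablePrime);
    }

    // Truncates the infinite stream to its first n elements.
    static List<BigInteger> firstPrimes(int n) {
        return primes().limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPrimes(5)); // prints [2, 3, 5, 7, 11]
    }
}
```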

public static void main(String[] args) {
    primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
            .filter(mersenne -> mersenne.isProbablePrime(50))
            .limit(20)
            .forEach(System.out::println);
}

This program is a straightforward encoding of the prose description above: it starts with the primes, computes the corresponding Mersenne numbers, filters out all but the primes (the magic number 50 controls the probabilistic primality test), limits the resulting stream to twenty elements, and prints them out.

Now suppose that we want to precede each Mersenne prime with its exponent (p). This value is present only in the initial stream, so it is inaccessible in the terminal operation, which prints the results. Luckily, it’s easy to compute the exponent of a Mersenne number by inverting the mapping that took place in the first intermediate operation. The exponent is simply the number of bits in the binary representation, so this terminal operation generates the desired result:

.forEach(mp -> System.out.println(mp.bitLength() + ": " + mp));
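
Putting the pieces together, a self-contained version might look like the sketch below. The class name, the count parameter, and the choice to collect the lines into a list rather than print them directly are assumptions made so that the result is easy to inspect; TWO and ONE are defined locally instead of being statically imported.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MersenneSketch {
    static final BigInteger ONE = BigInteger.ONE;
    static final BigInteger TWO = BigInteger.valueOf(2);

    static Stream<BigInteger> primes() {
        return Stream.iterate(TWO, BigInteger::nextProbablePrime);
    }

    // Returns lines of the form "exponent: mersennePrime" for the first count primes.
    static List<String> mersennePrimesWithExponents(int count) {
        return primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
                .filter(mersenne -> mersenne.isProbablePrime(50))
                .limit(count)
                // 2^p - 1 is p one-bits in binary, so bitLength() recovers p
                .map(mp -> mp.bitLength() + ": " + mp)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        mersennePrimesWithExponents(5).forEach(System.out::println);
        // prints 2: 3, 3: 7, 5: 31, 7: 127, 13: 8191 (one per line)
    }
}
```

The exponent 11 is skipped because 2^11 − 1 = 2047 = 23 × 89 is not prime.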

There are plenty of tasks where it is not obvious whether to use streams or iteration. For example, consider the task of initializing a new deck of cards. Assume that Card is an immutable value class that encapsulates a Rank and a Suit, both of which are enum types. This task is representative of any task that requires computing all the pairs of elements that can be chosen from two sets. Mathematicians call this the Cartesian product of the two sets. Here’s an iterative implementation with a nested for-each loop that should look very familiar to you:

// Iterative Cartesian product computation
private static List<Card> newDeck() {
    List<Card> result = new ArrayList<>();
    for (Suit suit : Suit.values())
        for (Rank rank : Rank.values())
            result.add(new Card(suit, rank));
    return result;
}

And here is a stream-based implementation that makes use of the intermediate operation flatMap. This operation maps each element in a stream to a stream and then concatenates all of these new streams into a single stream (or flattens them). Note that this implementation contains a nested lambda (rank -> new Card(suit, rank)), which sits inside the lambda passed to flatMap:

// Stream-based Cartesian product computation
private static List<Card> newDeck() {
    return Stream.of(Suit.values())
            .flatMap(suit ->
                    Stream.of(Rank.values())
                            .map(rank -> new Card(suit, rank)))
            .collect(toList());
}
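
For readers who want to run the flatMap version, here is a self-contained sketch. The Suit, Rank, and Card definitions are assumptions, since the item describes these types without listing them; only newDeck follows the stream-based implementation.

```java
import java.util.List;
import java.util.stream.Stream;
import static java.util.stream.Collectors.toList;

class DeckSketch {
    // Hypothetical enum and value-class definitions; the item does not list them.
    enum Suit { CLUB, DIAMOND, HEART, SPADE }
    enum Rank { ACE, DEUCE, THREE, FOUR, FIVE, SIX, SEVEN,
                EIGHT, NINE, TEN, JACK, QUEEN, KING }

    static final class Card {
        final Suit suit;
        final Rank rank;

        Card(Suit suit, Rank rank) {
            this.suit = suit;
            this.rank = rank;
        }

        @Override public String toString() {
            return rank + " of " + suit;
        }
    }

    // Each suit is mapped to a stream of 13 cards; flatMap concatenates the four streams.
    static List<Card> newDeck() {
        return Stream.of(Suit.values())
                .flatMap(suit ->
                        Stream.of(Rank.values())
                                .map(rank -> new Card(suit, rank)))
                .collect(toList());
    }

    public static void main(String[] args) {
        List<Card> deck = newDeck();
        System.out.println(deck.size());   // prints 52
        System.out.println(deck.get(0));   // prints ACE of CLUB
        System.out.println(deck.get(51));  // prints KING of SPADE
    }
}
```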

Which of the two versions of newDeck is better? It boils down to personal preference and the environment in which you’re programming. The first version is simpler and perhaps feels more natural. A larger fraction of Java programmers will be able to understand and maintain it, but some programmers will feel more comfortable with the second (stream-based) version. It’s a bit more concise and not too difficult to understand if you’re reasonably well-versed in streams and functional programming. If you’re not sure which version you prefer, the iterative version is probably the safer choice. If you prefer the stream version and you believe that other programmers who will work with the code will share your preference, then you should use it.

In summary, some tasks are best accomplished with streams, and others with iteration. Many tasks are best accomplished by combining the two approaches. There are no hard and fast rules for choosing which approach to use for a task, but there are some useful heuristics. In many cases, it will be clear which approach to use; in some cases, it won’t. If you’re not sure whether a task is better served by streams or iteration, try both and see which works better.

