Item 47: Prefer Collection to Stream as a return type

Many methods return sequences of elements. Prior to Java 8, the obvious return types for such methods were the collection interfaces Collection, Set, and List; Iterable; and the array types. Usually, it was easy to decide which of these types to return. The norm was a collection interface. If the method existed solely to enable for-each loops or the returned sequence couldn’t be made to implement some Collection method (typically, contains(Object)), the Iterable interface was used. If the returned elements were primitive values or there were stringent performance requirements, arrays were used. In Java 8, streams were added to the platform, substantially complicating the task of choosing the appropriate return type for a sequence-returning method.

You may hear it said that streams are now the obvious choice to return a sequence of elements, but as discussed in Item 45, streams do not make iteration obsolete: writing good code requires combining streams and iteration judiciously. If an API returns only a stream and some users want to iterate over the returned sequence with a for-each loop, those users will be justifiably upset. It is especially frustrating because the Stream interface contains the sole abstract method in the Iterable interface, and Stream’s specification for this method is compatible with Iterable’s. The only thing preventing programmers from using a for-each loop to iterate over a stream is Stream’s failure to extend Iterable.

Sadly, there is no good workaround for this problem. At first glance, it might appear that passing a method reference to Stream’s iterator method would work. The resulting code is perhaps a bit noisy and opaque, but not unreasonable:

// Won't compile, due to limitations on Java's type inference
for (ProcessHandle ph : ProcessHandle.allProcesses()::iterator) {
    // Process the process
}

Unfortunately, if you attempt to compile this code, you’ll get an error message:

Test.java:6: error: method reference not expected here
for (ProcessHandle ph : ProcessHandle.allProcesses()::iterator) {
^

In order to make the code compile, you have to cast the method reference to an appropriately parameterized Iterable:

// Hideous workaround to iterate over a stream
for (ProcessHandle ph : (Iterable<ProcessHandle>) ProcessHandle.allProcesses()::iterator) {
    // Process the process
}

This client code works, but it is too noisy and opaque to use in practice. A better workaround is to use an adapter method. The JDK does not provide such a method, but it’s easy to write one, using the same technique used in-line in the snippets above. Note that no cast is necessary in the adapter method because Java’s type inference works properly in this context:

// Adapter from Stream<E> to Iterable<E>
public static <E> Iterable<E> iterableOf(Stream<E> stream) {
    return stream::iterator;
}

With this adapter, you can iterate over any stream with a for-each statement:

for (ProcessHandle p : iterableOf(ProcessHandle.allProcesses())) {
    // Process the process
}
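One property of this adapter is easy to miss. It follows from the Stream contract, not from anything stated above: a stream can be traversed only once, so the Iterable returned by iterableOf supports a single for-each loop. A second loop calls iterator() on a stream that has already been consumed, which throws an IllegalStateException. The sketch below demonstrates this. The class StreamAdapters and the method survivesSecondPass are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.stream.Stream;

final class StreamAdapters {
    // Adapter from Stream<E> to Iterable<E>, as in the text
    static <E> Iterable<E> iterableOf(Stream<E> stream) {
        return stream::iterator;
    }

    // Hypothetical helper: returns true if a second for-each pass succeeds
    static boolean survivesSecondPass(Iterable<?> iterable) {
        for (Object ignored : iterable) { }      // first pass
        try {
            for (Object ignored : iterable) { }  // second pass calls iterator() again
            return true;
        } catch (IllegalStateException e) {
            return false;                        // the underlying stream was already consumed
        }
    }
}
```

A collection such as List.of("a", "b") survives the second pass, because each call to its iterator() method starts a fresh traversal. iterableOf(Stream.of("a", "b")) does not.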

Note that the stream versions of the Anagrams program in Item 45 use the Files.lines method to read the dictionary, while the iterative version uses a scanner. The Files.lines method is superior to a scanner, which silently swallows any exceptions encountered while reading the file. Ideally, we would have used Files.lines in the iterative version too. This is the sort of compromise that programmers will make if an API provides only stream access to a sequence and they want to iterate over the sequence with a for-each statement.
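For illustration, here is one way an iterative loop could read a file with Files.lines after all, by way of the iterableOf adapter. The class LineCounter, the method countLongLines, and its minLength parameter are hypothetical. Two details matter in this sketch:

- The stream returned by Files.lines holds an open file, so it belongs in a try-with-resources statement.
- An I/O error surfaces as an exception instead of being silently swallowed. Opening the file can throw an IOException, and an error during reading is wrapped in an UncheckedIOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class LineCounter {
    // Adapter from Stream<E> to Iterable<E>, as in the text
    static <E> Iterable<E> iterableOf(Stream<E> stream) {
        return stream::iterator;
    }

    // Hypothetical helper: counts lines longer than minLength with a for-each loop
    static int countLongLines(Path file, int minLength) throws IOException {
        int count = 0;
        // try-with-resources closes the file handle held by the stream
        try (Stream<String> lines = Files.lines(file)) {
            for (String line : iterableOf(lines))
                if (line.length() > minLength)
                    count++;
        }
        return count;
    }
}
```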

Conversely, a programmer who wants to process a sequence using a stream pipeline will be justifiably upset by an API that provides only an Iterable. Again the JDK does not provide an adapter, but it’s easy enough to write one:

// Adapter from Iterable<E> to Stream<E>
public static <E> Stream<E> streamOf(Iterable<E> iterable) {
    return StreamSupport.stream(iterable.spliterator(), false);
}
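Here is a minimal usage sketch of streamOf. The Iterable below is a lambda rather than a Collection, so it has no stream method of its own, yet once adapted it can feed a stream pipeline. The class IterableAdapters, its methods, and the sample words are assumptions for illustration. Passing false as the second argument to StreamSupport.stream requests a sequential stream.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class IterableAdapters {
    // Adapter from Iterable<E> to Stream<E>, as in the text
    static <E> Stream<E> streamOf(Iterable<E> iterable) {
        return StreamSupport.stream(iterable.spliterator(), false); // false = sequential
    }

    // Hypothetical Iterable that is not a Collection, so it offers no stream() method
    static Iterable<String> words() {
        return () -> List.of("stream", "iterable", "collection", "array").iterator();
    }

    // Uses the adapter to run a stream pipeline over the Iterable
    static List<String> longWordsUpperCase() {
        return streamOf(words())
                .filter(w -> w.length() > 6)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```

Here longWordsUpperCase() returns [ITERABLE, COLLECTION], because only those two words are longer than six characters.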

If you’re writing a method that returns a sequence of objects and you know that it will only be used in a stream pipeline, then of course you should feel free to return a stream. Similarly, a method returning a sequence that will only be used for iteration should return an Iterable. But if you’re writing a public API that returns a sequence, you should provide for users who want to write stream pipelines as well as those who want to write for-each statements, unless you have a good reason to believe that most of your users will want to use the same mechanism.

The Collection interface is a subtype of Iterable and has a stream method, so it provides for both iteration and stream access. Therefore, Collection or an appropriate subtype is generally the best return type for a public, sequence-returning method. Arrays also provide for easy iteration and stream access with the Arrays.asList and Stream.of methods. If the sequence you’re returning is small enough to fit easily in memory, you’re probably best off returning one of the standard collection implementations, such as ArrayList or HashSet. But do not store a large sequence in memory just to return it as a collection.
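In the sketch below, a method declared to return List serves both kinds of client with no adapter. One caller iterates over the result with for-each, and the other calls stream() on it. The class Inventory and its methods are hypothetical. List.copyOf, available since Java 10, returns an unmodifiable snapshot, so callers cannot change the internal list.

```java
import java.util.ArrayList;
import java.util.List;

final class Inventory {
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    // Returning a Collection subtype supports both iteration and stream access
    List<String> itemNames() {
        return List.copyOf(items); // unmodifiable snapshot
    }

    // Client style 1: for-each loop over the returned collection
    int totalLengthByLoop() {
        int total = 0;
        for (String name : itemNames())
            total += name.length();
        return total;
    }

    // Client style 2: stream pipeline over the same return value
    int totalLengthByStream() {
        return itemNames().stream().mapToInt(String::length).sum();
    }
}
```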

If the sequence you’re returning is large but can be represented concisely, consider implementing a special-purpose collection. For example, suppose you want to return the power set of a given set, which consists of all of its subsets. The power set of {a, b, c} is {{}, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}. If a set has n elements, its power set has 2^n. Therefore, you shouldn’t even consider storing the power set in a standard collection implementation. It is, however, easy to implement a custom collection for the job with the help of AbstractList.

The trick is to use the index of each element in the power set as a bit vector, where the nth bit in the index indicates the presence or absence of the nth element from the source set. In essence, there is a natural mapping between the binary numbers from 0 to 2^n − 1 and the power set of an n-element set. Here’s the code:

import java.util.*;

// Returns the power set of an input set as custom collection
public class PowerSet {
    public static final <E> Collection<Set<E>> of(Set<E> s) {
        List<E> src = new ArrayList<>(s);
        // 1 << 31 would overflow int, so the input is capped at 30 elements
        if (src.size() > 30)
            throw new IllegalArgumentException("Set too big " + s);

        return new AbstractList<Set<E>>() {
            @Override
            public int size() {
                return 1 << src.size(); // 2 to the power srcSize
            }

            @Override
            public boolean contains(Object o) {
                return o instanceof Set && src.containsAll((Set<?>) o);
            }

            @Override
            public Set<E> get(int index) {
                // Bit i of index selects element i of src
                Set<E> result = new HashSet<>();
                for (int i = 0; index != 0; i++, index >>= 1)
                    if ((index & 1) == 1)
                        result.add(src.get(i));
                return result;
            }
        };
    }
}
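To make the bit-vector mapping concrete, the standalone sketch below takes a three-element source list and lists each index from 0 to 2^n − 1 in binary, next to the subset it denotes. It uses the same rule as PowerSet's get method: bit i of the index selects src.get(i). It returns lists rather than sets so the output order is predictable. The class BitVectorSubsets and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BitVectorSubsets {
    // Same rule as PowerSet.get: bit i of index selects src.get(i)
    static <E> List<E> subsetAt(List<E> src, int index) {
        List<E> result = new ArrayList<>();
        for (int i = 0; i < src.size(); i++)
            if ((index & (1 << i)) != 0)
                result.add(src.get(i));
        return result;
    }

    // One row per index from 0 to 2^n - 1: "binary -> subset"
    static List<String> table(List<String> src) {
        int n = src.size();
        List<String> rows = new ArrayList<>();
        for (int index = 0; index < (1 << n); index++) {
            // Binary form, most significant bit first, left-padded with zeros to n digits
            StringBuilder bits = new StringBuilder(Integer.toBinaryString(index));
            while (bits.length() < n)
                bits.insert(0, '0');
            rows.add(bits + " -> " + subsetAt(src, index));
        }
        return rows;
    }
}
```

For the source list [a, b, c], row 5 reads "101 -> [a, c]". The rightmost bit (bit 0) selects a, the leftmost bit selects c, and the 0 in the middle leaves b out.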

Note that PowerSet.of throws an exception if the input set has more than 30 elements. This highlights a disadvantage of using Collection as a return type rather than Stream or Iterable: Collection has an int-returning size method, which limits the length of the returned sequence to Integer.MAX_VALUE, or 2^31 − 1. The Collection specification does allow the size method to return 2^31 − 1 if the collection is larger, even infinite, but this is not a wholly satisfying solution.
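The cutoff of 30 follows from int arithmetic. The sketch below computes 1 << n the same way PowerSet's size method does; the class SizeLimits is hypothetical. For n = 30 the result still fits in an int, but for n = 31 it overflows to Integer.MIN_VALUE, which is a negative size. Java also masks an int shift distance to its low five bits, so 1 << 32 evaluates to 1, not 2^32.

```java
final class SizeLimits {
    // Power-set size computed as in PowerSet.size(): int shift
    static int powerSetSizeAsInt(int n) {
        return 1 << n;
    }

    // The same quantity in long arithmetic, for comparison
    static long powerSetSizeAsLong(int n) {
        return 1L << n;
    }
}
```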

In order to write a Collection implementation atop AbstractCollection, you need to implement only two methods beyond the one required for Iterable: contains and size. Often it’s easy to write efficient implementations of these methods. If that isn’t feasible, perhaps because the contents of the sequence aren’t predetermined before iteration takes place, return a stream or iterable, whichever feels more natural. If you choose, you can return both using two separate methods.
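Here is a minimal sketch of such a collection. It assumes a hypothetical IntRange class that represents the integers in the half-open range [from, to) without storing them. AbstractCollection declares iterator and size abstract. Its inherited contains is a linear scan through the iterator, so overriding contains is what makes membership tests cheap. The size method clamps to Integer.MAX_VALUE, which is the behavior the Collection specification allows for oversized collections.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example: the integers in [from, to), computed on demand rather than stored
final class IntRange extends AbstractCollection<Integer> {
    private final int from;
    private final int to;

    IntRange(int from, int to) {
        if (from > to)
            throw new IllegalArgumentException(from + " > " + to);
        this.from = from;
        this.to = to;
    }

    // The one method Iterable requires
    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = from;

            @Override
            public boolean hasNext() {
                return next < to;
            }

            @Override
            public Integer next() {
                if (!hasNext())
                    throw new NoSuchElementException();
                return next++;
            }
        };
    }

    // Constant time; clamped to Integer.MAX_VALUE for ranges wider than an int can count
    @Override
    public int size() {
        return (int) Math.min(Integer.MAX_VALUE, (long) to - from);
    }

    // Constant time, replacing AbstractCollection's linear scan
    @Override
    public boolean contains(Object o) {
        if (!(o instanceof Integer))
            return false;
        int value = (Integer) o;
        return value >= from && value < to;
    }
}
```

Because IntRange is a Collection, clients get both for-each iteration and stream() for free.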

There are times when you’ll choose the return type based solely on ease of implementation. For example, suppose you want to write a method that returns all of the (contiguous) sublists of an input list. It takes only three lines of code to generate these sublists and put them in a standard collection, but the memory required to hold this collection is quadratic in the size of the source list. While this is not as bad as the power set, which is exponential, it is clearly unacceptable. Implementing a custom collection, as we did for the power set, would be tedious, more so because the JDK lacks a skeletal Iterator implementation to help us.
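For contrast, the eager approach might look like the following sketch. The class EagerSubLists is hypothetical, and it takes a few more than three lines. Each entry is a subList view, so the entries themselves are small. However, a list of size n produces n(n + 1)/2 nonempty sublists plus the empty list, and all of them are held in memory at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EagerSubLists {
    // Collects every contiguous sublist of list, including the empty list, into one ArrayList;
    // the number of entries is n(n + 1)/2 + 1, quadratic in n = list.size()
    static <E> List<List<E>> of(List<E> list) {
        List<List<E>> result = new ArrayList<>();
        result.add(Collections.emptyList());
        for (int start = 0; start < list.size(); start++)
            for (int end = start + 1; end <= list.size(); end++)
                result.add(list.subList(start, end));
        return result;
    }
}
```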

It is, however, straightforward to implement a stream of all the sublists of an input list, though it does require a minor insight. Let’s call a sublist that contains the first element of a list a prefix of the list. For example, the prefixes of (a, b, c) are (a), (a, b), and (a, b, c). Similarly, let’s call a sublist that contains the last element a suffix, so the suffixes of (a, b, c) are (a, b, c), (b, c), and (c). The insight is that the sublists of a list are simply the suffixes of the prefixes (or identically, the prefixes of the suffixes) and the empty list. This observation leads directly to a clear, reasonably concise implementation:

import java.util.*;
import java.util.stream.*;

// Returns a stream of all the sublists of its input list
public class SubLists {
    public static <E> Stream<List<E>> of(List<E> list) {
        // The empty list, followed by every suffix of every prefix
        return Stream.concat(Stream.of(Collections.emptyList()),
                prefixes(list).flatMap(SubLists::suffixes));
    }

    private static <E> Stream<List<E>> prefixes(List<E> list) {
        return IntStream.rangeClosed(1, list.size())
                .mapToObj(end -> list.subList(0, end));
    }

    private static <E> Stream<List<E>> suffixes(List<E> list) {
        return IntStream.range(0, list.size())
                .mapToObj(start -> list.subList(start, list.size()));
    }
}
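The prefixes helper is a direct translation of an indexed for-loop. The sketch below writes the same computation both ways so the correspondence is visible: the loop variable end becomes the int value that IntStream.rangeClosed passes to mapToObj. The class RangeIdiom and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class RangeIdiom {
    // Classic indexed for-loop: every prefix of list
    static <E> List<List<E>> prefixesByLoop(List<E> list) {
        List<List<E>> result = new ArrayList<>();
        for (int end = 1; end <= list.size(); end++)
            result.add(list.subList(0, end));
        return result;
    }

    // Same computation with IntStream.rangeClosed, as in SubLists.prefixes
    static <E> List<List<E>> prefixesByStream(List<E> list) {
        return IntStream.rangeClosed(1, list.size())
                .mapToObj(end -> list.subList(0, end))
                .collect(Collectors.toList());
    }
}
```

Applied to (a, b, c), both methods return [[a], [a, b], [a, b, c]]. After the empty list supplied by Stream.concat, SubLists.of runs each prefix through suffixes and yields [], [a], [a, b], [b], [a, b, c], [b, c], [c], in that order.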

Note that the Stream.concat method is used to add the empty list into the returned stream. Also note that the flatMap method (Item 45) is used to generate a single stream consisting of all the suffixes of all the prefixes. Finally, note that we generate the prefixes and suffixes by mapping a stream of consecutive int values returned by IntStream.range and IntStream.rangeClosed. This idiom is, roughly speaking, the stream equivalent of the standard for-loop on integer indices. Thus, our sublist implementation is similar in spirit to the obvious nested for-loop:

for (int start = 0; start < src.size(); start++)
    for (int end = start + 1; end <= src.size(); end++)
        System.out.println(src.subList(start, end));

It is possible to translate this for-loop directly into a stream. The result is more concise than our previous implementation, but perhaps a bit less readable. It is similar in spirit to the streams code for the Cartesian product in Item 45:

// Returns a stream of all the sublists of its input list
public static <E> Stream<List<E>> of(List<E> list) {
    return IntStream.range(0, list.size())
            .mapToObj(start ->
                    IntStream.rangeClosed(start + 1, list.size())
                            .mapToObj(end -> list.subList(start, end)))
            .flatMap(x -> x);
}

Like the for-loop that precedes it, this code does not emit the empty list. In order to fix this deficiency, you could either use concat, as we did in the previous version, or replace 1 by (int) Math.signum(start) in the rangeClosed call.
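Applying the second fix gives the sketch below; the class name SubListsSignum is hypothetical. When start is 0, Math.signum(start) is 0, so the inner range begins at end = 0 and emits list.subList(0, 0), which is the empty list. For every other start, the signum is 1, exactly as before.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class SubListsSignum {
    // Nested-range version with 1 replaced by (int) Math.signum(start):
    // for start == 0 the inner range starts at 0 and emits the empty list once
    static <E> Stream<List<E>> of(List<E> list) {
        return IntStream.range(0, list.size())
                .mapToObj(start ->
                        IntStream.rangeClosed(start + (int) Math.signum(start), list.size())
                                .mapToObj(end -> list.subList(start, end)))
                .flatMap(x -> x);
    }
}
```

This variant differs from the concat version for an empty input list. The outer range is empty, so it emits nothing at all, whereas SubLists.of still emits the single empty list.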

Either of these stream implementations of sublists is fine, but both will require some users to employ a Stream-to-Iterable adapter or to use a stream in places where iteration would be more natural. Not only does the Stream-to-Iterable adapter clutter up client code, but it slows down the loop by a factor of 2.3 on my machine. A purpose-built Collection implementation (not shown here) is considerably more verbose but runs about 1.4 times as fast as our stream-based implementation on my machine.

In summary, when writing a method that returns a sequence of elements, remember that some of your users may want to process them as a stream while others may want to iterate over them. Try to accommodate both groups. If it’s feasible to return a collection, do so. If you already have the elements in a collection or the number of elements in the sequence is small enough to justify creating a new one, return a standard collection such as ArrayList. Otherwise, consider implementing a custom collection as we did for the power set. If it isn’t feasible to return a collection, return a stream or iterable, whichever seems more natural. If, in a future Java release, the Stream interface declaration is modified to extend Iterable, then you should feel free to return streams because they will allow for both stream processing and iteration.
