如何高效的使用并行流

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Java7之前想要并行处理大量数据是很困难的,首先把数据拆分成很多个部分,然后把这这些子部分放入到每个线程中去执行计算逻辑,最后在把每个线程返回的计算结果进行合并操作;在Java7中提供了一个处理大数据的fork/join框架,屏蔽掉了线程之间交互的处理,更加专注于数据的处理。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Fork/Join框架","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fork/Join框架采用的是思想就是分而治之,把大的任务拆分成小的任务,然后放入到独立的线程中去计算,同时为了最大限度的利用多核CPU,采用了一个种","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"工作窃取","attrs":{}}],"attrs":{}},{"type":"text","text":"的算法来运行任务,也就是说当某个线程处理完自己工作队列中的任务后,尝试当其他线程的工作队列中窃取一个任务来执行,直到所有任务处理完毕。所以为了减少线程之间的竞争,通常会使用双端队列,被窃取任务线程永远从双端队列的头部拿任务执行,而窃取任务的线程永远从双端队列的尾部拿任务执行;在百度找了一张图","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6e/6ebd3bdeb6ffeec79d871eaa011048ab.png","alt":"image","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"RecursiveTask","attrs":{}}],"attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用Fork/Join框架首先需要创建自己的任务,需要继承","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"RecursiveTask","attrs":{}}],"attrs":{}},{"type":"text","text":",实现抽象方法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"protected abstract V compute();","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"实现类需要在该方法中实现任务的拆分,计算,合并;伪代码可以表示成这样:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"if(任务已经不可拆分){\n return 顺序计算结果;\n} else {\n 1.任务拆分成两个子任务\n 2.递归调用本方法,拆分子任务\n 3.等待子任务执行完成\n 4.合并子任务的结果\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fork/Join实战","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任务:完成对一亿个自然数求和","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们先使用串行的方式实现,代码如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"long result = LongStream.rangeClosed(1, 100000000)\n .reduce(0, Long::sum);\nSystem.out.println(\"result:\" + result);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用Fork/Join框架实现,代码如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"public class SumRecursiveTask extends RecursiveTask {\n private long[] numbers;\n private int start;\n private int end;\n\n public SumRecursiveTask(long[] numbers) {\n this.numbers = numbers;\n this.start = 0;\n this.end = numbers.length;\n }\n\n public SumRecursiveTask(long[] numbers, int start, int end) {\n this.numbers = numbers;\n this.start = start;\n this.end = end;\n }\n\n @Override\n protected Long compute() {\n int length = end - start;\n if (length < 20000) { //小于20000个就不在进行拆分\n return sum();\n }\n SumRecursiveTask leftTask = new SumRecursiveTask(numbers, start, start + length / 2); //进行任务拆分\n SumRecursiveTask rightTask = new SumRecursiveTask(numbers, start + (length / 2), end); //进行任务拆分\n leftTask.fork(); //把该子任务交友ForkJoinPoll线程池去执行\n rightTask.fork(); //把该子任务交友ForkJoinPoll线程池去执行\n return leftTask.join() + rightTask.join(); //把子任务的结果相加\n }\n\n\n private long sum() {\n int sum = 0;\n for (int i = start; i < end; i++) {\n sum += numbers[i];\n }\n return sum;\n }\n\n\n public static void main(String[] args) {\n long[] numbers = LongStream.rangeClosed(1, 100000000).toArray();\n\n Long result = new ForkJoinPool().invoke(new SumRecursiveTask(numbers));\n System.out.println(\"result:\" +result);\n }\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fork/Join默认的线程数量就是你的处理器数量,这个值是由","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Runtime.getRuntime().available- Processors()","attrs":{}}],"attrs":{}},{"type":"text","text":"得到的。 但是你可以通过系统属性","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"java.util.concurrent.ForkJoinPool.common. parallelism","attrs":{}}],"attrs":{}},{"type":"text","text":"来改变线程池大小,如下所示: ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"System.setProperty(\"java.util.concurrent.ForkJoinPool.common.parallelism\",\"12\");","attrs":{}}],"attrs":{}},{"type":"text","text":" 这是一个全局设置,因此它将影响代码中所有的并行流。目前还无法专为某个 并行流指定这个值。因为会影响到所有的并行流,所以在任务中经历避免网络/IO操作,否则可能会拖慢其他并行流的运行速度","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"parallelStream","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上我们说到的都是在Java7中使用并行流的操作,Java8并没有止步于此,为我们提供更加便利的方式,那就是","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"parallelStream","attrs":{}}],"attrs":{}},{"type":"text","text":";","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"parallelStream","attrs":{}}],"attrs":{}},{"type":"text","text":"底层还是通过Fork/Join框架来实现的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"常见的使用方式","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1.串行流转化成并行流","attrs":{}}]},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"LongStream.rangeClosed(1,1000)\n .parallel()\n .forEach(System.out::println);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2.直接生成并行流","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":" List values = new ArrayList<>();\n for (int i = 0; i < 10000; i++) {\n values.add(i);\n }\n values.parallelStream()\n .forEach(System.out::println);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正确的使用parallelStream","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们使用","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"parallelStream","attrs":{}}],"attrs":{}},{"type":"text","text":"来实现上面的累加例子看看效果,代码如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"public static void main(String[] args) {\n Summer summer = new Summer();\n LongStream.rangeClosed(1, 100000000)\n .parallel()\n .forEach(summer::add);\n System.out.println(\"result:\" + summer.sum);\n\n}\n\nstatic class Summer {\n public long sum = 0;\n\n public void add(long value) {\n sum += value;\n }\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"运行结果如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/67/6756af09b85de9f4aa46ce4bae7f1e6f.png","alt":"result","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"运行之后,我们发现运行的结果不正确,并且每次运行的结果都不一样,这是为什么呢?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这里其实就是错用","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"parallelStream","attrs":{}}],"attrs":{}},{"type":"text","text":"常见的情况,","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"parallelStream","attrs":{}}],"attrs":{}},{"type":"text","text":"是非线程安全的,在这个里面中使用多个线程去修改了共享变量sum, 执行了","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"sum += value","attrs":{}}],"attrs":{}},{"type":"text","text":"操作,这个操作本身是非原子性的,所以在使用并行流时应该避免去修改共享变量。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"修改上面的例子,正确使用","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"parallelStream","attrs":{}}],"attrs":{}},{"type":"text","text":"来实现,代码如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"long result = LongStream.rangeClosed(1, 100000000)\n .parallel()\n .reduce(0, Long::sum);\nSystem.out.println(\"result:\" + result);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在前面我们已经说过了fork/join的操作流程是:拆子部分,计算,合并结果;因为","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"parallelStream","attrs":{}}],"attrs":{}},{"type":"text","text":"底层使用的也是fork/join框架,所以这些步骤也是需要做的,但是从上面的代码,我们看到","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Long::sum","attrs":{}}],"attrs":{}},{"type":"text","text":"做了计算,","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"reduce","attrs":{}}],"attrs":{}},{"type":"text","text":"做了合并结果,我们并没有去做任务的拆分,所以这个过程肯定是","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"parallelStream","attrs":{}}],"attrs":{}},{"type":"text","text":"已经帮我们实现了,这个时候就必须的说说","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Spliterator","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"Spliterator","attrs":{}}],"attrs":{}},{"type":"text","text":"是Java8加入的新接口,是为了并行执行任务而设计的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"public interface Spliterator {\n boolean tryAdvance(Consumer super T> action);\n\n Spliterator trySplit();\n\n long estimateSize();\n\n int characteristics();\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"tryAdvance: 遍历所有的元素,如果还有可以遍历的就返回ture,否则返回false","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"trySplit: 对所有的元素进行拆分成小的子部分,如果已经不能拆分就返回null","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"estimateSize: 当前拆分里面还剩余多少个元素","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"characteristics: 返回当前Spliterator特性集的编码","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"总结","attrs":{}}]},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"要证明并行处理比顺序处理效率高,只能通过测试,不能靠猜测(本文累加的例子在多台电脑上运行了多次,也并不能证明采用并行来处理累加就一定比串行的快多少,所以只能通过多测试,环境不同可能结果就会不同)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"数据量较少,并且计算逻辑简单,通常不建议使用并行流","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"需要考虑流的操作时间消耗","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"在有些情况下需要自己去实现拆分的逻辑,并行流才能高效","attrs":{}}]}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"感谢大家可以耐心地读到这里。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当然,文中或许会存在或多或少的不足、错误之处,有建议或者意见也非常欢迎大家在评论交流。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最后,希望朋友们可以点赞评论关注三连,因为这些就是我分享的全部动力来源🙏","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章