JVM Study Notes: Garbage Collection


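This chapter covers the principles behind garbage collection in the JVM. Parts of the content draw on the Oracle documentation and the Gupao Academy course materials. The version analyzed is JDK 1.8.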

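Defining garbage

Hearing the word "garbage" brings a famous movie scene to mind.

The senior disciple of the Duanshuiliu school declares that everyone present is garbage. The scene maps onto the JVM as follows:

Present here: the memory region
Everyone: the individual objects in memory
Are all garbage: are all useless objects
The senior disciple: the one looking on from a god's-eye view, which I read as the GC Root

In the end, after saying his piece, the senior disciple takes everyone else out; in other words, the garbage gets collected.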

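Useless objects

How should we understand a "useless object"?

Reference counting

Reference counting is easy to understand: if no reference anywhere in the program points to an object, the object is garbage; otherwise it is not.

Something to consider: if A references B and B references A, neither count ever drops to zero, so under pure reference counting the two objects can never be reclaimed, even when nothing else refers to them.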
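The cycle problem can be shown with a toy model. This is only a sketch: `CountedObject`, `retain`, and `release` are hypothetical names invented for illustration, and HotSpot does not manage objects this way.

```java
/** Toy reference-counted object (hypothetical; not how HotSpot works). */
class CountedObject {
    int refCount = 0;                 // number of references pointing at this object

    void retain()  { refCount++; }    // someone starts referring to this object
    void release() { refCount--; }    // someone stops referring to this object

    boolean isGarbage() { return refCount == 0; }
}

class RefCountDemo {
    /** Returns {A is garbage, B is garbage} after all outside references are gone. */
    static boolean[] cycleIsReclaimable() {
        CountedObject a = new CountedObject();
        CountedObject b = new CountedObject();
        a.retain();   // an outside reference to A, e.g. a local variable
        b.retain();   // an outside reference to B
        b.retain();   // A stores a reference to B
        a.retain();   // B stores a reference to A
        a.release();  // the outside reference to A goes away
        b.release();  // the outside reference to B goes away
        // Each count is still 1 because of the cycle, so neither is "garbage".
        return new boolean[] { a.isGarbage(), b.isGarbage() };
    }
}
```

Both entries of the returned array are `false`: a pure reference counter would keep A and B alive forever, even though the program can no longer reach either of them.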

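Reachability analysis

Start from the GC Root objects and follow references downward to determine whether an object is reachable. Objects that cannot be reached from any GC Root are garbage.

GC Root: objects that can serve as GC Roots include class loaders, Threads, the local variable table of the virtual machine stack, static members, constant references, and variables in the native method stack.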
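Reachability analysis is, at heart, a graph traversal from the roots. The sketch below models that idea on a hand-built object graph; `HeapNode` and `Reachability` are hypothetical names, and a real collector walks actual heap references rather than a `List`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy heap object: a name plus its outgoing references. */
class HeapNode {
    final String name;
    final List<HeapNode> refs = new ArrayList<>();
    HeapNode(String name) { this.name = name; }
}

class Reachability {
    /** Returns every node reachable from the given roots (the live set). */
    static Set<HeapNode> mark(List<HeapNode> roots) {
        Set<HeapNode> marked = new HashSet<>();       // identity-based: HeapNode has no equals()
        Deque<HeapNode> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapNode current = pending.pop();
            if (marked.add(current)) {                // true only on the first visit
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    /** Graph: root -> x, plus a cycle A <-> B that no root points to. */
    static boolean[] demo() {
        HeapNode root = new HeapNode("root");
        HeapNode x = new HeapNode("x");
        HeapNode a = new HeapNode("A");
        HeapNode b = new HeapNode("B");
        root.refs.add(x);
        a.refs.add(b);
        b.refs.add(a);
        Set<HeapNode> live = mark(Collections.singletonList(root));
        return new boolean[] { live.contains(x), live.contains(a), live.contains(b) };
    }
}
```

Here `demo()` reports `x` as live and both A and B as unreachable, so the cycle that defeats reference counting is correctly identified as garbage.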

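Garbage collection algorithms

Once we know whether an object is garbage, the next question is how to reclaim it. The senior disciple took out the boxer with a straight punch, and broke the wooden sword before taking out the kendo club captain, and so on. In other words, he used different methods to collect different garbage.

Mark-Sweep

As the name suggests, the mark-sweep algorithm marks the garbage objects and then clears them away. As shown in the figure below:

Something to consider:

Mark-sweep does reclaim the garbage objects, but it leaves the free memory non-contiguous, with a lot of fragmentation. If a large object needs to be allocated later, the allocation may fail for lack of a contiguous block and trigger another GC.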
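To make the fragmentation concrete, here is a small sketch that models the heap as an array of equally sized slots. The slot model and the class name `MarkSweepDemo` are assumptions made for illustration only.

```java
class MarkSweepDemo {
    /** Sweep phase: a slot stays occupied only if it was marked live. Nothing moves. */
    static boolean[] sweep(boolean[] occupied, boolean[] live) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && live[i];
        }
        return after;
    }

    /** Total number of free slots. */
    static int totalFree(boolean[] occupied) {
        int free = 0;
        for (boolean slot : occupied) {
            if (!slot) free++;
        }
        return free;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = { true, true, true, true, true, true, true, true };
        boolean[] live     = { true, false, true, false, true, false, true, false };
        boolean[] after = sweep(occupied, live);
        System.out.println("free slots: " + totalFree(after));
        System.out.println("largest contiguous run: " + largestFreeRun(after));
    }
}
```

With every second object dead, four slots are free after the sweep, yet the largest contiguous run is a single slot, so an object needing two adjacent slots still cannot be allocated.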

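Copying

The copying algorithm divides the memory region into two halves. When one half runs out of space, the objects that are still alive are copied to the other half, and the first half is then reclaimed in a single step. As shown in the figure below:

Something to consider:

As the figure shows, copying solves the fragmentation problem but introduces another one: wasted space, since only half of the memory is usable at any time.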
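A minimal sketch of the copy step, using two `String` arrays as stand-ins for the two halves. The names `CopyingDemo`, `fromSpace`, and `toSpace` are illustrative assumptions; a real collector copies raw memory and also updates references to the moved objects.

```java
import java.util.Arrays;

class CopyingDemo {
    /**
     * Copies the live entries of fromSpace to the start of toSpace,
     * then clears fromSpace in one step. Returns the number of entries copied.
     */
    static int copyLive(String[] fromSpace, boolean[] live, String[] toSpace) {
        int next = 0;                              // bump pointer into to-space
        for (int i = 0; i < fromSpace.length; i++) {
            if (fromSpace[i] != null && live[i]) {
                toSpace[next++] = fromSpace[i];    // survivors end up packed together
            }
        }
        Arrays.fill(fromSpace, null);              // the whole half is reclaimed at once
        return next;
    }
}
```

The survivors land contiguously in `toSpace`, which is why there is no fragmentation; the price is that `toSpace` has to sit idle between collections.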

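Mark-Compact

The marking phase of mark-compact is the same as in mark-sweep. The next step differs: the surviving objects are moved toward one end, and the memory beyond that boundary is then cleared.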
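The compaction step can be sketched as an in-place slide over a single array. As before, the array model and the name `MarkCompactDemo` are assumptions for illustration, not the JVM's implementation.

```java
class MarkCompactDemo {
    /**
     * Slides the live entries toward index 0 in place and clears everything
     * after them. Returns the index where free space now begins.
     */
    static int compact(String[] heap, boolean[] live) {
        int next = 0;                          // next destination slot; never ahead of i
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[next++] = heap[i];
            }
        }
        for (int i = next; i < heap.length; i++) {
            heap[i] = null;                    // clear the memory past the boundary
        }
        return next;
    }
}
```

Unlike the copying approach, no second space is needed, and unlike mark-sweep, all free space ends up in one contiguous block after the returned index.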

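Generational collection

Something to consider:

Having introduced several algorithms, which one does the JVM actually use for garbage collection?

Young generation: copying (newly allocated objects tend to be short-lived, so few survive and copying is efficient in the Young generation)
Old generation: mark-sweep or mark-compact (objects in the Old generation live longer, so copying them back and forth is pointless; it is better to mark and then clean up)

-- Adapted from the Gupao Academy course materials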

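Garbage collectors

Garbage collection algorithms provide the strategy; garbage collectors are the concrete implementations of that strategy. The figure below shows the garbage collectors Java provides, grouped by category and by the generation each one works on:

-- Adapted from the Gupao Academy course materials

The Java HotSpot VM provides three types of collectors: the serial collector, the parallel collector, and the mostly concurrent collector. See the following from the official documentation: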

Available Collectors

The Java HotSpot VM includes three different types of collectors, each with different performance characteristics.

The serial collector uses a single thread to perform all garbage collection work, which makes it relatively efficient because there is no communication overhead between threads. It is best-suited to single processor machines, because it cannot take advantage of multiprocessor hardware, although it can be useful on multiprocessors for applications with small data sets (up to approximately 100 MB). The serial collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseSerialGC.

The parallel collector (also known as the throughput collector) performs minor collections in parallel, which can significantly reduce garbage collection overhead. It is intended for applications with medium-sized to large-sized data sets that are run on multiprocessor or multithreaded hardware. The parallel collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseParallelGC.

Parallel compaction is a feature that enables the parallel collector to perform major collections in parallel. Without parallel compaction, major collections are performed using a single thread, which can significantly limit scalability. Parallel compaction is enabled by default if the option -XX:+UseParallelGC has been specified. The option to turn it off is -XX:-UseParallelOldGC.

The mostly concurrent collector performs most of its work concurrently (for example, while the application is still running) to keep garbage collection pauses short. It is designed for applications with medium-sized to large-sized data sets in which response time is more important than overall throughput because the techniques used to minimize pauses can reduce application performance. The Java HotSpot VM offers a choice between two mostly concurrent collectors; see The Mostly Concurrent Collectors. Use the option -XX:+UseConcMarkSweepGC to enable the CMS collector or -XX:+UseG1GC to enable the G1 collector.

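Serial

Serial was the only choice for young-generation collection in early JVMs (before JDK 1.3.1). It is a serial, single-threaded collector: it uses only one CPU and one thread to do the garbage collection work.

Pros: simple and efficient, with very high single-threaded collection efficiency
Cons: all application threads must be paused during collection
Algorithm: copying
Scope: young generation
Usage: the default young-generation collector in Client mode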

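ParNew

The multithreaded version of Serial; think of it as the upgrade to Serial that followed the rise of multi-core CPUs.

Pros: more efficient than Serial on multiple CPUs.
Cons: all application threads are paused during collection; less efficient than Serial on a single CPU.
Algorithm: copying
Scope: young generation
Usage: the preferred young-generation collector for virtual machines running in Server mode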

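Parallel Scavenge

Like ParNew, Parallel Scavenge is a young-generation collector, but it focuses on system throughput.

Throughput = time spent running user code / (time spent running user code + garbage collection time)
For example, if the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute, throughput = (100 - 1) / 100 = 99%.
The higher the throughput, the smaller the share of time spent on garbage collection, so user code can make fuller use of the CPU and finish its computation sooner.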

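Serial Old

Serial Old can be thought of as the old-generation version of Serial; the difference is that Serial Old uses the mark-compact algorithm.

Parallel Old

Parallel Old can be thought of as the old-generation version of Parallel Scavenge; the difference is that Parallel Old uses the mark-compact algorithm.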

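Concurrent Mark Sweep (CMS)

CMS is designed for shorter garbage collection pauses, for applications that can share CPU resources with the collector while collection is in progress. See the following from the official documentation: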

The Concurrent Mark Sweep (CMS) collector is designed for applications that prefer shorter garbage collection pauses and that can afford to share processor resources with the garbage collector while the application is running. Typically applications that have a relatively large set of long-lived data (a large tenured generation) and run on machines with two or more processors tend to benefit from the use of this collector. However, this collector should be considered for any application with a low pause time requirement. The CMS collector is enabled with the command-line option -XX:+UseConcMarkSweepGC.

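In short: CMS is aimed at applications that prefer shorter garbage collection pauses and can afford to share processor resources with the collector while running. Applications with a relatively large set of long-lived data (a large tenured generation) on machines with two or more processors typically benefit most, but it is worth considering for any application that requires low pause times. It is enabled with -XX:+UseConcMarkSweepGC.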

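Diagram of the collection phases:

(1) Initial mark (CMS initial mark): marks the objects directly reachable from the GC Roots. Stop The World, but very fast.
(2) Concurrent mark (CMS concurrent mark): performs GC Roots tracing.
(3) Remark (CMS remark): corrects the marks that changed because the user program kept running during concurrent marking. Stop The World.
(4) Concurrent sweep (CMS concurrent sweep).

Because the collector threads work alongside the user threads during concurrent marking and concurrent sweeping, the CMS collector's memory reclamation as a whole runs concurrently with the user threads.

Pros: concurrent collection, low pauses
Cons: produces a lot of memory fragmentation; the concurrent phases reduce throughput

-- Adapted from the Gupao Academy course materials

For more details, see the official documentation: Concurrent Mark Sweep (CMS) Collector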

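Garbage-First (G1)

G1 is a collector designed for modern multiprocessor machines with large memories. It tries to meet user-specified garbage collection pause time goals as far as possible while still achieving high throughput, and it performs whole-heap operations concurrently with the application.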

The Garbage-First (G1) garbage collector is a server-style garbage collector, targeted for multiprocessor machines with large memories. It attempts to meet garbage collection (GC) pause time goals with high probability while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads. This prevents interruptions proportional to heap or live-data size.

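In short: G1 is a server-style collector for multiprocessor machines with large memories. It tries to meet GC pause time goals with high probability while keeping throughput high, and it runs whole-heap operations such as global marking concurrently with the application threads, so that pauses do not grow in proportion to the heap or live-data size.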

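Parallel and concurrent
Generational collection (the concept of generations is still retained)
Space compaction (overall it is a "mark-compact" style algorithm and does not produce memory fragmentation)
Predictable pauses (where it goes beyond CMS: users can explicitly specify that within any time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection)

With the G1 collector, the memory layout of the Java heap differs considerably from that of other collectors. G1 divides the entire Java heap into many independent regions (Regions) of equal size. The concepts of young generation and old generation are retained, but the two are no longer physically separated; each is a collection of Regions, which need not be contiguous.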

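The collection steps are as follows:

Initial Marking: marks the objects directly reachable from the GC Roots and updates the TAMS (top-at-mark-start) value; user threads must be paused.
Concurrent Marking: performs reachability analysis from the GC Roots to find the live objects; runs concurrently with the user threads.
Final Marking: corrects the data that changed because the user program kept running during concurrent marking; user threads must be paused.
Live Data Counting and Evacuation: ranks the Regions by reclamation value and cost, and draws up a collection plan according to the GC pause time the user expects.

-- Adapted from the Gupao Academy course materials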
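The last step, ranking Regions by value and cost and then planning within a pause budget, can be sketched as a simple greedy selection. This is a deliberately simplified illustration and not G1's actual heuristics: `GcRegion`, `CollectionSetPlanner`, the cost estimates, and the greedy rule are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-Region estimate used only for this sketch. */
class GcRegion {
    final String id;
    final long garbageBytes;   // space that evacuating this Region would free
    final double costMs;       // estimated time needed to evacuate it

    GcRegion(String id, long garbageBytes, double costMs) {
        this.id = id;
        this.garbageBytes = garbageBytes;
        this.costMs = costMs;
    }

    /** Bytes reclaimed per millisecond of pause: higher is a better deal. */
    double efficiency() { return garbageBytes / costMs; }
}

class CollectionSetPlanner {
    /** Picks Regions in order of efficiency until the pause budget is used up. */
    static List<String> plan(List<GcRegion> regions, double pauseBudgetMs) {
        List<GcRegion> sorted = new ArrayList<>(regions);
        // Sort by efficiency, best first.
        sorted.sort((x, y) -> Double.compare(y.efficiency(), x.efficiency()));
        List<String> chosen = new ArrayList<>();
        double usedMs = 0;
        for (GcRegion region : sorted) {
            if (usedMs + region.costMs <= pauseBudgetMs) {
                chosen.add(region.id);
                usedMs += region.costMs;
            }
        }
        return chosen;
    }
}
```

The point of the sketch is the shape of the decision: the Regions with the most garbage per unit of pause time go first, which is where the name Garbage-First comes from, and the user's pause goal caps how much work one collection takes on.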

Diagram of the collection phases:

For more details, see the official documentation: Garbage-First Garbage Collector

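Understanding throughput and pause time

Pause time -> the time during which the garbage collector interrupts the application's execution in order to collect garbage

Throughput -> time running user code / (time running user code + garbage collection time)

The shorter the pause time, the better suited the collector is to programs that interact with users; good responsiveness improves the user experience.
High throughput makes efficient use of CPU time and finishes the program's computation sooner; it mainly suits background computation that needs little interaction.

These two metrics are also the criteria for judging how good a garbage collector is; tuning is essentially a matter of watching these two variables.

-- Adapted from the Gupao Academy course materials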
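For a rough look at these numbers on a running JVM, the standard `java.lang.management` API exposes per-collector counts and accumulated collection time. The sketch below is an approximation under stated assumptions: the class name `GcStats` is made up, JVM uptime stands in for total run time, and for concurrent collectors the reported collection time is not the same thing as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    /** Throughput as defined above: user time / (user time + GC time). */
    static double throughput(long userMillis, long gcMillis) {
        return (double) userMillis / (userMillis + gcMillis);
    }

    /** Sum of the accumulated collection time of all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();   // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector the current JVM is using.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcTime = totalGcMillis();
        System.out.println("approximate throughput = " + throughput(uptime - gcTime, gcTime));
    }
}
```

The collector names printed by `main` also show which collectors the VM actually selected, which is a handy check after changing a flag. For serious tuning, GC logs give per-pause detail that these cumulative counters cannot.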

Selecting a garbage collector

See the official documentation: Selecting a Collector

Unless your application has rather strict pause time requirements, first run your application and allow the VM to select a collector. If necessary, adjust the heap size to improve performance. If the performance still does not meet your goals, then use the following guidelines as a starting point for selecting a collector.

If the application has a small data set (up to approximately 100 MB), then select the serial collector with the option -XX:+UseSerialGC.

If the application will be run on a single processor and there are no pause time requirements, then let the VM select the collector, or select the serial collector with the option -XX:+UseSerialGC.

If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of 1 second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC.

If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately 1 second, then select the concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.

These guidelines provide only a starting point for selecting a collector because performance is dependent on the size of the heap, the amount of live data maintained by the application, and the number and speed of available processors. Pause times are particularly sensitive to these factors, so the threshold of 1 second mentioned previously is only approximate: the parallel collector will experience pause times longer than 1 second on many data size and hardware combinations; conversely, the concurrent collector may not be able to keep pauses shorter than 1 second on some combinations.

If the recommended collector does not achieve the desired performance, first attempt to adjust the heap and generation sizes to meet the desired goals. If performance is still inadequate, then try a different collector: use the concurrent collector to reduce pause times and use the parallel collector to increase overall throughput on multiprocessor hardware.

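In short: unless the application has strict pause time requirements, run it first and let the VM choose the collector, adjusting the heap size if needed. If performance still falls short, use these guidelines as a starting point:

Small data set (up to approximately 100 MB): select the serial collector with -XX:+UseSerialGC.
Single processor and no pause time requirements: let the VM choose, or select the serial collector with -XX:+UseSerialGC.
Peak application performance is the first priority, and there are no pause time requirements or pauses of 1 second or longer are acceptable: let the VM choose, or select the parallel collector with -XX:+UseParallelGC.
Response time matters more than overall throughput and pauses must stay below approximately 1 second: select a concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.

The 1-second threshold is only approximate, because pause times depend heavily on heap size, the amount of live data, and the number and speed of available processors. If the recommended collector falls short, first adjust the heap and generation sizes; if that is still not enough, try a different collector: a concurrent one to reduce pause times, the parallel one to raise overall throughput on multiprocessor hardware.

Overall, the choice of garbage collector depends on the usage scenario, the server configuration, and the scale of the application. It takes repeated experimentation and tuning to arrive at what suits you best.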
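As a memory aid, the guidelines can be written down as a small decision function. This is only a sketch of the starting points quoted above: `CollectorAdvisor` and its parameters are hypothetical, the "peak performance" condition is folded into "not pause-sensitive", and the option of simply letting the VM choose is left out.

```java
class CollectorAdvisor {
    /**
     * Suggests a starting-point flag following the guidelines above.
     *
     * @param dataSetMb      approximate size of the application's data set, in MB
     * @param processors     number of available processors
     * @param pauseSensitive true if pauses must stay below roughly 1 second
     */
    static String suggest(long dataSetMb, int processors, boolean pauseSensitive) {
        if (dataSetMb <= 100) {
            return "-XX:+UseSerialGC";             // small data set
        }
        if (processors == 1 && !pauseSensitive) {
            return "-XX:+UseSerialGC";             // single processor, no pause requirement
        }
        if (!pauseSensitive) {
            return "-XX:+UseParallelGC";           // throughput first, long pauses acceptable
        }
        return "-XX:+UseConcMarkSweepGC or -XX:+UseG1GC";   // response time first
    }
}
```

For example, `CollectorAdvisor.suggest(4096, 8, true)` returns the concurrent collector options. Whatever it suggests is a first guess to measure against, not a verdict.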
