GC - The Garbage Collector


Note that the discussions of garbage collectors, the various GC algorithms, and modeling in the following sections focus on Sun's implementations of the Java Virtual Machine.


The Java runtime system's garbage collector runs as a separate thread. As the application allocates more and more objects, it depletes the amount of heap memory available. When this drops below a threshold percentage, the garbage collection process begins. The garbage collector stops all other runtime threads. It marks each object as live or dead, and reclaims the space taken up by the dead objects.
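The depletion described above can be observed from inside an application. The following is a minimal sketch, assuming only the standard java.lang.Runtime methods totalMemory() and freeMemory(); the class name HeapWatch and the helper freeFraction() are hypothetical names chosen for illustration. The threshold that actually triggers a collection is internal to the JVM and is not exposed here, so the sketch only reports how much of the committed heap is currently free.

```java
// Hypothetical helper (not part of any JVM API): reports how much of the
// currently committed heap is still free.
class HeapWatch {
    /** Fraction of the committed heap that is free, in the range [0, 1]. */
    static double freeFraction() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory();   // heap currently committed by the JVM
        long free = rt.freeMemory();     // unused part of that committed heap
        return (double) free / (double) total;
    }

    public static void main(String[] args) {
        System.out.println("free fraction before: " + freeFraction());
        // Allocate a few blocks so the free fraction has a chance to change.
        byte[][] blocks = new byte[64][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[256 * 1024];
        }
        System.out.println("free fraction after " + blocks.length
                + " allocations: " + freeFraction());
    }
}
```

The printed values vary from run to run, because the JVM may grow the heap or collect garbage between the two measurements.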


There are many different types of garbage collectors. Each is based on a different algorithm and exhibits different behavior. They range from simple reference-counting collectors to very advanced generational collectors. Both the algorithm chosen and the implementation can affect the behavior of the garbage collector. Here are some terms that refer to the implementation of a particular collector – note these are not mutually exclusive:


Stop-the-world – Stops all application threads while it works.
Concurrent – Allows application threads to run while it works.
Parallel – Multiple threads work on collecting at the same time.

A few of the traditional collectors are discussed below.


2.1 Copying Collector


A copying collector employs two or more storage areas to create and destroy objects. If it uses only two storage areas, they are called "semi-spaces." One semi-space (the "from-space") is used to create objects, and once it is full, the live objects are copied to the second semi-space (the "to-space"). Memory is compacted at no cost because only live objects are copied, and they are stored linearly. The semi-spaces now switch roles: new objects are created in the second one until it is full, at which point live objects are copied to the first. The semi-spaces exchange the from-space and to-space roles again and again. Dead objects are not freed explicitly; they're simply overwritten by new objects.


In a JVM, a copying collector is a stop-the-world collector. Nevertheless, it is extremely efficient because it traverses the object list and copies the objects in a single cycle, and thus simultaneously collects the garbage and compacts the heap. The time to collect the semi-space, the "pause duration," is directly proportional to the total size of live objects.
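The semi-space scheme can be illustrated with a toy model. The sketch below is an assumption-laden simulation, not how any Sun JVM is implemented: the class SemiSpaceHeap, its Obj type, and the integer "addresses" are all hypothetical. It shows bump-pointer allocation in the from-space, and a collection that lays the reachable objects out linearly in the to-space before the two spaces swap roles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a two-semi-space copying collector (illustrative only). */
class SemiSpaceHeap {
    /** A simulated object: a size, outgoing references, and an address. */
    static final class Obj {
        final int size;
        final List<Obj> refs = new ArrayList<>();
        int address;
        Obj(int size) { this.size = size; }
    }

    private final int semiSpaceSize;
    private int top = 0;          // bump pointer in the active semi-space
    private int activeSpace = 0;  // 0 or 1: which semi-space is the from-space
    final List<Obj> roots = new ArrayList<>();

    SemiSpaceHeap(int semiSpaceSize) { this.semiSpaceSize = semiSpaceSize; }

    /** Bump-pointer allocation; collects once if the from-space is full. */
    Obj allocate(int size) {
        if (top + size > semiSpaceSize) {
            collect();
            if (top + size > semiSpaceSize) {
                throw new IllegalStateException("simulated heap exhausted");
            }
        }
        Obj o = new Obj(size);
        o.address = top;
        top += size;
        return o;
    }

    /** Copies everything reachable from the roots; returns the units copied. */
    int collect() {
        Set<Obj> copied = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        int newTop = 0;
        while (!work.isEmpty()) {
            Obj o = work.poll();
            if (!copied.add(o)) continue;  // already copied
            o.address = newTop;            // live objects are stored linearly
            newTop += o.size;
            work.addAll(o.refs);
        }
        activeSpace = 1 - activeSpace;     // the semi-spaces switch roles
        top = newTop;                      // dead objects are simply left behind
        return newTop;
    }

    int used() { return top; }
    int activeSpace() { return activeSpace; }
}
```

In this model the work done by collect() depends only on the objects reachable from the roots, which mirrors the statement that pause duration is proportional to the total size of live objects. Unreachable objects are never visited.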


2.2 Mark and Compact Collector


In a mark-and-compact (more briefly, "mark-compact") collector, objects are created in a contiguous space, the heap. Once the free space falls below a threshold, the collector begins a marking phase, in which it traverses all objects reachable from the "roots" and marks them as live; any object left unmarked is dead. After the marking phase ends, the compacting phase begins, in which the collector compacts the heap by copying live objects to a new contiguous area. In a JVM, a mark-compact collector is a stop-the-world collector. It suspends all of the application threads until collection is complete and the memory is reorganized, and then restarts them.
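The two phases can be sketched as a small simulation. The class MarkCompactHeap and its Obj type are hypothetical, and the sketch makes one simplifying assumption that differs from the description above: rather than copying live objects to a separate contiguous area, it slides them toward the start of the same space, which is one common way to compact.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a mark-compact collector (illustrative only). */
class MarkCompactHeap {
    static final class Obj {
        final int size;
        final List<Obj> refs = new ArrayList<>();
        int address;
        boolean marked;
        Obj(int size, int address) { this.size = size; this.address = address; }
    }

    final List<Obj> objects = new ArrayList<>();  // all objects, in address order
    final List<Obj> roots = new ArrayList<>();
    private int top = 0;

    Obj allocate(int size) {
        Obj o = new Obj(size, top);
        top += size;
        objects.add(o);
        return o;
    }

    /** Marking phase: flag everything reachable from the roots. */
    void mark() {
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.poll();
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
    }

    /** Compacting phase: keep marked objects, sliding them toward address 0. */
    int compact() {
        int before = top;
        int next = 0;
        List<Obj> live = new ArrayList<>();
        for (Obj o : objects) {
            if (!o.marked) continue;  // unmarked means dead: drop it
            o.address = next;         // relocate the survivor
            next += o.size;
            o.marked = false;         // reset the mark for the next cycle
            live.add(o);
        }
        objects.clear();
        objects.addAll(live);
        top = next;
        return before - next;         // units reclaimed
    }

    int collect() { mark(); return compact(); }
    int used() { return top; }
}
```

After collect() returns, all free space lies in one contiguous region above used(), so subsequent allocation can again proceed by bumping a pointer.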


2.3 Mark and Sweep Collector


The marking phase of a mark-and-sweep ("mark-sweep") collector is the same as that of a mark-compact collector. When the marking phase is complete, the space each dead object occupies is added to a free list. Contiguous space occupied by dead objects is combined to make larger segments of free memory, but because a mark-sweep collector does not compact the heap, memory has a tendency to fragment over time.
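The sweep and the resulting fragmentation can be sketched as follows. The class SweepDemo and its Block and Free types are hypothetical; the sketch assumes the marking phase has already decided which blocks are live, and that blocks are supplied in address order with no gaps between them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the sweep phase of a mark-sweep collector (illustrative only). */
class SweepDemo {
    /** A heap block: start address, size, and the verdict of the marking phase. */
    static final class Block {
        final int start, size;
        final boolean live;
        Block(int start, int size, boolean live) {
            this.start = start; this.size = size; this.live = live;
        }
    }

    /** A free segment covering [start, start + size). */
    static final class Free {
        final int start, size;
        Free(int start, int size) { this.start = start; this.size = size; }
    }

    /** Adds dead blocks to a free list, merging adjacent dead blocks. */
    static List<Free> sweep(List<Block> blocksInAddressOrder) {
        List<Free> freeList = new ArrayList<>();
        int runStart = -1, runSize = 0;
        for (Block b : blocksInAddressOrder) {
            if (b.live) {
                if (runSize > 0) {             // a live block ends the dead run
                    freeList.add(new Free(runStart, runSize));
                    runSize = 0;
                }
            } else {
                if (runSize == 0) runStart = b.start;
                runSize += b.size;             // coalesce with the preceding dead block
            }
        }
        if (runSize > 0) freeList.add(new Free(runStart, runSize));
        return freeList;
    }

    static int totalFree(List<Free> freeList) {
        int sum = 0;
        for (Free f : freeList) sum += f.size;
        return sum;
    }

    static int largestFree(List<Free> freeList) {
        int max = 0;
        for (Free f : freeList) max = Math.max(max, f.size);
        return max;
    }
}
```

When live blocks are interleaved with dead ones, totalFree() can exceed largestFree(): the heap has enough free memory in aggregate, yet no single segment can satisfy a large request. This is the fragmentation that a compacting collector avoids.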


2.4 Incremental Collector


An incremental collector divides the heap into small fixed-size blocks and allocates the data among them. It runs only for brief periods of time, leaving more processor time available for the application's use. It collects the garbage in only one block at a time, using the train algorithm [7]. The train algorithm organizes the blocks into train-cars and trains. In each collection cycle, the collector checks the cars and trains. If the train to which a car belongs contains only garbage, then the GC collects the entire train. If a car has references from other cars, then the GC copies objects to the respective cars, and, if the destination cars are full, it creates new cars as needed. Because the incremental collector pauses the application threads for only brief periods of time, the net effect is near-pauseless application execution.


2.5 Generational Garbage Collection


In most object-oriented languages, including the Java programming language, most objects have very short lifetimes, while a small percentage of them live much longer [6]. Of all newly allocated heap objects, 80-98 percent die within a few million instructions [6]. A large percentage of the surviving objects continue to survive for many collection cycles, however, and must be analyzed and copied at each cycle. Hence, the garbage collector spends most of its time analyzing and copying the same old objects repeatedly, needlessly, and expensively.


To avoid this repeated copying, a generational GC divides the heap into multiple areas, called generations, and segregates objects into these areas by age. In younger generations objects die more quickly and the GC collects them more frequently, while in older generations objects are collected less often. Once an object survives a few collection cycles, the GC moves it to an older generation, where it will be analyzed and copied less often. This generational copying reduces GC costs.


The GC may employ different collection algorithms in different generations. In younger generations, objects are ephemeral, and both space requirements and numbers of objects needing copying tend to be small, so a copying collector (2.1) is extremely efficient. In older generations, objects tend to be more numerous and longer-lived, and copying costs make copying collectors prohibitively expensive; hence mark-compact (2.2) or mark-sweep (2.3) collectors are preferred.
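Promotion by age can be sketched with a toy model. The class GenerationalDemo, the reachable flag (a stand-in for the result of a marking pass), and the TENURE_THRESHOLD value of 2 are all assumptions made for illustration. The sketch also ignores references from old objects to young ones, which a real generational collector must track.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model of age-based promotion between two generations (illustrative only). */
class GenerationalDemo {
    static final class Obj {
        final String name;
        boolean reachable = true;  // stand-in for the result of a marking pass
        int age = 0;               // number of young collections survived
        Obj(String name) { this.name = name; }
    }

    static final int TENURE_THRESHOLD = 2;  // assumed value, for illustration

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);              // every new object starts in the young generation
        return o;
    }

    /** Collects only the young generation; returns how many objects it examined. */
    int minorCollect() {
        int examined = young.size();
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.reachable) {    // dead: its space is reclaimed
                it.remove();
                continue;
            }
            o.age++;
            if (o.age >= TENURE_THRESHOLD) {
                it.remove();       // survived long enough: promote
                old.add(o);
            }
        }
        return examined;
    }
}
```

Once the long-lived objects have been promoted, later calls to minorCollect() no longer examine them, which is the saving that generational collection aims for.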


2.5.1 Generational Garbage Collection in Java Applications


Generational collection was introduced to the JVM in v1.2. The heap was divided into two generations, a young generation that used two semi-spaces and a copying collector, and an old generation that used a mark-compact or mark-sweep collector. It also offered an advanced collector, the concurrent, incremental mark-and-sweep ("concurrent inc-mark-sweep") collector (see 2.6 Advanced Collectors in Java Applications).


In the 1.3 and later JVMs [4], the heap is again divided into generations – by default, two generations. The young generation uses a copying collector, while the old generation uses a mark-compact collector. The 1.3 JVM also offers an incremental collector, introducing an optional intermediate generation between the young and old generations. Advanced collectors, like the concurrent inc-mark-sweep collector, are not available in 1.3 but are available in 1.2.2_07, and from the 1.4.1 JVM onwards.


More information about the collectors available from Sun can be found at: http://java.sun.com/docs/hotspot/index.html.


2.6 Advanced Collectors in Java Applications


The Java platform provides an advanced collector, the concurrent inc-mark-sweep collector. A concurrent collector [1] takes advantage of its dedicated thread to enable both garbage collection and object allocation/modification to happen at the same time. It uses external bitmaps [1] to manage older-generation memory, and "card tables" [1][2] to interact with the younger-generation collector. The bitmaps are used to scan the heap and mark live objects concurrently, in a "marking phase." Once the marking is complete, the unmarked objects are deallocated in a concurrent "sweeping phase." The collector does most of its work concurrently, suspending application execution only briefly.


Note: If the rate of object creation is too high and the concurrent collector cannot keep up, it falls back to the traditional mark-sweep collector.
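The card-table idea mentioned above can be sketched in a few lines. This is a generic illustration of the technique, not the data structure used by the collector in [1][2]: the class CardTable, the 512-byte card size, and the method names are assumptions. The heap is divided into fixed-size cards, each write of a reference field marks the containing card dirty, and the collector later rescans only the dirty cards.

```java
/** Toy card table: one byte per fixed-size region of the heap (illustrative only). */
class CardTable {
    static final int CARD_SHIFT = 9;  // assumed card size: 2^9 = 512 bytes
    private final byte[] cards;       // 0 = clean, 1 = dirty

    CardTable(int heapBytes) {
        // Round up so that a partial final card still gets an entry.
        cards = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
    }

    /** Write barrier: called when a reference field at this heap offset is updated. */
    void recordWrite(int fieldOffset) {
        cards[fieldOffset >>> CARD_SHIFT] = 1;
    }

    boolean isDirty(int cardIndex) {
        return cards[cardIndex] != 0;
    }

    /** Returns the indices of all dirty cards and marks them clean again. */
    int[] drainDirtyCards() {
        int count = 0;
        for (byte c : cards) {
            if (c != 0) count++;
        }
        int[] dirty = new int[count];
        int j = 0;
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) {
                dirty[j++] = i;
                cards[i] = 0;
            }
        }
        return dirty;
    }
}
```

The cost of the barrier is a shift and a one-byte store per reference write, while the collector scans only the cards returned by drainDirtyCards() rather than the whole older generation.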


2.7 The Sun Exact VM


The research for this paper was done using the 1.2.2_08 version of the JDK (Java 2 SDK, Standard Edition), on Solaris, with the advanced concurrent inc-mark-sweep collector.


The Exact VM [15] is a high-performance Java virtual machine developed by Sun Microsystems. It features high-performance, exact memory management. The memory system is separated from the rest of the VM by a well-defined GC interface. This interface allows various garbage collectors to be "plugged in". It also features a framework to employ generational collectors. More information about VMs for Solaris can be obtained from http://www.sun.com/solaris/java.