MarkSweepPolicy: the GC Policy of GenCollectedHeap

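When the heap manager cannot satisfy an allocation request from an application thread (there is not enough free memory), it triggers a GC to reclaim some or all garbage objects and free enough space for the request. If that is still not enough, an OOM is thrown. The basic idea of MarkSweepPolicy is to mark the active objects and sweep the ones left unmarked (the inactive ones).

MarkSweepPolicy is the default GC policy of the heap manager GenCollectedHeap. For a generational heap manager like this one, the optimization is to keep GC confined to the young generation whenever possible, because clearing inactive objects out of the young generation is cheap: copy all active objects from the young generation's Eden/From spaces into its To space or into the old generation, then wipe Eden/From wholesale. Clearing inactive objects out of the old generation is harder. There is no other memory space to copy its active objects into, so free space can only be recovered by moving and compacting objects in place (so-called memory defragmentation).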
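To make the young-generation shortcut concrete, here is a minimal sketch of the copy step, assuming liveness has already been decided by a marking phase that is not shown. `Obj`, `ToyYoungGen`, and `toy_minor_gc` are hypothetical names for illustration and are not HotSpot types; the real collector also considers object age before promoting, which this toy ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy object: just a size and a liveness flag (assumed to be set by marking).
struct Obj {
  std::size_t size;
  bool live;
};

// Toy young generation with the three spaces named in the text.
struct ToyYoungGen {
  std::vector<Obj> eden, from, to;
  std::size_t to_capacity = 0;  // how many bytes the To space can hold
};

// Copies every live object out of Eden/From, then empties both spaces.
// Survivors that do not fit into To are promoted to the old generation.
// Returns the number of bytes promoted.
std::size_t toy_minor_gc(ToyYoungGen& young, std::vector<Obj>& old_gen) {
  assert(young.to.empty());  // To must be empty before a minor GC starts
  std::size_t to_used = 0;
  std::size_t promoted = 0;
  auto evacuate = [&](const std::vector<Obj>& space) {
    for (const Obj& o : space) {
      if (!o.live) continue;  // garbage is simply never copied
      if (to_used + o.size <= young.to_capacity) {
        young.to.push_back(o);
        to_used += o.size;
      } else {
        old_gen.push_back(o);  // To is full: promote to the old generation
        promoted += o.size;
      }
    }
  };
  evacuate(young.eden);
  evacuate(young.from);
  young.eden.clear();  // whole spaces are reclaimed in one step
  young.from.clear();
  std::swap(young.from, young.to);  // To becomes the new From
  return promoted;
}
```

Dead objects cost nothing here because they are never visited for copying, which is why the young generation is cheap to clean when most of its objects are garbage.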

Diagram: young-generation GC (Minor GC)

Diagram: old-generation GC

First, a recap of how GenCollectedHeap triggers a GC:

    // Trigger a GC: enqueue a GC-type VM operation on the VMThread's operation queue.
    // The GC itself is executed by the VMThread or a dedicated GC thread.
    VM_GenCollectForAllocation op(size, is_tlab, gc_count_before);
    VMThread::execute(&op);

    if (op.prologue_succeeded()) {  // the GC operation has been executed
      result = op.result();
      if (op.gc_locked()) {  // the GC could not run because the GC locker is active (see doit() below), so retry the allocation
         assert(result == NULL, "must be NULL if gc_locked() is true");
         continue;  // retry and/or stall as necessary
      }

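      // At this point the failed allocation has triggered a GC and that GC has finished.
      // If the GC overhead limit was exceeded (even though the allocation succeeded after the GC),
      // report failure to the caller, which throws OOM to the requesting application thread,
      // and clear the limit-exceeded flag.

      // Did GC exceed the configured overhead (time) limit?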
      const bool limit_exceeded = size_policy()->gc_overhead_limit_exceeded();
      const bool softrefs_clear = all_soft_refs_clear();

      // If the limit was exceeded, soft references must already have been cleared
      assert(!limit_exceeded || softrefs_clear, "Should have been cleared");

      // GC overhead limit exceeded
      if (limit_exceeded && softrefs_clear) {
        *gc_overhead_limit_was_exceeded = true;
        // Clear the limit-exceeded flag
        size_policy()->set_gc_overhead_limit_exceeded(false);
        if (op.result() != NULL) {
          CollectedHeap::fill_with_object(op.result(), size);
        }
        // Limit exceeded: return NULL so the caller throws an out-of-memory error
        return NULL;
      }

      // A successful allocation must lie inside the heap
      assert(result == NULL || gch->is_in_reserved(result), "result not in heap");
      return result;
    }
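The decision made after the VM operation returns can be condensed into a small pure function. This is only a sketch of the excerpt above: `GcOpResult`, `AllocOutcome`, and `after_gc` are hypothetical names, and treating a failed prologue as "retry" is an assumption, since the excerpt does not show that branch.

```cpp
#include <cassert>

enum class AllocOutcome { Retry, OutOfMemory, Done };

// Hypothetical stand-in for the values queried on `op` in the excerpt.
struct GcOpResult {
  bool prologue_succeeded;  // the operation actually ran
  bool gc_locked;           // the GC was blocked by the GC locker
  void* result;             // block allocated after the GC, or nullptr
};

// Mirrors the branch order of the excerpt:
// blocked GC -> retry, overhead limit exceeded -> OOM, otherwise hand back the block.
AllocOutcome after_gc(const GcOpResult& op,
                      bool limit_exceeded,
                      bool softrefs_clear,
                      void** out) {
  assert(out != nullptr);
  if (!op.prologue_succeeded) {
    return AllocOutcome::Retry;  // assumption: the allocation loop goes around again
  }
  if (op.gc_locked) {
    return AllocOutcome::Retry;  // the GC never ran, so there is nothing to hand back
  }
  if (limit_exceeded && softrefs_clear) {
    *out = nullptr;  // the block is discarded even if the GC produced one
    return AllocOutcome::OutOfMemory;
  }
  *out = op.result;  // may still be nullptr, which the caller also reports as OOM
  return AllocOutcome::Done;
}
```

The notable case is the middle one: an allocation that succeeded is still reported as a failure when the overhead limit was exceeded, which is why the original code fills the block with a dummy object before returning NULL.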

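One point is worth noting here. When a Java-level application thread fails to obtain memory from the heap manager, it creates a VM operation, VM_GenCollectForAllocation, and hands it to the VMThread for scheduling and execution (how the VMThread schedules this operation is not covered in this article). The application thread does not run the GC itself; it simply waits for the operation to complete. VM_GenCollectForAllocation means that an allocation failure needs special handling: either run a GC, or expand the heap or one of its generations, whatever it takes to free enough space for the request. Its logic is as follows: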

void VM_GenCollectForAllocation::doit() {
  SvcGCMarker sgcm(SvcGCMarker::MINOR);

  GenCollectedHeap* gch = GenCollectedHeap::heap();
  GCCauseSetter gccs(gch, _gc_cause);

  // Ask the heap manager to handle the failed allocation
  _res = gch->satisfy_failed_allocation(_size, _tlab);

  // The allocated block, if any, must lie inside the heap
  assert(gch->is_in_reserved_or_null(_res), "result not in heap");

  if (_res == NULL && GC_locker::is_active_and_needs_gc()) {
    set_gc_locked();
  }
}

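In the end, GenCollectedHeap hands the failed allocation to its GC policy. MarkSweepPolicy's overall approach to a failed allocation is to choose a suitable GC that frees enough space for the application thread that triggered it; how the heap manager carries out that GC is not the policy's concern. The core strategy of MarkSweepPolicy for a failed allocation is: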

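1. Choose the GC type
   1). If a GC has been requested but cannot run yet, give up on this GC and only try to allocate by expanding the heap
   2). If an incremental GC (one that collects only the young generation) is safe, run a Minor GC
   3). Otherwise, run a Full GC
2. Try to allocate the block from each generation in turn, young to old
3. Try to allocate the block by expanding each generation in turn, old to young
4. Run a thorough Full GC (clearing all soft references)
5. Try to allocate the block from each generation in turn, young to old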
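The five steps above can be modeled with a toy heap that only tracks word counts. This is a sketch for following the control flow, not the HotSpot implementation: `ToyHeap` and all of its fields are hypothetical, the heap is treated as a single pool rather than separate generations, and each kind of garbage is assumed to be reclaimed completely by the first GC that can reach it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy heap; all sizes are in words and all fields are illustrative.
struct ToyHeap {
  bool gc_blocked = false;           // a GC is needed but cannot run yet
  bool minor_gc_is_safe = true;      // whether an incremental GC is expected to succeed
  std::size_t free_words = 0;
  std::size_t growth_room = 0;       // how far the heap may still expand
  std::size_t young_garbage = 0;     // reclaimed by any GC
  std::size_t old_garbage = 0;       // reclaimed only by a Full GC
  std::size_t soft_ref_garbage = 0;  // reclaimed only when soft references are cleared
  std::vector<std::string> log;      // which collections ran, in order

  bool allocate(std::size_t size) {
    if (size > free_words) return false;
    free_words -= size;
    return true;
  }
  bool expand_and_allocate(std::size_t size) {
    free_words += growth_room;
    growth_room = 0;
    return allocate(size);
  }
  void collect(bool full, bool clear_soft_refs) {
    log.push_back(!full ? "minor" : clear_soft_refs ? "full+softrefs" : "full");
    free_words += young_garbage;
    young_garbage = 0;
    if (full) { free_words += old_garbage; old_garbage = 0; }
    if (clear_soft_refs) { free_words += soft_ref_garbage; soft_ref_garbage = 0; }
  }
};

// Follows the numbered strategy; returns true if the request was satisfied.
bool satisfy_failed_allocation(ToyHeap& heap, std::size_t size) {
  assert(size != 0);
  if (heap.gc_blocked) {  // step 1.1: no GC, expansion is the only option
    return heap.growth_room > 0 && heap.expand_and_allocate(size);
  }
  heap.collect(!heap.minor_gc_is_safe, false);      // step 1.2 or 1.3
  if (heap.allocate(size)) return true;             // step 2
  if (heap.expand_and_allocate(size)) return true;  // step 3
  heap.collect(true, true);                         // step 4
  return heap.allocate(size);                       // step 5
}
```

With 4 words of young garbage, 10 of old garbage, and 20 held by soft references, a 30-word request runs a minor GC first, fails steps 2 and 3, and only succeeds after the soft-reference-clearing Full GC.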

The implementation details are in the code:

/**
 * Handle a failed memory allocation from an application (Java-level) thread
 */
HeapWord* GenCollectorPolicy::satisfy_failed_allocation(size_t size,
                                                        bool   is_tlab) {
  GenCollectedHeap *gch = GenCollectedHeap::heap();
  GCCauseSetter x(gch, GCCause::_allocation_failure);
  HeapWord* result = NULL;

  assert(size != 0, "Precondition violated");

  if (GC_locker::is_active_and_needs_gc()) {  // a GC has been requested but cannot run yet

    if (!gch->is_maximal_no_gc()) { // some generation can still grow, so try to allocate by expanding it
      result = expand_heap_and_allocate(size, is_tlab);
    }

    return result;   // could be null if we are out of space

  } else if (!gch->incremental_collection_will_fail(false /* don't consult_young */)) { // an incremental collection is feasible, so trigger only a Minor GC
    // Incremental GC
    gch->do_collection(false            /* full */,
                       false            /* clear_all_soft_refs */,
                       size             /* size */,
                       is_tlab          /* is_tlab */,
                       number_of_generations() - 1 /* max_level */);
  } else {  // run a Full GC
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print(" :: Trying full because partial may fail :: ");
    }
    // Try a full collection; see delta for bug id 6266275
    // for the original code and why this has been simplified
    // with from-space allocation criteria modified and
    // such allocation moved out of the safepoint path.
    gch->do_collection(true             /* full */,
                       false            /* clear_all_soft_refs */,
                       size             /* size */,
                       is_tlab          /* is_tlab */,
                       number_of_generations() - 1 /* max_level */);
  }

  // After the GC, try again to allocate a block of the requested size from each generation in turn
  result = gch->attempt_allocation(size, is_tlab, false /*first_only*/);

  if (result != NULL) {
    assert(gch->is_in_reserved(result), "result not in heap");
    return result;
  }

  // The GC may have left room to grow the generations,
  // so try again to allocate the block while allowing the generations to expand
  result = expand_heap_and_allocate(size, is_tlab);
  if (result != NULL) {
    return result;
  }

  // If we reach this point, we're really out of memory. Try every trick
  // we can to reclaim memory. Force collection of soft references. Force
  // a complete compaction of the heap. Any additional methods for finding
  // free memory should be here, especially if they are expensive. If this
  // attempt fails, an OOM exception will be thrown.
  {
    IntFlagSetting flag_change(MarkSweepAlwaysCompactCount, 1); // Make sure the heap is fully compacted

    // One last, thorough GC: collect every generation and clear soft references
    gch->do_collection(true             /* full */,
                       true             /* clear_all_soft_refs */,
                       size             /* size */,
                       is_tlab          /* is_tlab */,
                       number_of_generations() - 1 /* max_level */);
  }

  // After the thorough GC, make a final attempt to allocate the block from each generation in turn
  result = gch->attempt_allocation(size, is_tlab, false /* first_only */);
  if (result != NULL) {
    assert(gch->is_in_reserved(result), "result not in heap");
    return result;
  }

  assert(!should_clear_all_soft_refs(), "Flag should have been handled and cleared prior to this point");

  // What else?  We might try synchronous finalization later.  If the total
  // space available is large enough for the allocation, then a more
  // complete compaction phase than we've tried so far might be
  // appropriate.
  return NULL;
}
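The function above calls `expand_heap_and_allocate` twice without showing it. The sketch below illustrates only the old-to-young order that step 3 of the strategy describes; it is an assumption about the shape of that helper, not its actual code. `ToyGen` and `expand_and_allocate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generation: current free space plus how much it may still grow.
struct ToyGen {
  const char* name;
  std::size_t free_words;
  std::size_t growth_room;
};

// gens[0] is the youngest generation, gens.back() the oldest.
// Walks from old to young, growing a generation only as much as the request needs.
// Returns the index of the generation that satisfied the request, or -1.
int expand_and_allocate(std::vector<ToyGen>& gens, std::size_t size) {
  assert(size != 0);
  for (int i = static_cast<int>(gens.size()) - 1; i >= 0; --i) {
    ToyGen& g = gens[i];
    if (size > g.free_words + g.growth_room) {
      continue;  // even fully expanded, this generation is too small
    }
    if (size > g.free_words) {
      const std::size_t needed = size - g.free_words;
      g.growth_room -= needed;  // expand by exactly the shortfall
      g.free_words += needed;
    }
    g.free_words -= size;
    return i;
  }
  return -1;  // no generation can satisfy the request, even after expansion
}
```

Trying the old generation first keeps the young generation's limited space available for ordinary allocations.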

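For a generational heap manager like GenCollectedHeap, the constant goal when reclaiming garbage is to confine collection to the younger generations as far as possible, because that costs little and reclaims a lot. Garbage in the young generation is therefore collected first. The strategy is: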

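1. Decide whether this GC clears all soft references (java.lang.ref package)
2. Decide which generations take part in this GC
3. Collect those generations in order, from the starting generation up to the oldest one allowed
4. Resize those generations, from the oldest one collected down to the youngest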
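The generation-selection part of this strategy is easy to lose inside the full function, so here is a stripped-down sketch of it. `ToyGeneration`, `CollectionPlan`, and `plan_collection` are hypothetical names; the two boolean fields stand in for the per-generation queries `full_collects_younger_generations()` and `should_collect(...)`, and nothing is actually collected.

```cpp
#include <cassert>
#include <vector>

// Hypothetical generation descriptor.
struct ToyGeneration {
  bool full_collects_younger;  // a full GC of this generation also covers all younger ones
  bool wants_collection;       // stand-in for should_collect(...)
};

struct CollectionPlan {
  std::vector<int> collected;  // generation indices, in collection order
  std::vector<int> resized;    // generation indices, in resize order
  bool complete = false;       // whether this counts as a Full GC
};

// gens[0] is the youngest generation; max_level is the oldest one allowed.
CollectionPlan plan_collection(const std::vector<ToyGeneration>& gens,
                               bool full, int max_level) {
  const int n = static_cast<int>(gens.size());
  assert(max_level >= 0 && max_level < n);
  CollectionPlan plan;

  // For a full GC, start at the oldest generation that also covers the younger ones.
  int starting_level = 0;
  if (full) {
    for (int i = max_level; i >= 0; --i) {
      if (gens[i].full_collects_younger) { starting_level = i; break; }
    }
  }

  // Collect from the starting level upward, toward older generations.
  int max_level_collected = starting_level;
  for (int i = starting_level; i <= max_level; ++i) {
    if (gens[i].wants_collection) {
      plan.collected.push_back(i);
      max_level_collected = i;
    }
  }

  // Reaching the oldest generation turns the GC into a Full GC.
  plan.complete = (full && max_level == n - 1) || (max_level_collected == n - 1);

  // Resize from the oldest collected generation down to the youngest.
  for (int j = max_level_collected; j >= 0; --j) {
    plan.resized.push_back(j);
  }
  return plan;
}
```

With two generations where only the old one covers younger generations, a full request collects just index 1 yet still resizes both, while a non-full request in which only the young generation wants collection touches index 0 alone.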

GenCollectedHeap implements do_collection as follows:

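/**
 * Run one GC
 *
 * @param full whether to run a Full GC or a Minor (incremental) GC
 * @param clear_all_soft_refs whether this GC must clear all soft references (the collector policy can also request this)
 * @param size size of the block to allocate once this GC finishes
 * @param is_tlab whether the pending allocation is for a thread-local allocation buffer (TLAB)
 * @param max_level the oldest generation this GC is allowed to collect
 */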
void GenCollectedHeap::do_collection(bool  full,
                                     bool   clear_all_soft_refs,
                                     size_t size,
                                     bool   is_tlab,
                                     int    max_level) {
  bool prepared_for_verification = false;
  ResourceMark rm;
  DEBUG_ONLY(Thread* my_thread = Thread::current();)

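  /**
   * Four conditions must hold when a GC is executed:
   *   1. the VM is at a safepoint
   *   2. the executing thread is the VM thread or a dedicated GC thread
   *   3. the thread that requested the GC holds the global Heap_lock
   *   4. no other GC is already in progress
   */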
  assert(SafepointSynchronize::is_at_safepoint(), "should be at safepoint");
  assert(my_thread->is_VM_thread() || my_thread->is_ConcurrentGC_thread(), "incorrect thread type capability");
  assert(Heap_lock->is_locked(), "the requesting thread should have the Heap_lock");
  guarantee(!is_gc_active(), "collection is not reentrant");

  assert(max_level < n_gens(), "sanity check");

  // GC is currently disabled
  if (GC_locker::check_active_before_gc()) {
    return; // GC is disabled (e.g. JNI GetXXXCritical operation)
  }

  // Does this GC have to clear all soft references?
  const bool do_clear_all_soft_refs = clear_all_soft_refs || collector_policy()->should_clear_all_soft_refs();

  ClearedAllSoftRefs casr(do_clear_all_soft_refs, collector_policy());

  // Current usage of the permanent generation
  const size_t perm_prev_used = perm_gen()->used();

  print_heap_before_gc();
  if (Verbose) {
    gclog_or_tty->print_cr("GC Cause: %s", GCCause::to_string(gc_cause()));
  }

  {
    FlagSetting fl(_is_gc_active, true);  // the current thread officially starts the GC

    // Is this going to be a Full GC?
    bool complete = full && (max_level == (n_gens()-1));

    // GC type (Minor/Full GC)
    const char* gc_cause_str = "GC ";
    if (complete) {
      GCCause::Cause cause = gc_cause();
      if (cause == GCCause::_java_lang_system_gc) {  // triggered by the application calling System.gc()
        gc_cause_str = "Full GC (System) ";
      } else {
        gc_cause_str = "Full GC ";
      }
    }
    gclog_or_tty->date_stamp(PrintGC && PrintGCDateStamps);

    // Measure the CPU time of this GC
    TraceCPUTime tcpu(PrintGCDetails, true, gclog_or_tty);
    TraceTime t(gc_cause_str, PrintGCDetails, false, gclog_or_tty);

    // Pre-processing before the GC
    gc_prologue(complete);

    increment_total_collections(complete);  // update the GC counters

    // Current total usage of the heap
    size_t gch_prev_used = used();

    // Decide which generations to collect
    int starting_level = 0;
    if (full) {
      // Search for the oldest generation which will collect all younger
      // generations, and start collection loop there.
      // For a Full GC, search from the oldest generation downward for the first one whose full collection also covers all younger generations
      for (int i = max_level; i >= 0; i--) {
        if (_gens[i]->full_collects_younger_generations()) {
          starting_level = i;
          break;
        }
      }
    }

    bool must_restore_marks_for_biased_locking = false;

    // The oldest generation collected by this GC (initialized to the starting level)
    int max_level_collected = starting_level;

    for (int i = starting_level; i <= max_level; i++) {
      if (_gens[i]->should_collect(full, size, is_tlab)) {  // should this generation be collected?
        // Collecting the oldest generation makes this a Full GC
        if (i == n_gens() - 1) {  // a major collection is to happen
          if (!complete) {
            // The full_collections increment was missed above.
            increment_total_full_collections();
          }
          pre_full_gc_dump();    // do any pre full gc dumps
        }

        // Record this generation's GC timing, collection count, and memory changes
        TraceTime t1(_gens[i]->short_name(), PrintGCDetails, false, gclog_or_tty);
        TraceCollectorStats tcs(_gens[i]->counters());
        TraceMemoryManagerStats tmms(_gens[i]->kind(),gc_cause());

        // Usage of this generation before the GC
        size_t prev_used = _gens[i]->used();
        _gens[i]->stat_record()->invocations++;
        _gens[i]->stat_record()->accumulated_time.start();

        // Must be done a new before each collection because
        // a previous collection will do mangling and will
        // change top of some spaces.
        record_gen_tops_before_GC();

        if (PrintGC && Verbose) {
          gclog_or_tty->print("level=%d invoke=%d size=" SIZE_FORMAT,
                     i,
                     _gens[i]->stat_record()->invocations,
                     size*HeapWordSize);
        }

        if (VerifyBeforeGC && i >= VerifyGCLevel && total_collections() >= VerifyGCStartAt) {
          HandleMark hm;  // Discard invalid handles created during verification
          if (!prepared_for_verification) {
            prepare_for_verify();
            prepared_for_verification = true;
          }
          gclog_or_tty->print(" VerifyBeforeGC:");
          Universe::verify(true);
        }
        COMPILER2_PRESENT(DerivedPointerTable::clear());

        if (!must_restore_marks_for_biased_locking &&
            _gens[i]->performs_in_place_marking()) {
          // We perform this mark word preservation work lazily
          // because it's only at this point that we know whether we
          // absolutely have to do it; we want to avoid doing it for
          // scavenge-only collections where it's unnecessary
          must_restore_marks_for_biased_locking = true;
          BiasedLocking::preserve_marks();
        }

        // Start collecting the current generation
        {
          // Note on ref discovery: For what appear to be historical reasons,
          // GCH enables and disabled (by enqueing) refs discovery.
          // In the future this should be moved into the generation's
          // collect method so that ref discovery and enqueueing concerns
          // are local to a generation. The collect method could return
          // an appropriate indication in the case that notification on
          // the ref lock was needed. This will make the treatment of
          // weak refs more uniform (and indeed remove such concerns
          // from GCH). XXX

          printf("%s[%d] [tid: %lu]: Start collecting generation [%d: %s]...\n", __FILE__, __LINE__, pthread_self(), i, _gens[i]->name());

          HandleMark hm;  // Discard invalid handles created during gc
          save_marks();   // save marks for all gens

          // We want to discover references, but not process them yet.
          // This mode is disabled in process_discovered_references if the
          // generation does some collection work, or in
          // enqueue_discovered_references if the generation returns
          // without doing any work.
          ReferenceProcessor* rp = _gens[i]->ref_processor();

          // If the discovery of ("weak") refs in this generation is
          // atomic wrt other collectors in this configuration, we
          // are guaranteed to have empty discovered ref lists.
          if (rp->discovery_is_atomic()) {
            rp->enable_discovery(true /*verify_disabled*/, true /*verify_no_refs*/);
            rp->setup_policy(do_clear_all_soft_refs);
          } else {
            // collect() below will enable discovery as appropriate
          }

          printf("%s[%d] [tid: %lu]: Start collecting garbage in generation [%d: %s] (full=%s, do_clear_all_soft_refs=%s)...\n", __FILE__, __LINE__, pthread_self(),
        		  i, _gens[i]->name(), full? "true":"false", do_clear_all_soft_refs? "true":"false");

          // Collect the current generation
          _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);

          if (!rp->enqueuing_is_done()) {
            rp->enqueue_discovered_references();
          } else {
            rp->set_enqueuing_is_done(false);
          }
          rp->verify_no_references_recorded();
        }

        max_level_collected = i;

        // After collecting this generation, can the allocation request be satisfied?
        if (size > 0) {
          if (!is_tlab || _gens[i]->supports_tlab_allocation()) {
            if (size*HeapWordSize <= _gens[i]->unsafe_max_alloc_nogc()) {
              size = 0;
            }
          }
        }

        COMPILER2_PRESENT(DerivedPointerTable::update_pointers());

        _gens[i]->stat_record()->accumulated_time.stop();

        update_gc_stats(i, full);

        if (VerifyAfterGC && i >= VerifyGCLevel && total_collections() >= VerifyGCStartAt) {
          HandleMark hm;  // Discard invalid handles created during verification
          gclog_or_tty->print(" VerifyAfterGC:");
          Universe::verify(false);
        }

        if (PrintGCDetails) {
          gclog_or_tty->print(":");
          _gens[i]->print_heap_change(prev_used);
        }
      }
    }//for

    // Did this turn out to be a Full GC?
    complete = complete || (max_level_collected == n_gens() - 1);

    if (complete) { // We did a "major" collection
      post_full_gc_dump();   // do any post full gc dumps
    }

    // Print how the heap changed in this GC; after a Full GC, also print the change in the permanent generation
    if (PrintGCDetails) {
      print_heap_change(gch_prev_used);

      if (complete) {
        print_perm_heap_change(perm_prev_used);
      }
    }

    // After the GC, resize the generations, from the oldest one collected down to the youngest
    for(int j = max_level_collected; j >= 0; j -= 1) {
      // Adjust generation sizes.
      printf("%s[%d] [tid: %lu]: Trying to resize generation [%d: %s].\n", __FILE__, __LINE__, pthread_self(), j, _gens[j]->name());
      _gens[j]->compute_new_size();
    }

    // After a Full GC, resize the permanent generation
    if (complete) {
      // Ask the permanent generation to adjust size for full collections
      printf("%s[%d] [tid: %lu]: Trying to resize the permanent generation [%s] (it is only resized after a Full GC).\n", __FILE__, __LINE__, pthread_self(), perm()->as_gen()->name());
      perm()->compute_new_size();
      update_full_collections_completed();
    }

    // Track memory usage and detect low memory after GC finishes
    MemoryService::track_memory_usage();

    // Post-processing after the GC
    gc_epilogue(complete);

    if (must_restore_marks_for_biased_locking) {
      BiasedLocking::restore_marks();
    }
  }

  // Print the generation resizing information
  AdaptiveSizePolicy* sp = gen_policy()->size_policy();
  AdaptiveSizePolicyOutput(sp, total_collections());

  print_heap_after_gc();

#ifdef TRACESPINNING
  ParallelTaskTerminator::print_termination_counts();
#endif

  // Terminate the whole JVM process once the total number of GCs reaches the configured value
  if (ExitAfterGCNum > 0 && total_collections() == ExitAfterGCNum) {
    tty->print_cr("Stopping after GC #%d", ExitAfterGCNum);
    vm_exit(-1);
  }
}

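How each generation's manager reclaims the garbage inside its generation, and how it resizes that generation, will be covered in detail in later articles.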
