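1. Introduction
During the National Day holiday I saw an interview question being hotly debated on Maimai:
How are strings stored in Java?
The question looks simple, but several deeper mechanisms hide behind it. This article walks through the relevant internals one by one.
2. The String class
Strings are used throughout Java programs. In Java a string is an object, and the String class is provided to create and manipulate strings.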
2.1 java.lang.String
The member fields of java.lang.String are as follows:
public final class String
implements java.io.Serializable, Comparable<String>, CharSequence {
/** The value is used for character storage. */
private final char value[];
/** Cache the hash code for the string */
private int hash; // Default to 0
/** use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = -6849794470754667710L;
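The String class declares two private instance fields: the final char array value, which stores the character data, and the int hash, which caches the hash code. Note that hash is not final, because it is computed lazily on the first call to hashCode(). This is the layout up to JDK 8; since JDK 9 (compact strings), the data is stored in a byte[] together with a coder field.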
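The field layout can be checked on whatever JDK is at hand with plain reflection. The sketch below only lists names, types and modifiers, so it does not need setAccessible; the class name StringFieldsDemo is made up for illustration, and it assumes the running JDK still names the fields value and hash, as OpenJDK does.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class StringFieldsDemo {
    // Returns true if java.lang.String declares a final field with this name.
    // Assumption: the field exists (OpenJDK names them "value" and "hash").
    static boolean isFinalField(String name) {
        try {
            Field f = String.class.getDeclaredField(name);
            return Modifier.isFinal(f.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("String has no field named " + name, e);
        }
    }

    public static void main(String[] args) {
        // Print every instance field of String with its modifiers and type.
        for (Field f : String.class.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) {
                continue; // skip constants such as serialVersionUID
            }
            System.out.println(Modifier.toString(f.getModifiers())
                    + " " + f.getType().getSimpleName() + " " + f.getName());
        }
        System.out.println("value is final: " + isFinalField("value"));
        System.out.println("hash is final:  " + isFinalField("hash"));
    }
}
```

On JDK 8 the value field is reported as char[]; on JDK 9 and later it is reported as byte[], next to a coder field.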
2.2 char
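At runtime a char is one UTF-16 code unit and always occupies two bytes. Characters outside the Basic Multilingual Plane (rare ideographs, emoji and so on) do not fit in a single char; they are stored as a surrogate pair, that is, two chars and four bytes.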
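A short sketch makes the difference between chars and characters visible. It compares a common CJK character with a rare one (U+20BB7); the class and method names are illustrative only.

```java
class SurrogateDemo {
    // Number of UTF-16 code units (chars) the string occupies.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String common = new String(Character.toChars(0x4E2D));  // inside the BMP
        String rare = new String(Character.toChars(0x20BB7));   // outside the BMP

        System.out.println("bytes per char: " + Character.BYTES);
        System.out.println("common: chars=" + charCount(common)
                + " codePoints=" + codePointCount(common));
        System.out.println("rare:   chars=" + charCount(rare)
                + " codePoints=" + codePointCount(rare));
        // The rare character is a high surrogate followed by a low surrogate.
        System.out.println("surrogate pair: "
                + Character.isHighSurrogate(rare.charAt(0)) + " "
                + Character.isLowSurrogate(rare.charAt(1)));
    }
}
```

This is why String.length() can be larger than the number of characters a user sees.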
2.3 Arrays
The main array-related classes in the JVM are:
ArrayKlass
arrayOopDesc
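They hold the class metadata and the instance data respectively. ArrayKlass and arrayOopDesc are subclasses of Klass and oopDesc, which means that arrays in Java are objects too.
They in turn have subclasses: TypeArrayKlass and typeArrayOopDesc describe arrays of primitive types, while ObjArrayKlass and objArrayOopDesc describe arrays of objects.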
TypeArrayKlass
typeArrayOopDesc
ObjArrayKlass
objArrayOopDesc
class ArrayKlass: public Klass {
friend class VMStructs;
private:
// If you add a new field that points to any metaspace object, you
// must add this field to ArrayKlass::metaspace_pointers_do().
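int _dimension; // This is an n-dimensional array
Klass* volatile _higher_dimension; // The (n+1)-dimensional array klass, if present
Klass* volatile _lower_dimension; // The (n-1)-dimensional array klass, if present
};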
// arrayOopDesc is mainly responsible for maintaining the following:
//
// the object header
// the pointer to the Klass
// the array length
class arrayOopDesc : public oopDesc {
};
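The claim that arrays are objects can also be observed from the Java side. The sketch below is only a Java-level view, not a look into HotSpot itself: it checks that an array class extends Object and separates primitive-element arrays from object-element arrays, which roughly mirrors the TypeArrayKlass / ObjArrayKlass split. The helper names are hypothetical.

```java
class ArrayIsObjectDemo {
    // True if the given class is an array type whose direct superclass is Object.
    static boolean extendsObject(Class<?> arrayType) {
        return arrayType.isArray() && arrayType.getSuperclass() == Object.class;
    }

    // True if the array's elements are primitives (char[], int[], ...).
    static boolean isPrimitiveArray(Class<?> arrayType) {
        return arrayType.isArray() && arrayType.getComponentType().isPrimitive();
    }

    public static void main(String[] args) {
        Object o = new char[] {'a', 'b'};  // an array can be assigned to Object
        System.out.println(o.getClass().getSimpleName());   // char[]
        System.out.println(extendsObject(char[].class));    // true
        System.out.println(isPrimitiveArray(char[].class)); // true
        System.out.println(isPrimitiveArray(String[].class)); // false
        // The elements of int[][] are int[] objects, so it is an object array.
        System.out.println(isPrimitiveArray(int[][].class));  // false
        System.out.println(new int[2][3].getClass().getName()); // [[I
    }
}
```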
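Answering with the points above is just about enough for the interview question. However, because strings are among the most heavily used objects at runtime, the JVM optimizes them extensively, mainly through the String.intern() method and G1 string deduplication.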
3. String.intern
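Java introduced the String.intern() method to address string redundancy. Developers have to call it explicitly; it stores the string object in a hash table named StringTable and returns the canonical instance. The implementation is as follows: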
java.lang.String
// A native method; the call goes through JNI into C code
public native String intern();
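Before following the call into native code, it may help to see what intern() does from the caller's side. A minimal sketch, relying only on the documented guarantee that string literals are interned; the class and method names are illustrative.

```java
class InternDemo {
    // A string built at runtime is a different object from the literal.
    static boolean sameObjectBeforeIntern() {
        String literal = "hello";
        String copy = new String("hello".toCharArray()); // fresh heap object
        return copy == literal; // false: equal contents, different objects
    }

    // intern() returns the canonical instance, which is the literal itself.
    static boolean sameObjectAfterIntern() {
        String literal = "hello";
        String copy = new String("hello".toCharArray());
        return copy.intern() == literal; // true
    }

    public static void main(String[] args) {
        System.out.println("before intern: " + sameObjectBeforeIntern());
        System.out.println("after intern:  " + sameObjectAfterIntern());
    }
}
```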
#include "jvm.h"
#include "java_lang_String.h"
JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
// Delegates to JVM_InternString
return JVM_InternString(env, this);
}
JNIEXPORT jboolean JNICALL
Java_java_lang_StringUTF16_isBigEndian(JNIEnv *env, jclass cls)
{
unsigned int endianTest = 0xff000000;
if (((char*)(&endianTest))[0] != 0) {
return JNI_TRUE;
} else {
return JNI_FALSE;
}
}
// String support ///////////////////////////////////////////////////////////////////////////
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
JVMWrapper("JVM_InternString");
JvmtiVMObjectAllocEventCollector oam;
if (str == NULL) return NULL;
oop string = JNIHandles::resolve_non_null(str);
// Calls StringTable::intern
oop result = StringTable::intern(string, CHECK_NULL);
return (jstring) JNIHandles::make_local(env, result);
JVM_END
// Hash table that holds the cached strings
static CompactHashtable<
const jchar*, oop,
read_string_from_compact_hashtable,
java_lang_String::equals
> _shared_table;
oop StringTable::intern(Handle string_or_null_h, const jchar* name, int len, TRAPS) {
// Compute the string's hash code
unsigned int hash = java_lang_String::hash_code(name, len);
// Look the string up in the hash table by hash code, char array and length
oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
if (found_string != NULL) {
// Already present in the hash table: return it directly
return found_string;
}
if (StringTable::_alt_hash) {
hash = hash_string(name, len, true);
}
// Not present in the hash table: call do_intern to cache the string object in it
return StringTable::the_table()->do_intern(string_or_null_h, name, len,
hash, CHECK_NULL);
}
oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
int len, uintx hash, TRAPS) {
HandleMark hm(THREAD); // cleanup strings created
Handle string_h;
if (!string_or_null_h.is_null()) {
string_h = string_or_null_h;
} else {
string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
}
// Deduplicate the string before it is interned. Note that we should never
// deduplicate a string after it has been interned. Doing so will counteract
// compiler optimizations done on e.g. interned string literals.
Universe::heap()->deduplicate_string(string_h());
assert(java_lang_String::equals(string_h(), name, len),
"string must be properly initialized");
assert(len == java_lang_String::length(string_h()), "Must be same length");
StringTableLookupOop lookup(THREAD, hash, string_h);
StringTableGet stg(THREAD);
bool rehash_warning;
do {
if (_local_table->get(THREAD, lookup, stg, &rehash_warning)) {
update_needs_rehash(rehash_warning);
return stg.get_res_oop();
}
WeakHandle<vm_string_table_data> wh = WeakHandle<vm_string_table_data>::create(string_h);
// The hash table takes ownership of the WeakHandle, even if it's not inserted.
if (_local_table->insert(THREAD, lookup, wh, &rehash_warning)) {
update_needs_rehash(rehash_warning);
return wh.resolve();
}
} while(true);
}
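The control flow of StringTable::intern and do_intern (look up first, otherwise try to insert, and fall back to whatever another thread inserted) can be modelled in a few lines of Java. This is a toy model under stated assumptions, not the HotSpot implementation: ToyStringTable is a made-up class, it uses ConcurrentHashMap instead of HotSpot's own concurrent table, and it holds strong references where the real table holds weak handles.

```java
import java.util.concurrent.ConcurrentHashMap;

class ToyStringTable {
    // Maps string contents to the canonical instance (strong references only).
    private final ConcurrentHashMap<String, String> table = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, inserting s if none exists yet.
    String intern(String s) {
        String existing = table.get(s);          // lookup first
        if (existing != null) {
            return existing;
        }
        String winner = table.putIfAbsent(s, s); // insert; another thread may win
        return winner != null ? winner : s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        ToyStringTable t = new ToyStringTable();
        String a = new String("abc".toCharArray());
        String b = new String("abc".toCharArray());
        System.out.println(t.intern(a) == a); // true: a becomes canonical
        System.out.println(t.intern(b) == a); // true: b maps to the same instance
        System.out.println(t.size());         // 1
    }
}
```

putIfAbsent plays the role of the do/while retry loop: if the insert loses a race, the caller simply gets the entry that won.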
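4. G1 string deduplication
To reduce memory usage, the JVM can optimize string objects automatically: if the char[] arrays of two string objects have identical contents, the JVM makes them point to the same array in the background. G1 runs this logic during young GC and during the marking phase of Full GC. The feature was introduced by JEP 192.
String deduplication differs from String.intern() in two ways:
- String.intern() has to be called explicitly, whereas string deduplication is performed automatically by the JVM
- String.intern() shares the string object, whereas string deduplication shares the char[]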
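The second difference is easy to miss, so here is a toy model of it. ToyString is a hypothetical stand-in for java.lang.String with a mutable value field; the real deduplication happens inside the JVM and cannot be driven from Java code like this. On a real JVM the feature is enabled with -XX:+UseG1GC -XX:+UseStringDeduplication.

```java
import java.util.HashMap;
import java.util.Map;

class ToyDedupDemo {
    // Stand-in for java.lang.String: an object that points at a char[].
    static final class ToyString {
        char[] value;

        ToyString(String s) {
            this.value = s.toCharArray();
        }
    }

    // Deduplication keeps every ToyString object alive but repoints
    // equal-content arrays to one shared char[].
    static void deduplicate(ToyString... strings) {
        Map<String, char[]> canonical = new HashMap<>();
        for (ToyString s : strings) {
            char[] shared = canonical.putIfAbsent(new String(s.value), s.value);
            if (shared != null) {
                s.value = shared;
            }
        }
    }

    public static void main(String[] args) {
        ToyString a = new ToyString("abc");
        ToyString b = new ToyString("abc");
        System.out.println(a.value == b.value); // false: two separate arrays
        deduplicate(a, b);
        System.out.println(a == b);             // false: still two objects
        System.out.println(a.value == b.value); // true: one shared array
    }
}
```

With intern() the caller ends up holding one object; with deduplication both objects survive and only the backing array is shared.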
// During marking: decide whether the object is a deduplication candidate
bool G1StringDedup::is_candidate_from_mark(oop obj) {
if (java_lang_String::is_instance_inlined(obj)) {
bool from_young = G1CollectedHeap::heap()->heap_region_containing(obj)->is_young();
// The source region is young and the object's age is below the threshold: return true
if (from_young && obj->age() < StringDeduplicationAgeThreshold) {
// Candidate found. String is being evacuated from young to old but has not
// reached the deduplication age threshold, i.e. has not previously been a
// candidate during its life in the young generation.
return true;
}
}
// Not a candidate
return false;
}
// During evacuation: decide whether the object is a deduplication candidate
bool G1StringDedup::is_candidate_from_evacuation(bool from_young, bool to_young, oop obj) {
if (from_young && java_lang_String::is_instance_inlined(obj)) {
// Source region is young, destination region is young, and the object's age equals the threshold
if (to_young && obj->age() == StringDeduplicationAgeThreshold) {
// Candidate found. String is being evacuated from young to young and just
// reached the deduplication age threshold.
return true;
}
// Source region is young, destination region is old, and the object's age is below the threshold; during Full GC all regions are treated as old
if (!to_young && obj->age() < StringDeduplicationAgeThreshold) {
// Candidate found. String is being evacuated from young to old but has not
// reached the deduplication age threshold, i.e. has not previously been a
// candidate during its life in the young generation.
return true;
}
}
// Not a candidate
return false;
}
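The evacuation rule is compact enough to restate as a pure function. The following Java sketch mirrors the two conditions of is_candidate_from_evacuation; it assumes the object is already known to be a String, and passes the age and the threshold in as plain ints (both parameter names are illustrative).

```java
class DedupCandidateSketch {
    // Mirrors the decision made during evacuation.
    // Assumption: the object being examined is a java.lang.String.
    static boolean isCandidateFromEvacuation(boolean fromYoung, boolean toYoung,
                                             int age, int threshold) {
        if (!fromYoung) {
            return false; // only strings leaving a young region are considered
        }
        if (toYoung && age == threshold) {
            return true;  // young to young, just reached the threshold
        }
        if (!toYoung && age < threshold) {
            return true;  // promoted to old before ever reaching the threshold
        }
        return false;     // not a candidate
    }

    public static void main(String[] args) {
        int threshold = 3; // stand-in for StringDeduplicationAgeThreshold
        System.out.println(isCandidateFromEvacuation(true, true, 3, threshold));  // true
        System.out.println(isCandidateFromEvacuation(true, true, 2, threshold));  // false
        System.out.println(isCandidateFromEvacuation(true, false, 1, threshold)); // true
        System.out.println(isCandidateFromEvacuation(true, false, 3, threshold)); // false
    }
}
```

Together the two branches ensure a string is offered for deduplication at most once during its time in the young generation.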
Once an object is judged to be a candidate, it is first written to a temporary queue, and a separate thread performs the actual deduplication.
//
// Task for parallel unlink_or_oops_do() operation on the deduplication queue
// and table.
//
class G1StringDedupUnlinkOrOopsDoTask : public AbstractGangTask {
private:
G1StringDedupUnlinkOrOopsDoClosure _cl;
G1GCPhaseTimes* _phase_times;
public:
G1StringDedupUnlinkOrOopsDoTask(BoolObjectClosure* is_alive,
OopClosure* keep_alive,
bool allow_resize_and_rehash,
G1GCPhaseTimes* phase_times) :
AbstractGangTask("G1StringDedupUnlinkOrOopsDoTask"),
_cl(is_alive, keep_alive, allow_resize_and_rehash), _phase_times(phase_times) { }
virtual void work(uint worker_id) {
{
G1GCParPhaseTimesTracker x(_phase_times, G1GCPhaseTimes::StringDedupQueueFixup, worker_id);
StringDedupQueue::unlink_or_oops_do(&_cl);
}
{
G1GCParPhaseTimesTracker x(_phase_times, G1GCPhaseTimes::StringDedupTableFixup, worker_id);
StringDedupTable::unlink_or_oops_do(&_cl, worker_id);
}
}
};
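The hand-off described earlier, where the collector only enqueues candidates and a separate thread does the deduplication, can be modelled with a blocking queue. This is a simplified sketch under stated assumptions: DedupQueueSketch, ToyString and the STOP marker are all hypothetical, and the real JVM uses its own queue and table structures rather than java.util.concurrent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DedupQueueSketch {
    // Stand-in for java.lang.String with a replaceable backing array.
    static final class ToyString {
        volatile char[] value;

        ToyString(String s) {
            this.value = s.toCharArray();
        }
    }

    // Sentinel that tells the worker thread to stop.
    private static final ToyString STOP = new ToyString("");

    // Enqueues the candidates and lets a background thread deduplicate them.
    static void run(ToyString... candidates) {
        BlockingQueue<ToyString> queue = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            Map<String, char[]> table = new HashMap<>(); // toy dedup table
            try {
                for (ToyString s = queue.take(); s != STOP; s = queue.take()) {
                    char[] shared = table.putIfAbsent(new String(s.value), s.value);
                    if (shared != null) {
                        s.value = shared; // repoint to the shared array
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        for (ToyString c : candidates) {
            queue.add(c); // the "GC" side only enqueues, it does no dedup work
        }
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ToyString a = new ToyString("abc");
        ToyString b = new ToyString("abc");
        run(a, b);
        System.out.println(a.value == b.value); // true
    }
}
```

Keeping the work off the enqueueing side is the point of the design: the garbage collection pause only pays for adding a candidate to the queue.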