Java Hash Performance Optimization

1 Problem description

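The Java code contains a snippet like the following. It concatenates several strings and uses the result as the key when putting an entry into a map.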

   public void hashCode(List<String> values) {
         long start2 = System.currentTimeMillis();
         // The map must exist before the loop so that every key is stored.
         Map<String, Object> map = new HashMap<>();
         for (int i = 0; i + 1 < values.size(); i += 2) {
                StringBuilder builder = new StringBuilder();
                builder.append(values.get(i));
                builder.append(values.get(i + 1));
                // Use the concatenated pair as the map key.
                map.put(builder.toString(), new Object());
         }
         long end2 = System.currentTimeMillis();
         System.out.println("string hash cost :" + (end2 - start2));
   }

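A single call says nothing about performance, but at tens of millions of calls the cost adds up.
On my laptop (i7 HQ, 8 GB RAM), ten million iterations take 2-3 s. In theory the time goes into string concatenation and hashCode computation. To confirm this, let's first look at the code for likely problem spots.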

2 Source code analysis

2.1 How StringBuilder builds a string

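The first step is creating the StringBuilder object. On construction, StringBuilder allocates a char array of the default size (16). This is only an initial allocation and should not take much time.
When append finds the current array too small, it allocates a new one of size (old capacity * 2 + 2) in the JDK 8 source, or of exactly the required length if that is still not enough. StringBuilder copies all existing data into the new array, and the old array becomes garbage.
If each append just crosses the current boundary, the capacity grows in the sequence [16, 17*2=34, 35*2=70, 71*2=142, …]. Each expansion costs an allocation plus a copy, and the discarded arrays may add GC pressure.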
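
The capacity sequence listed above can be observed through the public capacity() method. This is a minimal sketch, assuming a JDK whose StringBuilder grows to (old capacity * 2 + 2) as JDK 8 does; the growth policy is an implementation detail rather than something the Javadoc guarantees, and the names CapacityGrowthDemo and observeCapacities are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CapacityGrowthDemo {
    // Appends one char at a time and records each distinct capacity seen,
    // stopping once `steps` distinct values have been collected.
    static List<Integer> observeCapacities(int steps) {
        StringBuilder builder = new StringBuilder(); // default capacity is 16
        List<Integer> seen = new ArrayList<>();
        seen.add(builder.capacity());
        while (seen.size() < steps) {
            builder.append('x');
            int capacity = builder.capacity();
            if (capacity != seen.get(seen.size() - 1)) {
                seen.add(capacity); // the internal array was replaced
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Under the assumed growth policy this lists 16, 34, 70, 142.
        System.out.println(observeCapacities(4));
    }
}
```

Each new value in the list corresponds to one allocation plus one full copy of the data appended so far.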

public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, CharSequence
{
    public StringBuilder() {
        super(16);
    }
}

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    AbstractStringBuilder(int capacity) {
        value = new char[capacity];
    }
}

public AbstractStringBuilder append(String str) {
    if (str == null)
        return appendNull();
    int len = str.length();
    ensureCapacityInternal(count + len);
    str.getChars(0, len, value, count);
    count += len;
    return this;
}
//Arrays
public static char[] copyOf(char[] original, int newLength) {
    char[] copy = new char[newLength];
    System.arraycopy(original, 0, copy, 0,
                     Math.min(original.length, newLength));
    return copy;
}

Besides growing its buffer, StringBuilder also has to copy the characters of every appended object into its own array.

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > value.length) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

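From the data-loading point of view, the points to watch are: 1. when the data outgrows the allocated array, the array must be expanded; 2. every appended value has to be copied; 3. returning a String requires one more copy, from the builder into the new String object.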
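
Point 1 can be sidestepped when the lengths of the parts are known up front: sizing the builder once means the array should not need to grow in between. The following is a minimal sketch and not the code measured in this article; PresizedConcat and concat are hypothetical names, and null parts are not handled.

```java
class PresizedConcat {
    // Concatenates the parts with a builder whose capacity is computed in
    // advance, so the appends should not need to enlarge the internal array.
    static String concat(String... parts) {
        int total = 0;
        for (String part : parts) {
            total += part.length();
        }
        StringBuilder builder = new StringBuilder(total);
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString(); // still one final copy into the String
    }

    public static void main(String[] args) {
        System.out.println(concat("abc", "def", "ghi")); // abcdefghi
    }
}
```

Points 2 and 3 remain: each part is copied into the builder, and toString() copies the result once more.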

2.2 hash

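HashMap uses the hashCode to locate the storage slot (bucket). If the slot is already occupied, a list is chained from it and the colliding entries are stored one after another.
A position conflict falls into one of three cases: 1. same hashCode, different value, same position; 2. same hashCode, same value, same position; 3. different hashCode, different value, same position.
In cases 1 and 3 the entries are appended to the list in turn. In case 2 the existing entry is overwritten.
On put, HashMap first obtains the key's hashCode. When the hashCodes are equal, it then compares by reference equality and by the equals method.
The comparison logic in HashMap: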

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;

        .
        .
        .
}

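As the code above shows, put immediately computes the key's hashCode and uses it for addressing. If addressing hits an occupied slot, the hashCode is the first criterion for equality. If the hashCodes are equal, the keys are considered equal only if they are the same reference or equals returns true.
So overall, two functions matter: hashCode and equals.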
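
The three conflict cases can be reproduced with ordinary strings. This is a small sketch with illustrative names (CollisionDemo, bucketIndex, collidingPuts); bucketIndex repeats the (n - 1) & hash expression from the putVal excerpt and takes the hash value as given, without the extra spreading that HashMap.hash(key) applies first.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Bucket index as in putVal: (n - 1) & hash, where the table length n
    // is a power of two.
    static int bucketIndex(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    // "Aa" and "BB" are different strings that share the hashCode 2112.
    static Map<String, Integer> collidingPuts() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1); // case 1: same hashCode, different key ...
        map.put("BB", 2); //         ... so both entries are kept
        map.put("Aa", 3); // case 2: same hashCode, equal key, value overwritten
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, Integer> map = collidingPuts();
        System.out.println(map.size() + " " + map.get("Aa"));   // 2 3
        // case 3: different hash values, same bucket in a 16-slot table
        System.out.println(bucketIndex(1, 16) == bucketIndex(17, 16)); // true
    }
}
```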
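The String hashCode algorithm is shown below. It walks over every element of the char array, multiplying the running value by 31 and adding the next element. It is said online that this algorithm has a fairly high collision probability, but in practice it makes little difference.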

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
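
As a worked example of the loop above, the string "abc" (char values 97, 98, 99) hashes to ((97 * 31) + 98) * 31 + 99 = 96354. The sketch below recomputes this by hand and compares it with String.hashCode(); PolyHashDemo and polyHash are illustrative names.

```java
class PolyHashDemo {
    // Recomputes the hash the same way as the loop above: h = 31 * h + c.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("abc"));   // 96354
        System.out.println("abc".hashCode());  // 96354
    }
}
```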

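The String equals algorithm walks over the current char array and the target's array, comparing them char by char. One thing I did not understand: the while loop is controlled by the variable n, yet the array elements are indexed by the variable i.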

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}
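
On the question of n versus i: n only counts how many characters are left to compare, while i is the index that moves forward, so the loop visits indices 0 to length - 1 exactly once. The sketch below writes the same comparison both ways to show that they agree. It is one reading of the JDK code, not a statement from its authors, and LoopEquivalence and its method names are illustrative.

```java
class LoopEquivalence {
    // Same shape as the loop in String.equals: n counts down, i counts up.
    static boolean sameCountdown(char[] v1, char[] v2) {
        int n = v1.length;
        if (n != v2.length) {
            return false;
        }
        int i = 0;
        while (n-- != 0) {
            if (v1[i] != v2[i]) {
                return false;
            }
            i++;
        }
        return true;
    }

    // The equivalent loop driven by a single index.
    static boolean sameForLoop(char[] v1, char[] v2) {
        if (v1.length != v2.length) {
            return false;
        }
        for (int i = 0; i < v1.length; i++) {
            if (v1[i] != v2[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] a = "hash".toCharArray();
        char[] b = "hush".toCharArray();
        System.out.println(sameCountdown(a, a) + " " + sameForLoop(a, a)); // true true
        System.out.println(sameCountdown(a, b) + " " + sameForLoop(a, b)); // false false
    }
}
```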

3 HashKey implementation

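Based on the analysis above, I wrote a new class to serve as the map key.
The optimization is mainly about memory copying: the data is copied only once.
For hashing it uses the FNVHash algorithm, based on implementations found online.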

package org.yunzhong.test.stream;
import java.util.Arrays;
public class HashKey {
   private static final int HASH_PARAM = 16777619;          // FNV prime
   private static final int HASH_INIT = (int) 2166136261L;  // FNV offset basis
   
   private int hashCode;
   private char[] values;
   private int count;
   public HashKey() {
         values = new char[64];
         count = 0;
   }
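   public void append(String value) {
         int minLength = value.length() + count;
         if (minLength > values.length) {
                // Grow to twice the required length so later appends fit too.
                values = Arrays.copyOf(values, minLength * 2);
         }
         // Copy the characters straight into the key's own array.
         value.getChars(0, value.length(), values, count);
         count += value.length();
         hashCode = 0; // the content changed, so drop the cached hash
   }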
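   // Alternative: the same 31-based polynomial hash that String uses.
   public void hash1() {
         int h = 0; // start from zero so repeated calls give the same result
         for (int i = 0; i < count; ++i) {
                h = 31 * h + values[i];
         }
         hashCode = h;
   }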
   public void hash() {
         // FNV-style hash: start from the offset basis, then XOR and multiply per char.
         hashCode = HASH_INIT;
         for (int i = 0; i < count; ++i) {
                hashCode = (hashCode ^ values[i]) * HASH_PARAM;
         }
         // Final mixing steps.
         hashCode += hashCode << 13;
         hashCode ^= hashCode >> 7;
         hashCode += hashCode << 3;
         hashCode ^= hashCode >> 17;
         hashCode += hashCode << 5;
   }
   @Override
   public int hashCode() {
         if(this.hashCode == 0) {
                hash();
         }
         return hashCode;
   }
   public int getHashCode() {
         return hashCode;
   }
   public void setHashCode(int hashCode) {
         this.hashCode = hashCode;
   }
   public char[] getValues() {
         return values;
   }
   public void setValues(char[] values) {
         this.values = values;
   }
   public int getEnd() {
         return count;
   }
   public void setEnd(int end) {
         this.count = end;
   }
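   @Override
   public boolean equals(Object target) {
         if (this == target) {
                return true;
         }
         // Guard against null and foreign types instead of casting blindly.
         if (!(target instanceof HashKey)) {
                return false;
         }
         HashKey key = (HashKey) target;
         int length = this.count;
         if (length == key.count) {
                int i = 0;
                char[] v1 = this.values;
                char[] v2 = key.values;
                // Compare only the first count chars; the rest of the array is unused capacity.
                while (length-- != 0) {
                       if (v1[i] != v2[i]) {
                             return false;
                       }
                       i++;
                }
                return true;
         }
         return false;
   }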
   @Override
   public String toString() {
         return String.copyValueOf(this.values, 0, count);
   }
}
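
For comparison with HashKey.hash(), the plain 32-bit FNV-1a loop looks roughly like the sketch below. It hashes bytes, as the published FNV algorithm does, whereas HashKey XORs 16-bit chars and then applies additional shift and XOR mixing steps, so the two do not produce the same values. Fnv1aDemo and fnv1a32 are illustrative names.

```java
import java.nio.charset.StandardCharsets;

class Fnv1aDemo {
    private static final int FNV_OFFSET_BASIS = (int) 2166136261L; // 0x811c9dc5
    private static final int FNV_PRIME = 16777619;                 // 0x01000193

    // 32-bit FNV-1a: start from the offset basis, then for every byte
    // XOR it in and multiply by the prime (int overflow wraps mod 2^32).
    static int fnv1a32(byte[] data) {
        int hash = FNV_OFFSET_BASIS;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    public static void main(String[] args) {
        byte[] bytes = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(fnv1a32(bytes))); // e40c292c
    }
}
```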

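4 Performance comparison

Tested with 4 million entries. My laptop: i7 HQ, 8 GB RAM.
Overall the average time goes down, but the gain never reaches a multiple. With my limited knowledge, this is as far as I could take it.
StringBuilder test case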

@Test
   public void testHashPut() {
         String[] characters = new String[] { "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
                       "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
         Random random = new Random();
         List<String> values = Lists.newArrayList();
         for (int i = 0; i < 4000000; i++) {
                StringBuilder builder = new StringBuilder();
                for (int j = 0; j < 10; j++) {
                       int nextInt = random.nextInt(characters.length);
                       builder.append(characters[nextInt]);
                }
                values.add(builder.toString());
         }
         long start = System.currentTimeMillis();
         Map<String, Object> map = new HashMap<String, Object>();
         for (int i = 3; i < values.size(); i++) {
                StringBuilder builder = new StringBuilder();
                builder.append(values.get(i - 3));
                builder.append(values.get(i - 2));
                builder.append(values.get(i - 1));
                builder.append(values.get(i));
                map.put(builder.toString(), new Object());
         }
         System.out.println("hash init cost:" + (System.currentTimeMillis()  - start));
   }

HashKey test case

@Test
   public void testHashPutOnceCopy() {
         String[] characters = new String[] { "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
                       "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
         Random random = new Random();
         List<String> values = Lists.newArrayList();
         for (int i = 0; i < 4000000; i++) {
                StringBuilder builder = new StringBuilder();
                for (int j = 0; j < 10; j++) {
                       int nextInt = random.nextInt(characters.length);
                       builder.append(characters[nextInt]);
                }
                values.add(builder.toString());
         }
         long start = System.currentTimeMillis();
         Map<HashKey, Object> map = new HashMap<HashKey, Object>(1000000);
         for (int i = 3; i < values.size(); i++) {
                HashKey key = new HashKey();
                key.append(values.get(i - 3));
                key.append(values.get(i - 2));
                key.append(values.get(i - 1));
                key.append(values.get(i));
                map.put(key, new Object());
         }
         System.out.println("once hash init cost:" +  (System.currentTimeMillis() - start));
   }

HashKey, 2 attributes

once hash init cost:7437
once hash init cost:3588
once hash init cost:3593
once hash init cost:1599
once hash init cost:4285
once hash init cost:1597
once hash init cost:1763
once hash init cost:1607
once hash init cost:1526
once hash init cost:1519

StringBuilder, 2 attributes

hash init cost:4588
hash init cost:2890
hash init cost:3226
hash init cost:2963
hash init cost:1743
hash init cost:1695
hash init cost:1729
hash init cost:1748
hash init cost:1641
hash init cost:1859

HashKey, 4 attributes

once hash init cost:7561
once hash init cost:4270
once hash init cost:3726
once hash init cost:4334
once hash init cost:4330
once hash init cost:1936
once hash init cost:1914
once hash init cost:2025
once hash init cost:1926
once hash init cost:2068

StringBuilder, 4 attributes

hash init cost:6841
hash init cost:3479
hash init cost:3590
hash init cost:3897
hash init cost:3676
hash init cost:4806
hash init cost:3460
hash init cost:3661
hash init cost:3512
hash init cost:3466
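
In most of the lists above the first few measurements are noticeably slower than the last ones. One plausible cause, which is an assumption and was not measured here, is JIT compilation and heap growth during the early runs. A simple way to reduce that noise is to run the task a few times untimed and report the median of the timed runs. The sketch below shows the idea; TimingSketch and medianMillis are hypothetical names, and a dedicated harness such as JMH would be more rigorous.

```java
import java.util.Arrays;

class TimingSketch {
    // Runs the task `warmups` times untimed, then `runs` times timed,
    // and returns the median elapsed time in milliseconds (runs >= 1).
    static long medianMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(elapsed);
        return elapsed[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; in practice this would be the map-filling loop.
        Runnable work = () -> {
            StringBuilder builder = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                builder.append(i);
            }
        };
        System.out.println("median ms: " + medianMillis(work, 3, 5));
    }
}
```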

5 Multithreading

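I did not really want to go multi-threaded. Multithreading means coordination between threads and contention for CPU, and when the system is under heavy load it brings no real gain.
Besides, initializing a map is only a small piece of functionality, and starting threads for it feels like using a sledgehammer to crack a nut.
Finally, initializing millions of entries is rare, and in that case whether it runs in 1 s or 10 s hardly matters for overall performance.
Still, it is an option, and I tested it on my machine. With 4 million entries and three-string concatenation, the test code and data are as follows: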

   private ExecutorService threadPool = Executors.newFixedThreadPool(8, new  ThreadFactory() {
         private int threadNum;
         public Thread newThread(Runnable r) {
                Thread th = new Thread(r);
                th.setName("hashThread" + threadNum++);
                return th;
         }
   });
   @Test
   public void testHashPutOnceCopyMultiThread() throws InterruptedException, ExecutionException {
         String[] characters = new String[] { "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
                       "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
         int batch = 100000;
         Random random = new Random();
         final List<String> values = Lists.newArrayList();
         for (int i = 0; i < 4000000; i++) {
                StringBuilder builder = new StringBuilder();
                for (int j = 0; j < 10; j++) {
                       int nextInt = random.nextInt(characters.length);
                       builder.append(characters[nextInt]);
                }
                values.add(builder.toString());
         }
         final Map<HashKey, Object> map = new ConcurrentHashMap<HashKey,  Object>(1000000);
         long start = System.currentTimeMillis();
         List<Future<Object>> futures = Lists.newArrayList();
         for (int j = 3; j < values.size(); j += batch) {
                final int bottom = j;
                final int top = values.size() > j + batch ? (j + batch) :  values.size();
                Future<Object> future = threadPool.submit(new  Callable<Object>() {
                       public Object call() throws Exception {
                             for (int i = bottom; i < top; i++) {
                                    HashKey key = new HashKey();
                                    key.append(values.get(i - 3));
                                    key.append(values.get(i - 2));
                                    key.append(values.get(i - 1));
                                    key.append(values.get(i));
                                    map.put(key, new Object());
                             }
                             return null;
                       }
                });
                futures.add(future);
         }
         for (Future<Object> future : futures) {
                future.get();
         }
         System.out.println("once hash init cost:" +  (System.currentTimeMillis() - start));
   }

Test data

once hash init cost:7832
once hash init cost:3056
once hash init cost:2762
once hash init cost:3482
once hash init cost:3611
once hash init cost:3804
once hash init cost:1185
once hash init cost:1211
once hash init cost:1189
once hash init cost:1146