Java hashCode Explained

Every Java object can call hashCode() to obtain its hash code. What is this value for?

hashCode exists for the hash-table-based classes in Java, such as Hashtable and HashMap, where it is used to place an object into one of the table's buckets.

 

1. The default return value of hashCode

First, look at how hashCode() is declared in Object:

public native int hashCode();

This is a native method, so its implementation lives inside the JVM. What does it return? Let's run an experiment.

We define a class Student that does not override hashCode:

public class Student {

    private int age ;

    private String name;

    public Student() {}

    public Student(int age, String name) {
        this.age = age;
        this.name = name;
    }

}

Then write a small unit test:

@Test
public void testHashCode(){
    Student s1 = new Student(1, "a");
    Student s2 = new Student(1, "b");
    System.out.println("s1 hashcode : " + s1.hashCode());
    System.out.println("s2 hashcode : " + s2.hashCode());
}

On one run the output was as follows (the numbers vary between runs and JVMs):

s1 hashcode : 653305407
s2 hashcode : 1130478920

What do these values mean? Start with the Javadoc of hashCode() in the JDK:

/**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     * <p>
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java&trade; programming language.)
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
    public native int hashCode();

This comment lays out the general contract of hashCode. Note one passage in particular:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

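In other words: as far as is reasonably practical, the hashCode defined by Object returns distinct integers for distinct objects. This is typically done by converting the object's internal address into an integer, but the Java language does not require that technique.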
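
The rules of the contract quoted above can be checked against classes that ship with the JDK. The following is a minimal sketch using String; the class ContractDemo and its method names are made up for illustration and are not part of any library.

```java
// Hypothetical helper class; only String and Object methods from the JDK are used.
final class ContractDemo {

    // Rule 2 of the contract: objects that are equal must report the same hash code.
    static boolean equalImpliesSameHash(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    // Rule 3 of the contract: unequal objects are allowed to share a hash code.
    static boolean isCollision(Object a, Object b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String x = new String("hello");
        String y = new String("hello"); // a distinct object with equal content
        System.out.println(equalImpliesSameHash(x, y));
        // "Aa" and "BB" are different strings that happen to share a hash code.
        System.out.println(isCollision("Aa", "BB"));
    }
}
```

A collision such as "Aa" and "BB" is legal; it only costs the hash table an extra equals() comparison.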

 

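In short, the Javadoc only says that the default hashCode of Object is typically derived from the object's address; it does not equate the hash code with the address. How the number is produced is an implementation detail (current HotSpot JVMs, for example, do not compute it from the address by default). What matters is that the default value is tied to the identity of the object, not to its contents.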
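
The Javadoc above also points to System.identityHashCode. As a small sketch of what "default" means here, the code below checks that a class which does not override hashCode() reports the same value as System.identityHashCode, and that the value is stable within one run. The classes Plain and IdentityHashDemo are hypothetical names chosen for this example.

```java
// Hypothetical class that deliberately does not override hashCode().
final class Plain {}

final class IdentityHashDemo {

    // For a class that inherits Object.hashCode(), the result matches
    // System.identityHashCode for the same object.
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    // Repeated calls on one object return the same value within a run.
    static boolean isStable(Object o) {
        int first = o.hashCode();
        return first == o.hashCode() && first == o.hashCode();
    }

    public static void main(String[] args) {
        Plain p = new Plain();
        System.out.println(usesIdentityHash(p));
        System.out.println(isStable(p));
        // The concrete number depends on the JVM and the run, so it is not shown.
    }
}
```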

 

2. Not overriding hashCode()

In everyday development we usually override a class's hashCode method. Why?

Taking the Student class above, define two objects:

Student s1 = new Student(1, "a");
Student s2 = new Student(1, "a");

The two objects have the same content. Without an overridden hashCode(), however, their hash codes differ (barring a rare coincidence).

So what goes wrong when two equal objects return different hash codes?

The answer: it breaks hash-table-based classes such as HashMap.

Using Student again, we override only equals and leave hashCode untouched.

public class Student {

    private int age ;

    private String name;

    public Student() {}

    public Student(int age, String name) {
        this.age = age;
        this.name = name;
    }


    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Student)) return false;

        Student student = (Student) o;

        if (age != student.age) return false;
        return name != null ? name.equals(student.name) : student.name == null;
    }


    @Override
    public String toString() {
        return "Student{" +
                "age=" + age +
                ", name='" + name + '\'' +
                '}';
    }
}

The test looks like this:

    @Test
    public void testHashCode(){
        Student s1 = new Student(1, "a");
        Student s2 = new Student(1, "a");
        System.out.println("s1 hashcode : " + s1.hashCode());
        System.out.println("s2 hashcode : " + s2.hashCode());

        HashSet<Student> sets = new HashSet<>();
        sets.add(s1);
        sets.add(s2);
        System.out.println(sets);
        
    }

Running it shows that two objects with identical content are both added to the HashSet (HashSet is implemented internally on top of HashMap).

By the definition of a Set, this must not happen. The whole point of choosing a Set is to keep duplicate objects out.

So if a custom class overrides equals but not hashCode, the consequence is that duplicate objects may end up in a hash container when they are put into it.
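
The usual remedy is to override hashCode together with equals and build it from the same fields. Below is a minimal sketch, assuming a variant of the class named HashedStudent (a hypothetical name, as is HashedStudentDemo) and using java.util.Objects.hash; other hash formulas work equally well as long as equal objects yield equal values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical variant of Student that overrides equals and hashCode together.
final class HashedStudent {
    private final int age;
    private final String name;

    HashedStudent(int age, String name) {
        this.age = age;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HashedStudent)) return false;
        HashedStudent other = (HashedStudent) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    // Built from exactly the fields that equals compares.
    @Override
    public int hashCode() {
        return Objects.hash(age, name);
    }
}

final class HashedStudentDemo {
    // Adds two equal students and reports how many the set kept.
    static int distinctCount() {
        Set<HashedStudent> set = new HashSet<>();
        set.add(new HashedStudent(1, "a"));
        set.add(new HashedStudent(1, "a")); // rejected as a duplicate
        return set.size();
    }

    // Looks up a map entry with a different but equal key object.
    static String lookup() {
        Map<HashedStudent, String> grades = new HashMap<>();
        grades.put(new HashedStudent(1, "a"), "A+");
        return grades.get(new HashedStudent(1, "a"));
    }

    public static void main(String[] args) {
        System.out.println(distinctCount());
        System.out.println(lookup());
    }
}
```

With this pairing the set keeps a single element, and a map entry can be found again through any equal key.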

 

3. Overriding hashCode() but returning the same value for every instance

What happens if we do override hashCode(), but return the same value for every instance of the class?

public class Student {

    private int age ;

    private String name;

    public Student() {}

    public Student(int age, String name) {
        this.age = age;
        this.name = name;
    }

    @Override
    public int hashCode() {
        return 1;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Student)) return false;

        Student student = (Student) o;

        if (age != student.age) return false;
        return name != null ? name.equals(student.name) : student.name == null;
    }


    @Override
    public String toString() {
        return "Student{" +
                "age=" + age +
                ", name='" + name + '\'' +
                '}';
    }
}

The code above makes every Student return a hash code of 1. What is the effect?

This does not break hash-table-based containers, since the contract is still satisfied, but it makes them very inefficient.

Take HashMap as an example and look at putVal(), the method that does the work behind put():

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;  // table is null or empty: allocate it via resize()
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);  // the bucket is empty: insert directly
        else {
            Node<K,V> e; K k;
            // Same hash and an equal key at the head of the bucket: only the value is updated further down.
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // The head is a TreeNode, i.e. the bucket holds a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // Linked-list case
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // A node for this key already exists
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

For a more detailed walkthrough of the HashMap source, search online.

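During insertion, HashMap decides whether a key already exists based on its hash code and equals(), comparing the hash first and calling equals() only when the hashes match. If either check fails, the node is treated as a different key.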

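If every object of a class returns the same hash code, the hash comparison never rules anything out, so each insertion has to call equals() against the entries already stored, and insertion becomes slow.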
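
To make that cost visible, here is a rough sketch that counts equals() calls while inserting a handful of distinct keys into a HashSet, once with a constant hash code and once with a hash code derived from an id. CountingKey and EqualsCostDemo are hypothetical classes written for this illustration; the static counter is only suitable for a single-threaded demo.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key class: hashCode is either constant or derived from id.
final class CountingKey {
    static int equalsCalls = 0; // counts every equals() invocation (demo only)

    private final int id;
    private final boolean constantHash;

    CountingKey(int id, boolean constantHash) {
        this.id = id;
        this.constantHash = constantHash;
    }

    @Override
    public int hashCode() {
        return constantHash ? 1 : id;
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        if (!(o instanceof CountingKey)) return false;
        CountingKey other = (CountingKey) o;
        return other.id == id && other.constantHash == constantHash;
    }
}

final class EqualsCostDemo {
    // Inserts n distinct keys and returns how many equals() calls that took.
    static int equalsCallsFor(int n, boolean constantHash) {
        CountingKey.equalsCalls = 0;
        Set<CountingKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new CountingKey(i, constantHash));
        }
        if (set.size() != n) throw new IllegalStateException("keys were lost");
        return CountingKey.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("constant hash: " + equalsCallsFor(5, true));
        System.out.println("distinct hash: " + equalsCallsFor(5, false));
    }
}
```

Both runs end with all five keys in the set; only the amount of comparison work differs.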

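HashMap also picks the bucket from the hash code. If the hash code is always the same, every distinct object of the class lands in the same bucket and collides with all the others, so the map behaves like a single long chain instead of a table.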
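
The bucket choice can be sketched in a few lines. The expression (n - 1) & hash is taken from the putVal source quoted earlier; the spread step mirrors the private HashMap.hash() method in OpenJDK 8 and later, which is not shown in this article and is an implementation detail that may differ elsewhere. BucketIndexDemo and its method names are hypothetical.

```java
// Hypothetical helper mirroring two expressions from OpenJDK's HashMap (JDK 8+).
final class BucketIndexDemo {

    // Same mixing step as HashMap.hash(): fold the high 16 bits into the low 16.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Same expression as in putVal: (n - 1) & hash, where n is a power of two.
    static int bucketIndex(int hashCode, int tableLength) {
        return (tableLength - 1) & spread(hashCode);
    }

    public static void main(String[] args) {
        int n = 16; // HashMap's default initial table length
        // Every object whose hashCode() is 1 maps to the same bucket.
        System.out.println("hash 1 -> bucket " + bucketIndex(1, n));
        // Varied hash codes can be distributed over different buckets.
        for (int h : new int[] {3, 42, 1000}) {
            System.out.println("hash " + h + " -> bucket " + bucketIndex(h, n));
        }
    }
}
```

Because the index depends only on the hash code and the table length, a constant hashCode() pins every instance to one bucket regardless of how large the table grows.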

 

Reference blog: https://www.cnblogs.com/Qian123/p/5703507.html

 
