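Java hashCode Explained

Every Java object can call hashCode() to obtain its hash code. What is this value used for?

The hash code exists to serve Java's hash-table-based classes, such as Hashtable and HashMap, where it is used to assign an object to one of the table's buckets.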
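To make the idea of "assigning an object to a bucket" concrete, here is a minimal sketch of how a hash code can be reduced to a bucket index. It follows the approach used by HashMap in OpenJDK 8 and later (mix the high bits into the low bits, then mask with the table length minus one). The class name BucketIndexDemo and its method names are hypothetical and exist only for this illustration.

```java
// Illustration only: BucketIndexDemo and its methods are hypothetical names.
class BucketIndexDemo {

    // Fold the high 16 bits into the low 16 bits so that small tables
    // still benefit from the upper bits of the hash code.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // tableLength is assumed to be a power of two,
    // so (tableLength - 1) works as a bit mask.
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        int tableLength = 16;
        for (String key : new String[] {"a", "b", "hello"}) {
            System.out.println(key + " -> bucket " + bucketIndex(key, tableLength));
        }
    }
}
```

Because the index depends only on hashCode(), two keys with equal hash codes always land in the same bucket, and two keys with different hash codes may or may not.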

 

1. The default return value of hashCode

First, look at how hashCode() is declared in Object:

public native int hashCode();

This is a native method. What does it return? A small experiment helps.

We define a class Student without overriding its hashCode method:

public class Student {

    private int age ;

    private String name;

    public Student() {}

    public Student(int age, String name) {
        this.age = age;
        this.name = name;
    }

}

A simple unit test:

@Test
public void testHashCode(){
    Student s1 = new Student(1, "a");
    Student s2 = new Student(1, "b");
    System.out.println("s1 hashcode : " + s1.hashCode());
    System.out.println("s2 hashcode : " + s2.hashCode());
}

One run printed the following (the exact values can differ between runs and JVMs):

s1 hashcode : 653305407
s2 hashcode : 1130478920

What do these values mean? Start with the JDK's Javadoc for hashCode():

/**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     * <p>
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java&trade; programming language.)
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
    public native int hashCode();

This comment lays out the general contract for hash codes. Note this sentence in particular:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

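In other words: as far as is reasonably practical, Object's hashCode returns distinct integers for distinct objects. This is typically done by converting the object's address into an integer, but the Java programming language does not require that implementation technique.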

 

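Put simply, the value returned by Object's default hashCode is tied to the identity of the object, not to its contents. The Javadoc says only that it is typically derived from the object's address; it does not equate the hash code with the address, and a JVM is free to generate it some other way. The safe takeaway is that distinct objects will usually get distinct default hash codes, whether or not the address is involved.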
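The Javadoc above points to System.identityHashCode, which returns the value Object's default hashCode() would give for an object, whether or not its class overrides hashCode(). The following sketch compares the two; IdentityHashDemo and Plain are hypothetical names used only here.

```java
// Illustration only: IdentityHashDemo and Plain are hypothetical names.
class IdentityHashDemo {

    // Does not override hashCode(), so it inherits Object's implementation.
    static class Plain {
        final String name;
        Plain(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Plain p = new Plain("a");
        // For a class that inherits Object.hashCode(), both calls agree.
        System.out.println(p.hashCode() == System.identityHashCode(p));
        // Repeated calls on the same object are stable within one run.
        System.out.println(p.hashCode() == p.hashCode());

        String s = "a";
        // String overrides hashCode() with a content-based value,
        // which is unrelated to the identity hash of the same object.
        System.out.println(s.hashCode() + " vs identity " + System.identityHashCode(s));
    }
}
```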

 

2. Not overriding hashCode()

In everyday development we usually override a class's hashCode method. Why?

Taking the Student class above, define two objects:

Student s1 = new Student(1, "a");
Student s2 = new Student(1, "a");

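These two objects have identical contents. If hashCode() is not overridden, their hash codes will almost always differ.

So what is the impact when two equal objects return different hash codes?

The answer: it breaks hash-table-based classes such as HashMap.

To test this with Student, we override only its equals method and leave hashCode alone.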

public class Student {

    private int age ;

    private String name;

    public Student() {}

    public Student(int age, String name) {
        this.age = age;
        this.name = name;
    }


    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Student)) return false;

        Student student = (Student) o;

        if (age != student.age) return false;
        return name != null ? name.equals(student.name) : student.name == null;
    }


    @Override
    public String toString() {
        return "Student{" +
                "age=" + age +
                ", name='" + name + '\'' +
                '}';
    }
}

The test looks like this:

    @Test
    public void testHashCode(){
        Student s1 = new Student(1, "a");
        Student s2 = new Student(1, "a");
        System.out.println("s1 hashcode : " + s1.hashCode());
        System.out.println("s2 hashcode : " + s2.hashCode());

        HashSet<Student> sets = new HashSet<>();
        sets.add(s1);
        sets.add(s2);
        System.out.println(sets);
        
    }

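Running it shows that two objects that are equal according to equals() were both added to the HashSet (HashSet is implemented internally on top of HashMap).

By the definition of Set, this must not happen. The whole point of using a Set is to keep duplicate objects out.

So the consequence of not overriding hashCode in a custom class is this: when objects are placed in a hash-based container, duplicates can end up in it. It also violates the contract quoted above, which requires equal objects to return equal hash codes.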
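For comparison, here is a minimal sketch of a Student that overrides hashCode() consistently with equals(). Using java.util.Objects.hash over the same fields that equals() compares is one common choice, not the only one; the wrapper class StudentSetDemo and the helper distinctCount are hypothetical names for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Illustration only: StudentSetDemo and distinctCount are hypothetical names.
class StudentSetDemo {

    static final class Student {
        private final int age;
        private final String name;

        Student(int age, String name) {
            this.age = age;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Student)) return false;
            Student other = (Student) o;
            return age == other.age && Objects.equals(name, other.name);
        }

        // Computed from exactly the fields that equals() compares,
        // so equal students always produce equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(age, name);
        }
    }

    // Adds two equal students and reports how many the set kept.
    static int distinctCount() {
        Set<Student> set = new HashSet<>();
        set.add(new Student(1, "a"));
        set.add(new Student(1, "a"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 1
    }
}
```

With both methods overridden, the second add() finds the first student in the same bucket, equals() returns true, and the set keeps a single element.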

 

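3. Overriding hashCode() but returning the same value for every instance

What happens if we override hashCode() but return the same value for every instance of the class?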

public class Student {

    private int age ;

    private String name;

    public Student() {}

    public Student(int age, String name) {
        this.age = age;
        this.name = name;
    }

    @Override
    public int hashCode() {
        return 1;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Student)) return false;

        Student student = (Student) o;

        if (age != student.age) return false;
        return name != null ? name.equals(student.name) : student.name == null;
    }


    @Override
    public String toString() {
        return "Student{" +
                "age=" + age +
                ", name='" + name + '\'' +
                '}';
    }
}

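The code above makes every Student return a hash code of 1. What happens then?

Hash-based containers still behave correctly in this case, but they become very inefficient.

Take HashMap as an example and look at putVal(), the method that put() delegates to: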

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;  // table not yet allocated (or empty): create it via resize()
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);  // target bucket is empty: insert directly
        else {
            Node<K,V> e; K k;
            // Same hash and an equal key at the head of the bucket: only that key's value is updated below.
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // The bucket head is a TreeNode, i.e. the bucket holds a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // Otherwise the bucket is a linked list
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // A node with this key already exists
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

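For a more detailed walkthrough of the HashMap source, plenty of analyses are available online.

During insertion, HashMap decides whether a key is already present using both the hash code and equals(). It compares the hash first and calls equals() only when the hashes match. If either check fails, the node is treated as a different key.

If every object of a class returns the same hash code, the hash comparison never filters anything out, so each insertion has to call equals() against entries that are already stored, and insertion becomes slow.

In addition, HashMap uses the hash code to choose the bucket. If all hash codes are the same, every distinct object of the class lands in the same bucket and collides with the others, which is very inefficient.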
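The cost described above can be observed by counting equals() calls. The sketch below inserts the same number of distinct keys into a HashMap twice, once with a constant hash code and once with a per-key hash code. EqualsCallCountDemo, Key, and countEqualsCalls are hypothetical names, and the exact counts depend on the HashMap implementation, so none are stated here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: all names in this class are hypothetical.
class EqualsCallCountDemo {

    // Counts every call to Key.equals() since the last reset.
    static long equalsCalls = 0;

    static final class Key {
        final int id;
        final boolean constantHash;

        Key(int id, boolean constantHash) {
            this.id = id;
            this.constantHash = constantHash;
        }

        // Either the same value for every key, or a distinct value per id.
        @Override
        public int hashCode() {
            return constantHash ? 1 : id;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Inserts n distinct keys and returns how many equals() calls that took.
    static long countEqualsCalls(int n, boolean constantHash) {
        equalsCalls = 0;
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i, constantHash), i);
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("constant hashCode: " + countEqualsCalls(n, true) + " equals() calls");
        System.out.println("distinct hashCode: " + countEqualsCalls(n, false) + " equals() calls");
    }
}
```

With a constant hash code, every put() after the first must compare against at least one stored key, so the count grows with the number of entries; with distinct hash codes the hash comparison rejects non-matching keys before equals() is ever needed.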

 

Reference blog: https://www.cnblogs.com/Qian123/p/5703507.html
