Another Look at the GetHashCode Function




Part One:




( Figure 1-1)


 


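As Figure 1-1 shows, a hash-based collection such as HashSet<T> can be thought of as dividing the range of hash values into regions, say 32 of them. When the collection looks for an object, it first computes the object's hashcode % 32 and then searches only the region that value maps to. This clearly makes lookups more efficient.

For collections that are not hash-based, implementing GetHashCode() has no effect.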
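
The region idea can be sketched in a few lines. This is only an illustration under the text's assumption of 32 fixed regions; BucketedSet and BucketIndexOf are hypothetical names, and the real HashSet<T> sizes and resizes its buckets differently.

```csharp
using System;
using System.Collections.Generic;

// A deliberately simplified hash set with a fixed number of regions (buckets).
// Hypothetical type for illustration only; assumes items are never null.
public class BucketedSet<T>
{
    private const int BucketCount = 32; // the "32 regions" assumed in the text
    private readonly List<T>[] _buckets = new List<T>[BucketCount];

    // Maps an item to its region: hash code modulo the number of regions.
    public static int BucketIndexOf(T item)
    {
        // Clear the sign bit so the remainder is never negative.
        int hash = item.GetHashCode() & 0x7FFFFFFF;
        return hash % BucketCount;
    }

    // Returns false if an equal item is already stored in the target region.
    public bool Add(T item)
    {
        int index = BucketIndexOf(item);
        if (_buckets[index] == null) _buckets[index] = new List<T>();
        // Equals is consulted only for items inside this one region.
        if (_buckets[index].Contains(item)) return false;
        _buckets[index].Add(item);
        return true;
    }

    // Looks only in the region the item's hash code maps to.
    public bool Contains(T item)
    {
        List<T> bucket = _buckets[BucketIndexOf(item)];
        return bucket != null && bucket.Contains(item);
    }
}

public static class BucketDemo
{
    public static void Main()
    {
        var set = new BucketedSet<int>();
        Console.WriteLine(set.Add(40));                        // True
        Console.WriteLine(set.Add(40));                        // False (duplicate)
        Console.WriteLine(BucketedSet<int>.BucketIndexOf(40)); // 8, since 40 % 32 == 8
        Console.WriteLine(set.Contains(8));                    // False (same region, different value)
    }
}
```

The last line shows that sharing a region is not the same as being equal: 8 and 40 land in region 8, but Equals still decides the final answer.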

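That said, why does the compiler so often prompt us (warning CS0659) to override GetHashCode() as well when we override Equals()?

Imagine adding an object when only Equals() has been overridden and GetHashCode() has not. Two things can happen. In the first case, the collection searches the region that already holds an equal object; the duplicate is detected (because we overrode Equals()) and the object is not added. In the second case, the collection searches a different region, finds nothing equal there, and adds the object again even though an equal one is already stored.

That is why the compiler prompts us to override GetHashCode() together with Equals(). It also shows once more that for collections that are not hash-based, overriding GetHashCode() has no effect, because only hash-based collections partition their elements into regions.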


 


       using System;
       using System.Collections.Generic;

       class Point {

       private int _x; // X coordinate.

       public int X {
           get { return _x; }
           set { _x = value; }
       }
       private int _y; // Y coordinate.

       public int Y {
           get { return _y; }
           set { _y = value; }
       }

       public Point(int x, int y) {
           this._x = x;
           this._y = y;
       }

       // Override Object's Equals() method.
       public override bool Equals(object obj) {
           // Equals should not throw: null or a non-Point is simply not equal.
           Point another = obj as Point;
           if (another == null) return false;
           return this._x == another._x && this._y == another._y;
       }

       // Override Object's GetHashCode() method.
       public override int GetHashCode() {
           return X.GetHashCode() ^ Y.GetHashCode();
       }
   }

       // In the Main method of Program:

       class Program {

       static void Main(string[] args) {
           // HashSet<T> is a hash-based collection.
           HashSet<Point> points = new HashSet<Point>();

           Point p1 = new Point(1, 1);
           Point p2 = new Point(2, 2);
           Point p3 = new Point(3, 3);

           points.Add(p1);
           points.Add(p2);
           points.Add(p3);
           Console.WriteLine(points.Count);

           // Add a Point whose value duplicates p2.
           Point p4 = new Point(2, 2);
           points.Add(p4);

           Console.WriteLine(points.Count);
           // If Point does not override GetHashCode(), output: 4.
           // Once Point overrides GetHashCode(), output: 3.

           p1.X = 0;  // Modify a field that takes part in the hash computation.
           points.Remove(p1);
           // Without the modification above, output: 2;
           // with it, output: 3 (the object cannot be removed).
           Console.WriteLine(points.Count);

           Console.ReadKey();
       }
   }

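In the test above, after storing an object (p1) in the hash collection, Main modifies a field that takes part in the hash computation (Point's GetHashCode() override uses the X field), and the object can then no longer be removed.

Note that once an object is stored in a hash collection, the fields that take part in its hash computation must not be modified; otherwise the object's hash code after the change differs from the hash code under which it was originally stored.

In that case, even calling Contains() with a reference to that very object fails to find it in the hash collection. The object also cannot be removed from the collection on its own, so it stays reachable through the collection, which can cause a memory leak.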
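
If a stored object really has to change, one possible workaround is to remove it while its hash code still matches, change it, and add it back. The sketch below assumes hypothetical names (MutablePoint, ChangeX) that are not part of the original example; making the hashed fields read-only is another common option.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Point class above, with value equality.
public class MutablePoint
{
    public int X { get; set; }
    public int Y { get; set; }

    public MutablePoint(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        MutablePoint other = obj as MutablePoint;
        return other != null && X == other.X && Y == other.Y;
    }

    public override int GetHashCode()
    {
        unchecked { return (17 * 31 + X) * 31 + Y; }
    }
}

public static class SafeUpdateDemo
{
    // Removes the point while its hash code still matches the stored one,
    // changes it, then adds it back under the new hash code.
    public static bool ChangeX(HashSet<MutablePoint> set, MutablePoint p, int newX)
    {
        if (!set.Remove(p)) return false; // not in the set: leave it untouched
        p.X = newX;
        return set.Add(p); // false if an equal point is already stored
    }

    public static void Main()
    {
        var points = new HashSet<MutablePoint>();
        var p1 = new MutablePoint(1, 1);
        points.Add(p1);

        ChangeX(points, p1, 0);
        Console.WriteLine(points.Contains(p1)); // True
        Console.WriteLine(points.Remove(p1));   // True
        Console.WriteLine(points.Count);        // 0
    }
}
```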


 



Part Two:


To compare objects for equality, either implement IEquatable<T> on the type itself or write a separate class that implements the IEqualityComparer<T> interface.
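
The rest of this part follows the first route. For completeness, the second route (a separate comparer class) might look roughly like the sketch below. SimpleKey and SimpleKeyComparer are hypothetical names chosen for illustration, and the hash formula is just one common convention.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type that does not override Equals or GetHashCode itself.
public class SimpleKey
{
    public int ID { get; set; }
    public int SubID { get; set; }
}

// Equality and hashing live in a separate comparer class.
public class SimpleKeyComparer : IEqualityComparer<SimpleKey>
{
    public bool Equals(SimpleKey a, SimpleKey b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.ID == b.ID && a.SubID == b.SubID;
    }

    // Must agree with Equals: equal keys produce equal hash codes.
    public int GetHashCode(SimpleKey key)
    {
        if (key == null) return 0;
        unchecked { return (17 * 31 + key.ID) * 31 + key.SubID; }
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var list = new List<SimpleKey>
        {
            new SimpleKey { ID = 1, SubID = 2 },
            new SimpleKey { ID = 1, SubID = 3 }
        };
        var probe = new SimpleKey { ID = 1, SubID = 2 };
        var comparer = new SimpleKeyComparer();

        Console.WriteLine(list.Contains(probe));           // False (reference comparison)
        Console.WriteLine(list.Contains(probe, comparer)); // True  (LINQ overload with a comparer)

        var set = new HashSet<SimpleKey>(list, comparer);
        Console.WriteLine(set.Add(probe));                 // False (already present by value)
    }
}
```

The comparer is handed to each collection or LINQ call that should use it, so the key type itself stays untouched.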

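For a method such as List<T>.Contains, if our own type does not implement IEquatable<T>, the method falls back to object's Equals to compare objects, which gives unexpected results.

First, define a class: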

public class DaichoKey
{
    public int ID { get;set; }
    public int SubID { get;set; }
}
List<DaichoKey> lst = new List<DaichoKey>() {
new DaichoKey(){ID = 1,SubID =2},
new DaichoKey(){ID = 1,SubID = 3}
};           
var newItem = new DaichoKey() { ID = 1, SubID = 2 };
bool isContains = lst.Contains(newItem);//false

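The code above gets false from Contains, although an object with the values 1 and 2 is already in the list and we expected true.

To get that result, the class needs to implement the IEquatable<T> interface.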

public class DaichoKey : IEquatable<DaichoKey>
{
    public int ID { get;set; }
    public int SubID { get;set; }
 
    public bool Equals(DaichoKey other)
    {
        return this.ID == other.ID && this.SubID == other.SubID;
    }
}

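After this improvement the result is what we expect, but the class is still incomplete. Microsoft recommends also overriding object's Equals and GetHashCode methods to keep the semantics consistent, which gives the following code: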

public class DaichoKey : IEquatable<DaichoKey>
{
    public int ID { get;set; }
    public int SubID { get;set; }
 
    public bool Equals(DaichoKey other)
    {
        return this.ID == other.ID && this.SubID == other.SubID;
    }
    public override bool Equals(object obj)
    {
        // Equals should not throw: null or another type is simply not equal.
        DaichoKey other = obj as DaichoKey;
        return other != null && Equals(other);
    }
    public override int GetHashCode()
    {
        return base.GetHashCode();//return object's hashcode
    }
}

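The code above still has a shortcoming: it does not overload the == and != operators, but that is not the focus of this article. After this long detour we finally arrive at GetHashCode. It seems to have no effect on our Contains call, so what is the harm in not overriding it? Let's try Distinct, a LINQ extension method that can be called on List<T>: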

List<DaichoKey> lst = new List<DaichoKey>() {
new DaichoKey(){ID = 1,SubID =2},
new DaichoKey(){ID = 1,SubID = 3}
};
var newItem = new DaichoKey() { ID = 1, SubID = 2 };
lst.Add(newItem);
if (lst != null)
{
    lst = lst.Distinct<DaichoKey>().ToList();
}
//result:
//1 2
//1 3
//1 2

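Disaster: the duplicate (1, 2) entry was not removed, even though we implemented the IEquatable<T> interface. In a post on cnblogs ("c# 擴展方法奇思妙用基礎篇八:Distinct 擴展"), a reply suggests making GetHashCode return a fixed value, which forces the Equals method of IEquatable<T> to be called, as follows: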

public class DaichoKey : IEquatable<DaichoKey>
{
    public int ID { get;set; }
    public int SubID { get;set; }
 
    public bool Equals(DaichoKey other)
    {
        return this.ID == other.ID && this.SubID == other.SubID;
    }
    public override bool Equals(object obj)
    {
        // Equals should not throw: null or another type is simply not equal.
        DaichoKey other = obj as DaichoKey;
        return other != null && Equals(other);
    }
    public override int GetHashCode()
    {
        return 0;//base.GetHashCode();
    }
}

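The result was immediately correct. Could it be that Distinct compares hash codes first?

With that question in mind, I decompiled Distinct, and it is indeed as I guessed. The decompiled source follows for anyone interested: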

public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) throw Error.ArgumentNull("source");
    return DistinctIterator<TSource>(source, null);
}
 
 private static IEnumerable<TSource> DistinctIterator<TSource>(IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
{
    <DistinctIterator>d__81<TSource> d__ = new <DistinctIterator>d__81<TSource>(-2);
    d__.<>3__source = source;
    d__.<>3__comparer = comparer;
    return d__;
}
 
 private sealed class <DistinctIterator>d__81<TSource> : IEnumerable<TSource>, IEnumerable, IEnumerator<TSource>, IEnumerator, IDisposable
{
    // Fields
    private int <>1__state;
    private TSource <>2__current;
    public IEqualityComparer<TSource> <>3__comparer;
    public IEnumerable<TSource> <>3__source;
    public IEnumerator<TSource> <>7__wrap84;
    private int <>l__initialThreadId;
    public TSource <element>5__83;
    public Set<TSource> <set>5__82;
    public IEqualityComparer<TSource> comparer;
    public IEnumerable<TSource> source;
 
    // Methods
    [DebuggerHidden]
    public <DistinctIterator>d__81(int <>1__state);
    private void <>m__Finally85();
    private bool MoveNext();
    [DebuggerHidden]
    IEnumerator<TSource> IEnumerable<TSource>.GetEnumerator();
    [DebuggerHidden, TargetedPatchingOptOut("Performance critical to inline this type of method across NGen image boundaries")]
    IEnumerator IEnumerable.GetEnumerator();
    [DebuggerHidden]
    void IEnumerator.Reset();
    void IDisposable.Dispose();
 
    // Properties
    TSource IEnumerator<TSource>.Current { [DebuggerHidden] get; }
    object IEnumerator.Current { [DebuggerHidden] get; }
}
 
private bool MoveNext()
{
    bool flag;
    try
    {
        switch (this.<>1__state)
        {
            case 0:
                this.<>1__state = -1;
                this.<set>5__82 = new Set<TSource>(this.comparer);
                this.<>7__wrap84 = this.source.GetEnumerator();
                this.<>1__state = 1;
                goto Label_0092;
 
            case 2:
                this.<>1__state = 1;
                goto Label_0092;
 
            default:
                goto Label_00A5;
        }
    Label_0050:
        this.<element>5__83 = this.<>7__wrap84.Current;
        if (this.<set>5__82.Add(this.<element>5__83))
        {
            this.<>2__current = this.<element>5__83;
            this.<>1__state = 2;
            return true;
        }
    Label_0092:
        if (this.<>7__wrap84.MoveNext()) goto Label_0050;
        this.<>m__Finally85();
    Label_00A5:
        flag = false;
    }
    fault
    {
        this.System.IDisposable.Dispose();
    }
    return flag;
}
 
internal class Set<TElement>
{
    // Fields
    private int[] buckets;
    private IEqualityComparer<TElement> comparer;
    private int count;
    private int freeList;
    private Slot<TElement>[] slots;
 
    // Methods
    [TargetedPatchingOptOut("Performance critical to inline this type of method across NGen image boundaries")]
    public Set();
    public Set(IEqualityComparer<TElement> comparer);
    public bool Add(TElement value);
    [TargetedPatchingOptOut("Performance critical to inline this type of method across NGen image boundaries")]
    public bool Contains(TElement value);
    private bool Find(TElement value, bool add);
    internal int InternalGetHashCode(TElement value);
    public bool Remove(TElement value);
    private void Resize();
 
    // Nested Types
    [StructLayout(LayoutKind.Sequential)]
    internal struct Slot
    {
        internal int hashCode;
        internal TElement value;
        internal int next;
    }
}
public bool Add(TElement value)
{
    return !this.Find(value, true);
}
  
public bool Contains(TElement value)
{
    return this.Find(value, false);
}
 
private bool Find(TElement value, bool add)
{
    int hashCode = this.InternalGetHashCode(value);
    for (int i = this.buckets[hashCode % this.buckets.Length] - 1; i >= 0; i = this.slots[i].next)
    {
        if (this.slots[i].hashCode == hashCode && this.comparer.Equals(this.slots[i].value, value)) return true; // this is the key line: hash codes are compared before Equals
    }
    if (add)
    {
        int freeList;
        if (this.freeList >= 0)
        {
            freeList = this.freeList;
            this.freeList = this.slots[freeList].next;
        }
        else
        {
            if (this.count == this.slots.Length) this.Resize();
            freeList = this.count;
            this.count++;
        }
        int index = hashCode % this.buckets.Length;
        this.slots[freeList].hashCode = hashCode;
        this.slots[freeList].value = value;
        this.slots[freeList].next = this.buckets[index] - 1;
        this.buckets[index] = freeList + 1;
    }
    return false;
}


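This code shows that the Distinct extension method internally uses a class called Set<T> to weed out duplicates. That internal class stores its data as a hash table, so it calls our type's GetHashCode; if the two hash codes differ, it never goes on to call Equals.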
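
The hash-first behavior can also be observed without a decompiler. The sketch below uses a hypothetical Probe type (not from the original article) that counts calls to Equals and lets the caller choose each instance's hash code. The counts in the comments are what the short-circuit in Find implies; other LINQ implementations are assumed to check hash codes first as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical probe type: counts Equals calls and returns a caller-chosen hash code.
public class Probe : IEquatable<Probe>
{
    public static int EqualsCalls;
    public int Value { get; private set; }
    private readonly int _hash;

    public Probe(int value, int hash) { Value = value; _hash = hash; }

    public bool Equals(Probe other)
    {
        EqualsCalls++;
        return other != null && Value == other.Value;
    }

    public override bool Equals(object obj) { return Equals(obj as Probe); }
    public override int GetHashCode() { return _hash; }
}

public static class HashFirstDemo
{
    // Runs Distinct over two items with the same Value and returns
    // { number of distinct items, number of Equals calls }.
    public static int[] Run(int hashA, int hashB)
    {
        Probe.EqualsCalls = 0;
        var items = new List<Probe> { new Probe(7, hashA), new Probe(7, hashB) };
        int distinct = items.Distinct().Count();
        return new[] { distinct, Probe.EqualsCalls };
    }

    public static void Main()
    {
        int[] differentHashes = Run(1, 2);
        int[] sameHashes = Run(5, 5);
        // Different hash codes: Equals is never reached, so the duplicate survives.
        Console.WriteLine(differentHashes[0] + " distinct, " + differentHashes[1] + " Equals calls"); // 2 distinct, 0 Equals calls
        // Same hash code: Equals is consulted and the duplicate is dropped.
        Console.WriteLine(sameHashes[0] + " distinct, " + sameHashes[1] + " Equals calls");           // 1 distinct, 1 Equals calls
    }
}
```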

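The cause is now clear, and the conclusions are:

1. When overriding Equals, override GetHashCode as well wherever possible. Do not simply call object's GetHashCode; return a well-designed hash value so that results match expectations. Returning 0 as above does solve the problem, but it puts every object into the same bucket and gives up the speed advantage of hashing, so it is a poor choice.

2. The Contains and IndexOf methods of List<T> do not use GetHashCode.

3. The Distinct and Except extension methods do use GetHashCode, so it must be overridden for them to work by value. Other methods that rely on GetHashCode may be added to this list later; just take care when using them.

4. If an object is to serve as a key in a Dictionary, GetHashCode must be overridden.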
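
As a rough illustration of conclusion 1, here is one way DaichoKey could return a reasonable hash value instead of 0. The multiply-and-add formula and the constants 17 and 31 are a common convention assumed here, not something the referenced article prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DaichoKey : IEquatable<DaichoKey>
{
    public int ID { get; set; }
    public int SubID { get; set; }

    public bool Equals(DaichoKey other)
    {
        return other != null && ID == other.ID && SubID == other.SubID;
    }

    public override bool Equals(object obj) { return Equals(obj as DaichoKey); }

    // Combines exactly the fields that Equals compares, so equal keys
    // always produce equal hash codes while unequal keys usually differ.
    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + ID;
            hash = hash * 31 + SubID;
            return hash;
        }
    }
}

public static class DistinctDemo
{
    public static void Main()
    {
        var lst = new List<DaichoKey>
        {
            new DaichoKey { ID = 1, SubID = 2 },
            new DaichoKey { ID = 1, SubID = 3 },
            new DaichoKey { ID = 1, SubID = 2 }
        };
        Console.WriteLine(lst.Distinct().Count()); // 2

        var names = new Dictionary<DaichoKey, string>();
        names[new DaichoKey { ID = 1, SubID = 2 }] = "first";
        Console.WriteLine(names.ContainsKey(new DaichoKey { ID = 1, SubID = 2 })); // True
    }
}
```

Because ID and SubID remain settable, the warning from Part One still applies: a key should not be changed while it sits in a hash-based collection.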

Main reference: http://www.cnblogs.com/xiashengwang/archive/2013/03/04/2942555.html

