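Data Structures: Hash Tables

A hash table is a data structure that accesses records directly by key. It maps each key to a position in a table and looks the record up there, which speeds up search. The mapping function is called the hash function, and the array that stores the records is called the hash table.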

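Why do we need hash tables?

If memory were not a concern, we could use the key itself as the array index, so that every search would need only a single memory access. When the number of possible keys is large, however, this requires far too much memory.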
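To make the trade-off concrete, here is a minimal sketch of such a key-indexed array, assuming the keys are small non-negative integers. The class name DirectAddressTable is hypothetical and used only for illustration.

```java
// Hypothetical sketch: a key-indexed array for int keys in [0, capacity).
class DirectAddressTable {
    private final String[] slots;

    DirectAddressTable(int capacity) {
        slots = new String[capacity];   // one slot per possible key
    }

    void put(int key, String value) {
        slots[key] = value;             // the key is the index
    }

    String get(int key) {
        return slots[key];              // a single array access
    }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable(100);
        table.put(42, "answer");
        System.out.println(table.get(42));
        System.out.println(table.get(7));   // null: nothing stored under key 7
    }
}
```

The array must be as large as the key range, not the number of stored keys, which is why this approach stops being practical for keys such as arbitrary strings or 64-bit integers.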

The main goal of hashing is to spread the keys uniformly, so after hashing the keys are no longer in any particular order.

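A hashing algorithm has two steps:

The first step uses a hash function to convert the key into an array index. This may give several keys the same index.

The second step is collision resolution, which handles the keys that share an index.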
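As a small illustration of the first step producing a collision, the sketch below uses two distinct Java strings that happen to share a hashCode(). The table size of 16 and the class name CollisionDemo are assumptions made for this example.

```java
// Sketch: two distinct keys can map to the same array index.
class CollisionDemo {
    // Converts a key to an index in [0, m-1] by masking the sign bit
    // and taking the remainder.
    static int index(Object key, int m) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    public static void main(String[] args) {
        int m = 16; // illustrative table size
        // "Aa" and "BB" are different strings with the same hashCode (2112).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(index("Aa", m) + " " + index("BB", m));
    }
}
```

Both keys land on the same index, so the table needs some rule for storing and finding both of them.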

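Searching with hashing:

Hash function: if we have an array that can hold M key-value pairs, we need a hash function that converts any key into an index into that array (an integer in the range [0, M-1]).

The hash function depends on the key type. Strictly speaking, each key type needs its own hash function.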

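A good hash function satisfies three conditions:

Consistency (equal keys must produce the same hash value)
Efficiency (cheap to compute)
Uniformity (hashes all keys uniformly)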
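For a user-defined key type in Java, these conditions are usually addressed by overriding equals() and hashCode() together. The following is a minimal sketch; GridPoint is a hypothetical key type, and the multiplier 31 is only a common convention, not a guarantee of uniformity.

```java
// Hypothetical key type used only for illustration.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Consistency: uses exactly the fields compared in equals().
    // Efficiency: one multiplication and one addition.
    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        GridPoint a = new GridPoint(1, 2);
        GridPoint b = new GridPoint(1, 2);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Consistency and efficiency can be checked by reading the code. Uniformity depends on the actual distribution of keys and generally has to be judged empirically.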

Collision resolution methods:

Separate chaining
Linear probing

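Separate chaining

A hash table based on separate chaining: each of the M array elements points to a linked list, and each node in a list stores a key-value pair whose hash value equals the index of that array element.

Search: use the hash value to find the corresponding linked list, then search that list sequentially for the key.

We use M linked lists to store N keys, so the average list length is N/M.

hash: this uses the default hashCode() method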

  private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % m;
  }

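& 0x7fffffff (a 0 in the top bit followed by 31 ones) masks off the sign bit.

Why mask the sign bit instead of computing key.hashCode() % m directly? Because hashCode() can return a negative int, and the remainder in Java can be negative (for example, -4 % 3 == -1).

% m ensures the index is in the range [0, M-1].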
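The effect of the mask can be checked with a few concrete values. This is a small sketch; the class name HashIndexDemo and the helper maskedIndex are hypothetical, and Math.floorMod is shown only as an alternative from the standard library, not as something the code in this post uses.

```java
// Sketch: why the sign bit is masked before taking the remainder.
class HashIndexDemo {
    static int maskedIndex(int hashCode, int m) {
        return (hashCode & 0x7fffffff) % m;   // always in [0, m-1]
    }

    public static void main(String[] args) {
        int m = 3;
        System.out.println(-4 % m);                          // -1: not a valid index
        System.out.println(maskedIndex(-4, m));              // 1: a valid index
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0); // true: abs() can overflow
        System.out.println(Math.floorMod(-4, m));            // 2: also non-negative
    }
}
```

Math.abs is not a safe substitute for the mask, because Math.abs(Integer.MIN_VALUE) is still negative. The mask and Math.floorMod may pick different slots for the same hash code, but both always stay inside the array.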

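To keep the lists from growing too long and making search and insertion expensive, we keep the average list length (N/M) between 2 and 10.

We reuse the linked list implemented earlier (SequentialSearchST), an unordered linked list.

Implementation: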
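SequentialSearchST itself is not shown in this post. For readers who do not have it at hand, the following is a minimal sketch of an unordered linked-list symbol table with the four methods the hash table relies on (get, put, delete, keys). It is an assumed reconstruction, so details may differ from the earlier implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed sketch of the unordered linked-list symbol table.
class SequentialSearchST<Key, Value> {
    private Node first;   // head of the list
    private int n;        // number of key-value pairs

    private class Node {
        final Key key;
        Value val;
        Node next;

        Node(Key key, Value val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    public int size() {
        return n;
    }

    // Walk the list; return the value for key, or null if absent.
    public Value get(Key key) {
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) return x.val;
        }
        return null;
    }

    // Update in place if the key exists, otherwise prepend a new node.
    public void put(Key key, Value val) {
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) {
                x.val = val;
                return;
            }
        }
        first = new Node(key, val, first);
        n++;
    }

    // Unlink the node holding key, if there is one.
    public void delete(Key key) {
        if (first == null) return;
        if (key.equals(first.key)) {
            first = first.next;
            n--;
            return;
        }
        for (Node x = first; x.next != null; x = x.next) {
            if (key.equals(x.next.key)) {
                x.next = x.next.next;
                n--;
                return;
            }
        }
    }

    // Collect all keys, most recently inserted first.
    public Iterable<Key> keys() {
        List<Key> list = new ArrayList<>();
        for (Node x = first; x != null; x = x.next) list.add(x.key);
        return list;
    }

    public static void main(String[] args) {
        SequentialSearchST<String, Integer> st = new SequentialSearchST<>();
        st.put("a", 1);
        st.put("b", 2);
        st.put("a", 3);                    // updates the existing key
        System.out.println(st.get("a"));   // 3
        System.out.println(st.size());     // 2
    }
}
```

Every operation is a linear scan, which is acceptable here only because the hash table keeps each list short.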


import java.util.LinkedList;
import java.util.Queue;

/**
 * @author yuan
 * @date 2019/2/28
 * @description Hash table based on separate chaining
 */
public class SeparateChainingHashST<Key, Value> {

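    /**
     * Total number of key-value pairs
     */
    private int n;
    /**
     * Size of the hash table (number of chains)
     */
    private int m;

    /**
     * Array of linked-list symbol tables, one per chain
     */
    private SequentialSearchST<Key, Value>[] st;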

    private static final int DEFAULT_CAPACITY = 4;


    public SeparateChainingHashST(){
        this(DEFAULT_CAPACITY);
    }

    public SeparateChainingHashST(int m) {
        this.m = m;
        st = new SequentialSearchST[m];
        for (int i = 0; i < m; i++) {
            st[i] = new SequentialSearchST<>();
        }
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    public int size(){
        return n;
    }

    public boolean isEmpty(){
        return size() == 0;
    }


    public Value get(Key key) {
        if (key == null) {
            throw new IllegalArgumentException("argument to get() is null");
        }
        return st[hash(key)].get(key);
    }

    public void put(Key key, Value value) {
        if (key == null) {
            throw new IllegalArgumentException("first argument to put() is null");
        }
        if (value == null) {
            delete(key);
            return;
        }
        // If n / m (the average list length) >= 10, double the number of chains
        if (n >= 10 * m) {
            resize(2 * m);
        }

        int i = hash(key);
        if (!contains(key)) {
            ++n;
        }
        st[i].put(key, value);
    }

    private boolean contains(Key key) {
        if (key == null) {
            throw new IllegalArgumentException("argument to contains() is null");
        }
        return get(key) != null;
    }

    private void resize(int chains) {
        SeparateChainingHashST<Key, Value> temp = new SeparateChainingHashST<>(chains);
        for (int i = 0; i < m; i++) {
            for (Key key : st[i].keys()) {
                temp.put(key, st[i].get(key));
            }
        }
        this.m = temp.m;
        this.n = temp.n;
        this.st = temp.st;
    }

    private void delete(Key key) {
        if (key == null) {
            throw new IllegalArgumentException("argument to delete() is null");
        }
        int i = hash(key);
        if (contains(key)) {
            --n;
        }
        st[i].delete(key);

        // If the average list length (n / m) <= 2, halve the number of chains
        if (m > DEFAULT_CAPACITY && n <= 2 * m) {
            resize(m / 2);
        }
    }

    public Iterable<Key> keys(){
        Queue<Key> queue = new LinkedList<>();
        for (int i = 0; i < m; i++) {
            for (Key key : st[i].keys()) {
                queue.offer(key);
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        SeparateChainingHashST<String, Integer> st = new SeparateChainingHashST<>();

        st.put("ccc", 2);
        st.put("bbb", 3);
        st.put("aaa", 1);
        st.put("ddd", 1);


        for (String s : st.keys()) {
            System.out.println(s + " " + st.get(s));
        }
        System.out.print("keys = ");
        st.keys().forEach(s -> System.out.print(s + ","));
        System.out.println();

        System.out.println(st.contains("aaa")); // true
        System.out.println(st.contains("cda")); // false
    }


}

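Linear probing

Linear probing stores N key-value pairs in an array of size M, where M > N, and relies on the empty slots in the array to resolve collisions.

Basic idea: when a key's hash value collides, check the next position in the table (index plus 1, wrapping around to 0 at the end of the array) and compare the key stored there with the search key. If they differ, keep probing at successive indices until the key is found or an empty slot is reached.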
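The probe sequence is easier to see with a toy example. The sketch below assumes non-negative int keys, a table that is never full, and a deliberately simple hash (key % m) so that collisions are predictable. ProbeTrace is a hypothetical name.

```java
import java.util.Arrays;

// Sketch: tracing where keys land under linear probing with a toy hash (key % m).
class ProbeTrace {
    // Inserts key and returns the slot it ends up in; null marks an empty slot.
    static int insert(Integer[] keys, int key) {
        int m = keys.length;
        int i = key % m;                            // toy hash, assumes key >= 0
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) % m;                        // collision: try the next slot
        }
        keys[i] = key;
        return i;
    }

    public static void main(String[] args) {
        Integer[] keys = new Integer[8];
        System.out.println(insert(keys, 3));    // 3: home slot is free
        System.out.println(insert(keys, 11));   // 4: 11 % 8 == 3 is taken
        System.out.println(insert(keys, 19));   // 5: slots 3 and 4 are taken
        System.out.println(insert(keys, 4));    // 6: home slot 4 is inside the cluster
        System.out.println(Arrays.toString(keys));
    }
}
```

Note how key 4, which does not collide with 3 by hash value, is still pushed to slot 6 because the run of occupied slots has grown over its home position.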

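Deletion

Note that we cannot simply set the key's slot to null, because that would make the keys after it in the same cluster unreachable by search.

Instead, we re-insert into the table all keys to the right of the deleted key, up to the next empty slot.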
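The following sketch contrasts the two approaches on a tiny table. It again assumes non-negative int keys, distinct keys, a toy hash (key % m), and a table with at least one empty slot. The class and method names are hypothetical.

```java
// Sketch: naive deletion versus deletion with re-insertion in linear probing.
class DeleteDemo {
    // Returns the slot holding key, or -1 if an empty slot is reached first.
    static int find(Integer[] keys, int key) {
        int m = keys.length;
        for (int i = key % m; keys[i] != null; i = (i + 1) % m) {
            if (keys[i] == key) return i;
        }
        return -1;
    }

    // Inserts a key that is assumed not to be present yet.
    static void insert(Integer[] keys, int key) {
        int m = keys.length;
        int i = key % m;
        while (keys[i] != null) i = (i + 1) % m;
        keys[i] = key;
    }

    // Clears the key's slot, then re-inserts the rest of its cluster.
    static void delete(Integer[] keys, int key) {
        int i = find(keys, key);
        if (i < 0) return;
        int m = keys.length;
        keys[i] = null;
        for (i = (i + 1) % m; keys[i] != null; i = (i + 1) % m) {
            int moved = keys[i];
            keys[i] = null;
            insert(keys, moved);
        }
    }

    public static void main(String[] args) {
        Integer[] naive = new Integer[8];
        insert(naive, 3);
        insert(naive, 11);                     // collides with 3, lands in slot 4
        naive[3] = null;                       // naive deletion of key 3
        System.out.println(find(naive, 11));   // -1: search stops at empty slot 3

        Integer[] fixed = new Integer[8];
        insert(fixed, 3);
        insert(fixed, 11);
        delete(fixed, 3);
        System.out.println(find(fixed, 11));   // 3: 11 moved back to its home slot
    }
}
```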

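Clusters

A cluster is a contiguous run of entries formed as elements are inserted into the array.

Resizing the array

When the load factor (N/M) is below 1/2, the expected number of probes per search is only between about 1.5 and 2.5 (see Algorithms, 4th Edition for details).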
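For reference, the 1.5 and 2.5 figures correspond to Knuth's classical estimates for linear probing as quoted in Algorithms, 4th Edition, which assume a hash function that distributes keys uniformly and independently. Writing α = N/M for the load factor (the symbol α is introduced here for convenience), the approximate expected probe counts are:

```latex
\text{search hit} \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
\qquad
\text{search miss or insert} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right)
```

At α = 1/2 these evaluate to 1.5 and 2.5 respectively. Both grow quickly as α approaches 1, which is the motivation for capping the load factor.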

Here we keep the load factor of the table at no more than 1/2.

Code:


import java.util.LinkedList;
import java.util.Queue;

/**
 * @author yuan
 * @date 2019/2/28
 * @description Hash table based on linear probing
 */
public class LinearProbingHashST<Key , Value> {
    private static final int DEFAULT_CAPACITY = 4;

    /**
     * Number of key-value pairs in the symbol table
     */
    private int n;
    /**
     * Size of the linear probing table
     */
    private int m;
    /**
     * Keys
     */
    private Key[] keys;
    /**
     * Values
     */
    private Value[] vals;


    public LinearProbingHashST(){
        this(DEFAULT_CAPACITY);
    }

    public LinearProbingHashST(int capacity) {
        m = capacity;
        n = 0;
        keys = (Key[]) new Object[m];
        vals = (Value[]) new Object[m];
    }

    public int size(){
        return n;
    }


    public boolean isEmpty(){
        return size() == 0;
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    private void resize(int capacity) {
        LinearProbingHashST<Key, Value> temp = new LinearProbingHashST<>(capacity);
        for (int i = 0; i < m; i++) {
            if (keys[i] != null) {
                temp.put(keys[i], vals[i]);
            }
        }
        keys = temp.keys;
        vals = temp.vals;
        m = temp.m;
    }

    /**
     * Insert
     * @param key the key to insert or update
     * @param val the value to associate with the key; null deletes the key
     */
    public void put(Key key, Value val) {
        if (key == null) {
            throw new IllegalArgumentException("first argument to put() is null");
        }

        if (val == null) {
            // A null value means: delete the key
            delete(key);
            return;
        }
        // If the table is at least half full, double the array
        if (n >= m / 2) {
            resize(2 * m);
        }
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) {
                // The key already exists: update its value
                vals[i] = val;
                return;
            }
        }
        // The key is not in the table: insert it at the empty slot
        keys[i] = key;
        vals[i] = val;
        ++n;
    }

    /**
     * Delete: every key to the right of the deleted key in the same cluster must be re-inserted
     * @param key the key to remove
     */
    public void delete(Key key) {
        if (key == null) {
            throw new IllegalArgumentException("argument to delete() is null");
        }
        if (!contains(key)) {
            return;
        }
        int i = hash(key);
        while (!key.equals(keys[i])) {
            i = (i + 1) % m;
        }
        // Found the key to delete
        keys[i] = null;
        vals[i] = null;
        i = (i + 1) % m;
        // Re-insert the keys to the right of the deleted key
        while (keys[i] != null) {
            Key keyToRehash = keys[i];
            Value valToRehash = vals[i];
            keys[i] = null;
            vals[i] = null;
            --n;
            put(keyToRehash, valToRehash);
            i = (i + 1) % m;
        }
        // Account for the deleted key
        --n;
        // If the table is only 1/8 full, halve the array
        if (n > 0 && n == m / 8) {
            resize(m / 2);
        }
    }

    /**
     * Get
     * @param key the key to look up
     * @return the value associated with the key, or null if the key is absent
     */
    public Value get(Key key) {
        if (key == null) {
            throw new IllegalArgumentException("argument to get() is null");
        }
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) {
                return vals[i];
            }
        }
        return null;
    }

    public boolean contains(Key key) {
        if (key == null) {
            throw new IllegalArgumentException("argument to contains() is null");
        }
        return get(key) != null;
    }

    public Iterable<Key> keys(){
        Queue<Key> queue = new LinkedList<>();
        for (int i = 0; i < m; i++) {
            if (keys[i] != null) {
                queue.offer(keys[i]);
            }
        }
        return queue;
    }

    public static void main(String[] args) {

        LinearProbingHashST<String, Integer> st = new LinearProbingHashST<>();
        st.put("ccc", 2);
        st.put("bbb", 3);
        st.put("aaa", 1);
        st.put("ddd", 1);


        for (String s : st.keys()) {
            System.out.println(s + " " + st.get(s));
        }
        System.out.print("keys = ");
        st.keys().forEach(s -> System.out.print(s + ","));
        System.out.println();

        System.out.println(st.contains("aaa")); // true
        System.out.println(st.contains("cda")); // false
    }

}


Java's TreeMap is implemented with a red-black tree.

Java's HashMap is implemented as a hash table with separate chaining.
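One visible consequence of the two designs is iteration order: a tree keeps keys sorted, while a hash table does not. A small sketch, reusing the keys from the test clients above (the class name MapOrderDemo and the helper joinKeys are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: TreeMap iterates in key order; HashMap makes no ordering promise.
class MapOrderDemo {
    static String joinKeys(Map<String, Integer> map) {
        return String.join(",", map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> tree = new TreeMap<>();
        Map<String, Integer> hash = new HashMap<>();
        for (String key : new String[] {"ccc", "bbb", "aaa", "ddd"}) {
            tree.put(key, 1);
            hash.put(key, 1);
        }
        System.out.println(joinKeys(tree));   // aaa,bbb,ccc,ddd
        System.out.println(joinKeys(hash));   // order is unspecified
    }
}
```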
