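HashMap Study Notes

How It Works

A hash table is a data structure that organizes data with a hash function to support fast insertion and search.

The key idea is to use a hash function to map keys to buckets.

When we insert a new key, the hash function decides which bucket the key belongs to, and the key is stored in that bucket.
When we search for a key, the hash table uses the same hash function to locate the corresponding bucket and searches only within that bucket.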

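Keys to Designing a Hash Table

Hash Function

The hash function is the most important component of a hash table: it maps each key to a specific bucket. As a simple example, take y = x % 5 as the hash function, where x is the key and y is the index of the assigned bucket.
The choice of hash function depends on the range of key values and the number of buckets.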
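
As a minimal sketch of the y = x % 5 example, the snippet below maps a few keys to bucket indices and groups them by bucket. The names BucketIndexDemo, bucketIndex and distribute are hypothetical, chosen only for illustration. Math.floorMod is an addition that the formula above does not mention: it is assumed here so that negative keys also land in a valid bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original notes.
class BucketIndexDemo {
    // y = x % bucketCount; floorMod (an assumption) keeps the result in 0..bucketCount-1.
    static int bucketIndex(int key, int bucketCount) {
        return Math.floorMod(key, bucketCount);
    }

    // Places every key into the bucket chosen by bucketIndex and returns all buckets.
    static List<List<Integer>> distribute(int[] keys, int bucketCount) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(bucketIndex(key, bucketCount)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // 1987 % 5 and 2 % 5 are both 2, so these two keys share bucket 2.
        System.out.println(distribute(new int[] {1987, 24, 2, 10}, 5));
    }
}
```

Two different keys ending up in the same bucket is exactly the situation that collision resolution has to handle.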

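How to design a hash function is an open problem. The idea is to distribute keys across the buckets as evenly as possible. Ideally, a perfect hash function gives a one-to-one mapping between keys and buckets. In most cases, however, the hash function is not perfect, and we have to trade off the number of buckets against the capacity of each bucket.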

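Collision Resolution

A collision resolution algorithm should address the following questions:

How should the values within one bucket be organized?
What if too many values are assigned to the same bucket?
How do we search for a target value within a specific bucket?

These questions relate to the capacity of a bucket and the number of keys that may map to the same bucket.

Suppose the fullest bucket holds N keys. If N is constant and small, we can simply use an array to store the keys in that bucket. If N is variable or large, we may need a height-balanced binary search tree instead.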
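
The two bucket layouts can be sketched side by side. This is only an illustration under stated assumptions: BucketChoiceDemo and its methods are hypothetical names, an ArrayList stands in for the small array bucket, and java.util.TreeSet (a red-black tree) stands in for the balanced tree bucket.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original notes.
class BucketChoiceDemo {
    // Small, bounded N: a plain list. contains() scans the elements one by one, O(N).
    static Collection<Integer> smallBucket() {
        return new ArrayList<>();
    }

    // Large or unbounded N: TreeSet is backed by a balanced tree. contains() is O(log N).
    static Collection<Integer> largeBucket() {
        return new TreeSet<>();
    }

    // Search first, then insert only if the key is not in the bucket yet.
    static boolean addIfAbsent(Collection<Integer> bucket, int key) {
        if (bucket.contains(key)) {
            return false;
        }
        return bucket.add(key);
    }
}
```

Both buckets expose the same operations, so the choice only changes how search cost grows with the number of keys in the bucket.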

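Practice

Insertion and search are the two basic operations of a hash table. Other operations build on them: to delete an element, for example, we first search for it and then, if it exists, remove it from its position.

Designing a HashSet

Here the HashSet is implemented with an array of LinkedList buckets plus a size field that fixes the number of buckets. The bucket index is the key taken modulo size. Each LinkedList stores the keys themselves as its elements, so one bucket can hold several keys. Equal keys are the same value and are stored only once. Different keys that share an index go into the same bucket and are told apart by the key itself.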

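import java.util.LinkedList;

class MyHashSet {
    // One bucket per index; each bucket chains the keys that share that index.
    private final LinkedList<Integer>[] lists;
    private final int size = 10000;

    /**
     * Initialize your data structure here.
     */
    @SuppressWarnings("unchecked")
    public MyHashSet() {
        // Java cannot create a generic array directly, so a raw array is created here.
        lists = new LinkedList[size];
    }

    public void add(int key) {
        int index = indexOf(key);
        if (lists[index] == null) {
            lists[index] = new LinkedList<>();
        }
        if (!lists[index].contains(key)) {
            lists[index].addFirst(key);
        }
    }

    public void remove(int key) {
        int index = indexOf(key);
        if (lists[index] != null) {
            // The Integer cast selects remove(Object) rather than remove(int index).
            lists[index].remove((Integer) key);
        }
    }

    /**
     * Returns true if this set contains the specified element
     */
    public boolean contains(int key) {
        int index = indexOf(key);
        return lists[index] != null && lists[index].contains(key);
    }

    // floorMod keeps the bucket index non-negative even for negative keys.
    private int indexOf(int key) {
        return Math.floorMod(key, size);
    }
}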

/**
 * Your MyHashSet object will be instantiated and called as such:
 * MyHashSet obj = new MyHashSet();
 * obj.add(key);
 * obj.remove(key);
 * boolean param_3 = obj.contains(key);
 */

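Designing a HashMap

The class keeps the Node array, the capacity, the current size, and the load factor (THERESHOD in the code). Once the table is more than capacity * THERESHOD full, the array grows to twice its previous capacity.
To keep the code easy to follow, the hash function here simply returns the key itself. For a more thorough treatment, see the hash method in the HashMap source code.
The buckets here exist to store keys. Each value corresponds one-to-one to its key, so only the relationship between keys and buckets needs to be considered.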

class MyHashMap {
    private static final double THERESHOD = 0.75; // load factor
    private static final int DELETED = -1;        // tombstone marker; valid values are non-negative

    Node[] arr;
    int capacity;
    int size; // number of live mappings
    int used; // occupied slots, tombstones included

    /**
     * Initialize your data structure here.
     */
    public MyHashMap() {
        capacity = 200000;
        arr = new Node[capacity];
        size = 0;
        used = 0;
    }

    /**
     * value will always be non-negative.
     */
    public void put(int key, int value) {
        if (used > capacity * THERESHOD) {
            // Double the capacity and rehash
            growCapacity();
        }
        int idx = findSlot(arr, key);
        if (arr[idx] == null) {
            arr[idx] = new Node(key, value);
            size++;
            used++;
        } else {
            if (arr[idx].value == DELETED) {
                // The key was removed earlier; its slot becomes live again
                size++;
            }
            arr[idx].value = value;
        }
    }

    // Resolve collisions with linear probing: step to the next slot until the
    // key itself or an empty slot is found. Tombstones of other keys are skipped.
    private int findSlot(Node[] table, int key) {
        int idx = Math.floorMod(hash(key), table.length);
        while (table[idx] != null && table[idx].key != key) {
            idx = (idx + 1) % table.length;
        }
        return idx;
    }

    private void growCapacity() {
        // Double the array, then rehash every live entry into it
        capacity *= 2;
        Node[] newArr = new Node[capacity];
        reHash(newArr, arr);
        arr = newArr;
        used = size; // tombstones were dropped during rehashing
    }

    private void reHash(Node[] newArr, Node[] oldArr) {
        for (Node node : oldArr) {
            // Removed entries are not carried over
            if (node != null && node.value != DELETED) {
                newArr[findSlot(newArr, node.key)] = node;
            }
        }
    }

    /**
     * Returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key
     */
    public int get(int key) {
        int idx = getIdxByKey(key);
        return idx == -1 ? -1 : arr[idx].value;
    }

    private int getIdxByKey(int key) {
        int idx = findSlot(arr, key);
        if (arr[idx] == null || arr[idx].value == DELETED) {
            return -1;
        }
        return idx;
    }

    private int hash(int key) {
        return Integer.hashCode(key);
    }

    /**
     * Removes the mapping of the specified value key if this map contains a mapping for the key
     */
    public void remove(int key) {
        int idx = getIdxByKey(key);
        if (idx != -1) {
            // Keep the node as a tombstone so that probing continues past it
            arr[idx].value = DELETED;
            size--;
        }
    }
}

class Node {
    int key;
    int value;

    public Node(int key, int value) {
        this.key = key;
        this.value = value;
    }
}

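Complexity Analysis: Hash Tables

If there are M keys in total, a hash table easily reaches O(M) space complexity.
The time complexity, however, depends heavily on the design. We might use an array to store the values that share one bucket. Ideally the bucket size is small enough to be regarded as a constant, and both insertion and search take O(1).
In the worst case, though, the maximum bucket size is N, so insertion takes O(1) while search takes O(N).

How Built-in Hash Tables Work
A typical design of a built-in hash table:

The key can be of any hashable type. A value of a hashable type has a hash code, which the mapping function uses to obtain the bucket index.
Each bucket contains an array that initially stores all values mapped to that bucket.
If one bucket holds too many values, they are kept in a height-balanced binary search tree instead.

The average time complexity of insertion and search is still O(1). With a height-balanced BST, the worst-case time complexity of both insertion and search is O(log N). This is a trade-off between insertion and search.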
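
To see that a built-in hash table stays correct even when every key collides, here is a small sketch using java.util.HashMap. CollidingKeyDemo and BadKey are hypothetical names, and the constant hashCode is a deliberately bad choice made only for the demonstration. To my knowledge, since Java 8 java.util.HashMap turns an overfull bucket into a red-black tree, which matches the design described above; the sketch only checks correctness and does not measure timing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original notes.
class CollidingKeyDemo {
    // A key type whose hash code is the same for every instance.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // every key maps to the same bucket
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        // Being Comparable gives a tree-based bucket an ordering to work with.
        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts n colliding keys, each mapped to the square of its id.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i * i);
        }
        return map;
    }
}
```

Lookups still return the right values because the bucket is searched with equals after the hash code has selected it.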

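Practical Use

Finding Duplicates with a Hash Set

Simply iterate over the values and insert each one into the set. If a value is already in the hash set, a duplicate exists.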

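import java.util.HashSet;
import java.util.List;
import java.util.Set;

// DuplicateFinder is a hypothetical wrapper class so the method compiles on its own.
class DuplicateFinder {
    // T stands for the actual type of your key
    static <T> boolean findDuplicates(List<T> keys) {
        Set<T> hashset = new HashSet<>();
        for (T key : keys) {
            if (hashset.contains(key)) {
                return true;
            }
            hashset.add(key);
        }
        return false;
    }
}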

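Counting Occurrences with a HashMap

Use the target element as the key and its number of occurrences as the value, updating the value each time the element is encountered during traversal.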
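
A minimal sketch of this counting pattern, assuming the elements are ints. OccurrenceCounter and countOccurrences are hypothetical names. Map.merge inserts 1 for a new key and otherwise adds 1 to the stored count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original notes.
class OccurrenceCounter {
    // Key: the element. Value: how many times it has been seen so far.
    static Map<Integer, Integer> countOccurrences(int[] nums) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int num : nums) {
            // First sighting stores 1; later sightings add 1 to the existing count.
            counts.merge(num, 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, countOccurrences(new int[] {1, 2, 2, 3, 2}) maps 2 to 3 and both 1 and 3 to 1.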

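Providing More Information

Consider a two-sum style problem: find two numbers in an array that add up to target. If we only wanted to return true when a solution exists, we could use a hash set to store all values seen while iterating over the array and check whether target - current_value is in the hash set. However, we are asked to return more information, which means we care not only about the values but also about their indices. We need to store each number as the key and its index as the value, so we should use a hash map instead of a hash set.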

// Template (pseudocode): replace Type, InfoType and ReturnType with the actual types you need
ReturnType aggregateByKey_hashmap(List<Type> keys) {
    Map<Type, InfoType> hashmap = new HashMap<>();
    for (Type key : keys) {
        if (hashmap.containsKey(key)) {
            if (/* hashmap.get(key) satisfies the requirement */) {
                return needed_information;
            }
        }
        // Value can be any information you need (e.g. index)
        hashmap.put(key, value);
    }
    return needed_information;
}
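
The template can be made concrete for the problem described in the paragraph before it. The sketch below is one possible instance, not the only way to fill in the template: TwoSumDemo and twoSum are hypothetical names, and returning an empty array when no pair exists is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original notes.
class TwoSumDemo {
    // Returns the indices of two numbers that add up to target,
    // or an empty array if no such pair exists (an assumed convention).
    static int[] twoSum(int[] nums, int target) {
        // Key: a number seen so far. Value: the index where it was seen.
        Map<Integer, Integer> indexByValue = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            Integer j = indexByValue.get(target - nums[i]);
            if (j != null) {
                return new int[] {j, i};
            }
            indexByValue.put(nums[i], i);
        }
        return new int[0];
    }
}
```

Storing the index as the value is what a hash set cannot offer: the lookup answers both whether the complement exists and where it is.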

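Aggregating by Key

Example: given a string, find the first non-repeating character in it and return its index. If no such character exists, return -1.

A simple way to solve this is to first count the occurrences of each character, then scan the string again to find the first character whose count is 1. We can therefore maintain a hash map whose keys are characters and whose values are counters for the corresponding characters. Each time we visit a character, we just add 1 to its counter.

The key to this kind of problem is deciding on a strategy for when an existing key is encountered. In the example above, the strategy is to count occurrences. Sometimes we may sum all the values; sometimes we may replace the original value with the latest one. The strategy depends on the problem, and practice will help you make the right decision.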

// Template (pseudocode): replace Type, InfoType and ReturnType with the actual types you need
ReturnType aggregateByKey_hashmap(List<Type> keys) {
    Map<Type, InfoType> hashmap = new HashMap<>();
    for (Type key : keys) {
        if (hashmap.containsKey(key)) {
            // Existing key: apply the aggregation strategy (count, sum, replace, ...)
            hashmap.put(key, updated_information);
        } else {
            // New key: value can be any information you need (e.g. index)
            hashmap.put(key, value);
        }
    }
    return needed_information;
}
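
A minimal sketch of the first non-repeating character example, following the two passes described earlier in this section. FirstUniqueCharDemo and firstUniqChar are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original notes.
class FirstUniqueCharDemo {
    // Returns the index of the first character that occurs exactly once, or -1.
    static int firstUniqChar(String s) {
        // Pass 1: aggregate by key, counting each character.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Pass 2: the first character whose counter is 1 is the answer.
        for (int i = 0; i < s.length(); i++) {
            if (counts.get(s.charAt(i)) == 1) {
                return i;
            }
        }
        return -1;
    }
}
```

For "loveleetcode" the result is 2, because 'l' and 'o' both repeat and 'v' is the first character that does not.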

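Designing the Key

When the order of the elements in a string or array does not matter, the sorted string or array can serve as the key.
If only the offset of each value matters, usually relative to the first value, the offsets can serve as the key.
In a tree, you may sometimes want to use the TreeNode itself as the key, but in most cases a serialized representation of the subtree (built recursively from node values and paths) is a better choice.
In a matrix, the row index or the column index can serve as the key.
In Sudoku, the row and column indices can be combined to identify which block an element belongs to.
Sometimes, in a matrix, you may want to aggregate the values that lie on the same diagonal.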
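
Three of these key designs can be sketched briefly. Everything named here (KeyDesignDemo, sortedKey, groupAnagrams, sudokuBlock, diagonalKey) is hypothetical, and the formulas assume a standard 9x9 Sudoku grid and diagonals running from top-left to bottom-right.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original notes.
class KeyDesignDemo {
    // Order does not matter: the sorted characters form the key,
    // so all anagrams of a word share one key.
    static String sortedKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Groups words that are anagrams of each other under their sorted key.
    static Map<String, List<String>> groupAnagrams(String[] words) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String word : words) {
            groups.computeIfAbsent(sortedKey(word), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    // 9x9 Sudoku: row and column combine into a block id from 0 to 8.
    static int sudokuBlock(int row, int col) {
        return (row / 3) * 3 + col / 3;
    }

    // Cells on the same top-left to bottom-right diagonal share row - col.
    static int diagonalKey(int row, int col) {
        return row - col;
    }
}
```

For the other diagonal direction (top-right to bottom-left), row + col plays the same role.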

Acknowledgments: LeetCode
