Problem Description

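Design a hash set without using any built-in hash table library.

Specifically, your design should support the following operations:
add(value): insert a value into the hash set.
contains(value): return whether the value exists in the hash set.
remove(value): remove the given value from the hash set. If the value is not in the hash set, do nothing.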

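class MyHashSet:
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.s = set()  # backing store: Python's built-in set

    def add(self, key: int) -> None:
        self.s.add(key)

    def remove(self, key: int) -> None:
        # set.remove raises KeyError for a missing key, so check first
        if key in self.s:
            self.s.remove(key)

    def contains(self, key: int) -> bool:
        """
        Returns true if this set contains the specified element
        """
        return key in self.s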

Approach

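The simplest approach is of course the built-in set data structure, but that is not very interesting.
One note on implementing remove: set.remove raises KeyError when the element is not present (set.pop takes no argument and raises KeyError only on an empty set), so an existence check is needed here. set.discard would also work, since it ignores missing elements.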
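To make the remove behaviour concrete, here is a small sketch using only the built-in set. The values are arbitrary and chosen purely for illustration.

```python
s = {1, 2}

try:
    s.remove(3)      # 3 is not in the set
except KeyError:
    print("remove raised KeyError")

s.discard(3)         # discard silently ignores a missing element

if 2 in s:           # the guarded form used in the solution above
    s.remove(2)

print(s)
```

Either the membership guard or discard satisfies the "do nothing if absent" requirement; the guard simply makes the intent explicit.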

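Hand-writing a hash set is a task I will leave for later.
2020/06/05: Back for another check-in, and I will simply point the way: a hand-written hash set.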

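To implement the HashSet data structure, there are two key issues: the hash function and collision handling.

Hash function: its purpose is to assign an address at which a value is stored. Ideally, every value would have its own unique hash.
Collision handling: a hash function is essentially a mapping from A to B, but several values in A may map to the same value in B. This is a collision, so we need a strategy to resolve it. Broadly, the strategies are:
- Separate chaining: values with the same hash are placed in the same bucket, and the buckets are independent of each other.
- Open addressing: whenever a collision occurs, keep probing according to a chosen strategy until an empty slot is found.
- 2-choice hashing: compute two candidate addresses with two hash functions and pick the one with fewer collisions.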
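Open addressing is not implemented elsewhere in this post, so here is a minimal sketch of it using linear probing. The class name `LinearProbingSet`, the capacity of 11, and the omission of removal and resizing are all assumptions made to keep the example short; it is not a complete hash set.

```python
class LinearProbingSet:
    """Fixed-capacity open-addressing set (illustrative: no removal, no resizing)."""

    def __init__(self, capacity=11):
        self.capacity = capacity
        self.slots = [None] * capacity  # None marks an empty slot

    def _probe(self, key):
        """Yield slot indices from the home slot onward, wrapping around once."""
        start = key % self.capacity
        for step in range(self.capacity):
            yield (start + step) % self.capacity

    def add(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:   # first free slot: store the key here
                self.slots[i] = key
                return
            if self.slots[i] == key:    # already present
                return
        raise OverflowError("table is full")

    def contains(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:   # an empty slot ends the probe sequence
                return False
            if self.slots[i] == key:
                return True
        return False


s = LinearProbingSet()
s.add(3)
s.add(14)  # 14 % 11 == 3, so it collides with 3 and lands in slot 4
print(s.slots[3], s.slots[4])
print(s.contains(14), s.contains(25))
```

Supporting removal would require extra bookkeeping (for example tombstone markers) so that probe sequences are not cut short, which is one reason separate chaining is often the simpler choice.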

Separate chaining: linked list

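A common feature of hash functions is the modulo operator: $\text{hash} = \text{value} \bmod \text{base}$, where $\text{base}$ determines the number of buckets in the HashSet.
In theory, the more buckets there are (and so the more space is used), the less likely collisions become. Choosing $\text{base}$ is a trade-off between space and collisions.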
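As a quick illustration of the modulo hash, the sketch below maps three values to bucket indices. The helper name `bucket_index` is hypothetical, and the base of 769 is the bucket count used in this post.

```python
BASE = 769  # number of buckets


def bucket_index(value, base=BASE):
    """Map a value to a bucket index with the modulo hash."""
    return value % base


# 5, 774 (= 769 + 5) and 1543 (= 2 * 769 + 5) all share bucket 5: a collision.
print(bucket_index(5), bucket_index(774), bucket_index(1543))
```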

In addition, using a prime number as $\text{base}$, for example 769, is a sensible choice because it can reduce potential collisions.
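One way to see why a prime base helps is to hash keys that share a common stride. In the sketch below, the keys (multiples of 4) and the comparison base of 768 are hypothetical choices for illustration; `used_buckets` is a helper introduced here.

```python
def used_buckets(keys, base):
    """Count how many distinct buckets the keys occupy."""
    return len({k % base for k in keys})


keys = [4 * i for i in range(768)]  # hypothetical keys with a stride of 4

print(used_buckets(keys, 768))  # composite base sharing the factor 4 with the keys
print(used_buckets(keys, 769))  # prime base
```

With base 768 the keys can only land on indices that are multiples of 4, so most buckets stay empty while the rest are crowded. With the prime base 769 the same keys spread over distinct buckets.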
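As for the bucket design, one option is to store all of a bucket's values in an array. The drawback of an array is that insertion and deletion take $\mathcal{O}(N)$ time rather than $\mathcal{O}(1)$.

Every update first has to scan the whole bucket to avoid duplicates, so the scan cannot be avoided. A linked list is still the better choice, because once the position is known the insertion or deletion itself takes $\mathcal{O}(1)$.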

class MyHashSet(object):

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.keyRange = 769  # number of buckets
        self.bucketArray = [Bucket() for _ in range(self.keyRange)]

    def _hash(self, key):
        return key % self.keyRange

    def add(self, key):
        """
        :type key: int
        :rtype: None
        """
        bucketIndex = self._hash(key)
        self.bucketArray[bucketIndex].insert(key)

    def remove(self, key):
        """
        :type key: int
        :rtype: None
        """
        bucketIndex = self._hash(key)
        self.bucketArray[bucketIndex].delete(key)

    def contains(self, key):
        """
        Returns true if this set contains the specified element
        :type key: int
        :rtype: bool
        """
        bucketIndex = self._hash(key)
        return self.bucketArray[bucketIndex].exists(key)


class Node:  # linked-list node used as the bucket's container
    def __init__(self, value, nextNode=None):
        self.value = value
        self.next = nextNode

class Bucket:  # bucket definition
    def __init__(self):
        # a pseudo head
        self.head = Node(0)

    def insert(self, newValue):
        # if not existed, add the new element to the head.
        if not self.exists(newValue):
            newNode = Node(newValue, self.head.next)
            # set the new head.
            self.head.next = newNode

    def delete(self, value):
        prev = self.head
        curr = self.head.next
        while curr is not None:
            if curr.value == value:
                # remove the current node
                prev.next = curr.next
                return
            prev = curr
            curr = curr.next

    def exists(self, value):
        curr = self.head.next
        while curr is not None:
            if curr.value == value:
                # value existed already, do nothing
                return True
            curr = curr.next
        return False

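Complexity Analysis
Time complexity: $\mathcal{O}(\frac{N}{K})$, where N is the number of all possible values and K is the predefined number of buckets, i.e. 769. Assuming the values are evenly distributed, the average bucket size is $\frac{N}{K}$.
In the worst case each operation has to scan the whole bucket, so the time complexity is $\mathcal{O}(\frac{N}{K})$.
Space complexity: $\mathcal{O}(K+M)$, where K is the predefined number of buckets and M is the number of values already inserted into the HashSet.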
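The $\frac{N}{K}$ bucket-size estimate can be checked with a short sketch. The figure of 10,000 values is an assumption for illustration only, chosen so the values are spread evenly.

```python
from collections import Counter

K = 769                # number of buckets
keys = range(10_000)   # hypothetical: N = 10,000 evenly spread values

sizes = Counter(k % K for k in keys)  # bucket index -> number of values in it
average = len(keys) / K

print(min(sizes.values()), max(sizes.values()), round(average, 1))
```

Each operation scans one bucket of roughly this size, which is where the $\mathcal{O}(\frac{N}{K})$ bound comes from.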

Separate chaining: binary search tree

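The approach above has one drawback: we have to scan the whole bucket to check whether a value is already in it (the lookup operation).

We could keep each bucket as a sorted list and use binary search, making lookup $\mathcal{O}(\log{N})$, which beats the $\mathcal{O}({N})$ of the approach above.

On the other hand, if the sorted list is backed by a contiguous array, updates take linear time, so a different structure is needed.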
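The trade-off of a sorted, array-backed bucket can be sketched with Python's standard `bisect` module. The bucket contents and the helper names `exists` and `insert` are hypothetical.

```python
import bisect

bucket = [3, 8, 15, 42]  # a bucket kept in sorted order


def exists(sorted_bucket, value):
    """Binary search: O(log n) comparisons."""
    i = bisect.bisect_left(sorted_bucket, value)
    return i < len(sorted_bucket) and sorted_bucket[i] == value


def insert(sorted_bucket, value):
    """Finding the position is O(log n), but shifting elements is O(n)."""
    if not exists(sorted_bucket, value):
        bisect.insort(sorted_bucket, value)


insert(bucket, 10)
insert(bucket, 8)  # duplicate, ignored
print(bucket, exists(bucket, 15), exists(bucket, 7))
```

Lookup is fast, but every insertion into the middle of the list still moves the elements after it, which is the linear-time update cost mentioned above.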

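Is there a data structure with $\mathcal{O}(\log{N})$ lookup, deletion, and insertion?

There is: the binary search tree (BST). Its ordering property lets us improve the time complexity.
In fact, each BST operation appears as a standalone LeetCode problem:

[LeetCode] 700: Search in a Binary Search Tree | recursion
[LeetCode] 701: Insert into a Binary Search Tree | BST
[LeetCode] 450: Delete Node in a BST | a BST classic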

class MyHashSet:

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.keyRange = 769
        self.bucketArray = [Bucket() for i in range(self.keyRange)]

    def _hash(self, key) -> int:
        return key % self.keyRange

    def add(self, key: int) -> None:
        bucketIndex = self._hash(key)
        self.bucketArray[bucketIndex].insert(key)

    def remove(self, key: int) -> None:
        """
        :type key: int
        :rtype: None
        """
        bucketIndex = self._hash(key)
        self.bucketArray[bucketIndex].delete(key)

    def contains(self, key: int) -> bool:
        """
        Returns true if this set contains the specified element
        :type key: int
        :rtype: bool
        """
        bucketIndex = self._hash(key)
        return self.bucketArray[bucketIndex].exists(key)

class Bucket:
    def __init__(self):
        self.tree = BSTree()

    def insert(self, value):
        self.tree.root = self.tree.insertIntoBST(self.tree.root, value)

    def delete(self, value):
        self.tree.root = self.tree.deleteNode(self.tree.root, value)

    def exists(self, value):
        return (self.tree.searchBST(self.tree.root, value) is not None)

class TreeNode:
    def __init__(self, value):
        self.val = value
        self.left = None
        self.right = None

class BSTree:
    def __init__(self):
        self.root = None

    def searchBST(self, root: TreeNode, val: int) -> TreeNode:
        if root is None or val == root.val:
            return root

        return self.searchBST(root.left, val) if val < root.val \
            else self.searchBST(root.right, val)

    def insertIntoBST(self, root: TreeNode, val: int) -> TreeNode:
        if not root:
            return TreeNode(val)

        if val > root.val:
            # insert into the right subtree
            root.right = self.insertIntoBST(root.right, val)
        elif val == root.val:
            return root
        else:
            # insert into the left subtree
            root.left = self.insertIntoBST(root.left, val)
        return root

    def successor(self, root):
        """
        One step right and then always left
        """
        root = root.right
        while root.left:
            root = root.left
        return root.val

    def predecessor(self, root):
        """
        One step left and then always right
        """
        root = root.left
        while root.right:
            root = root.right
        return root.val

    def deleteNode(self, root: TreeNode, key: int) -> TreeNode:
        if not root:
            return None

        # delete from the right subtree
        if key > root.val:
            root.right = self.deleteNode(root.right, key)
        # delete from the left subtree
        elif key < root.val:
            root.left = self.deleteNode(root.left, key)
        # delete the current node
        else:
            # the node is a leaf
            if not (root.left or root.right):
                root = None
            # the node is not a leaf and has a right child
            elif root.right:
                root.val = self.successor(root)
                root.right = self.deleteNode(root.right, root.val)
            # the node is not a leaf, has no right child, and has a left child
            else:
                root.val = self.predecessor(root)
                root.left = self.deleteNode(root.left, root.val)

        return root

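Complexity Analysis

Time complexity: $\mathcal{O}(\log{\frac{N}{K}})$, where N is the number of all possible values and K is the predefined number of buckets, i.e. 769.
Assuming the values are evenly distributed, the average bucket size is $\frac{N}{K}$.
Traversing the binary search tree amounts to a binary search, so each operation takes $\mathcal{O}(\log{\frac{N}{K}})$.
Space complexity: $\mathcal{O}(K+M)$, where K is the predefined number of buckets and M is the number of values already inserted into the HashSet.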

[LeetCode Diary] 705: Design HashSet | Data Structures
