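Bloom Filter: Introduction and Implementation

1. What is a Bloom filter?

Overview: a Bloom filter is typically used to test whether an element exists in a very large data set.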

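Common use case: under high concurrency with many invalid requests (lookups for keys that do not exist), every such request misses the cache and falls through to the database, a problem known as cache penetration. A Bloom filter placed in front of the cache can reject these invalid requests.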
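As a rough illustration of this use case, the sketch below places a small Bloom filter in front of a cache and a database. Everything in it is hypothetical: the class name CacheGuardSketch, the in-memory maps standing in for the cache and the database, the 65,536-bit length, the three hash functions, and the simple seeded hash.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class CacheGuardSketch {
    private static final int M = 1 << 16; // filter length in bits (assumed)
    private static final int K = 3;       // number of hash functions (assumed)

    private final BitSet filter = new BitSet(M);
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> database = new HashMap<>(); // stand-in for a real database
    int databaseReads = 0; // counts how often a lookup reaches the database

    // Illustrative seeded hash (double hashing), not a production-grade hash.
    private static int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, M);
    }

    // Writes go to the database and register the key in the filter.
    void save(String key, String value) {
        database.put(key, value);
        for (int i = 0; i < K; i++) {
            filter.set(index(key, i));
        }
    }

    String get(String key) {
        for (int i = 0; i < K; i++) {
            if (!filter.get(index(key, i))) {
                return null; // definitely absent: skip both cache and database
            }
        }
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        databaseReads++; // only keys that pass the filter get this far
        String value = database.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }
}
```

A key that was never saved is normally rejected by the filter before it touches the cache or the database. A false positive only costs one extra database read; it never returns wrong data.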

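2. How a Bloom filter works

Step 1: load the data that already exists in the store.

Step 2: run each item through several mutually independent hash functions.

Step 3: map the results onto a bit array, setting each resulting position to 1.

Step 4: take the value to be checked as input.

Step 5: repeat the hash computation from step 2 on that value.

Step 6: compare the results with the bit array. If any position produced by the hashes holds 0 in the bit array, the value definitely does not exist.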

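Example: suppose the store contains only "123", and hashing it sets the corresponding positions in the bit array to 1. When the input is a different value such as "456" and at least one of its hashed positions holds 0, then "456" does not exist. Querying "123" itself always finds all of its positions set to 1.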
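The six steps can be sketched in Java as follows. This is a minimal illustration rather than a production implementation: the class name BloomSketch, the 1024-bit length, the three hash functions, and the seeded hash are all assumptions chosen for the example.

```java
import java.util.BitSet;

class BloomSketch {
    private static final int M = 1024; // bit array length (assumed for the example)
    private static final int K = 3;    // number of hash functions (assumed)

    private final BitSet bits = new BitSet(M);

    // Illustrative seeded hash: derives the i-th index from two base hashes
    // (double hashing). Real filters usually rely on stronger hash functions.
    private static int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, M);
    }

    // Steps 1 to 3: hash an existing value and set the resulting bits.
    void add(String value) {
        for (int i = 0; i < K; i++) {
            bits.set(index(value, i));
        }
    }

    // Steps 4 to 6: hash the queried value and compare against the bit array.
    boolean mightContain(String value) {
        for (int i = 0; i < K; i++) {
            if (!bits.get(index(value, i))) {
                return false; // a 0 bit: the value was definitely never added
            }
        }
        return true; // all bits are 1: the value is probably present
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch();
        for (String existing : new String[] {"123", "abc"}) {
            filter.add(existing);
        }
        System.out.println(filter.mightContain("123")); // true: added values always match
        System.out.println(filter.mightContain("456")); // false unless its bits collide
    }
}
```

Note that mightContain can only answer "definitely not" or "probably yes", which is exactly the asymmetry described in the steps.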

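Pros and cons: a Bloom filter is far more space-efficient than conventional lookup structures, and a query costs only a fixed number of hash computations regardless of how many elements are stored. The drawbacks are a nonzero false positive rate and the difficulty of deleting elements. When a Bloom filter reports that an item is absent, it is definitely absent; when it reports that an item is present, the item may not actually be there.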

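3. Does a Bloom filter support deletion?

A standard Bloom filter does not support deletion, because clearing a bit could also erase other elements that hash to the same position. A derived structure, the Counting Bloom filter, does support removing elements.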
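To make the difference concrete, here is a minimal sketch of a counting variant that keeps a small counter per slot instead of a single bit. The class name CountingBloomSketch, the sizes, and the hash function are assumptions for illustration only.

```java
class CountingBloomSketch {
    private static final int M = 1024; // number of counters (assumed)
    private static final int K = 3;    // number of hash functions (assumed)

    // One counter per slot replaces the single bit of a standard filter.
    private final int[] counters = new int[M];

    // Illustrative seeded hash (double hashing), not a production-grade hash.
    private static int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, M);
    }

    void add(String value) {
        for (int i = 0; i < K; i++) {
            counters[index(value, i)]++;
        }
    }

    boolean mightContain(String value) {
        for (int i = 0; i < K; i++) {
            if (counters[index(value, i)] == 0) {
                return false;
            }
        }
        return true;
    }

    // Should only be called for values that were really added: removing a value
    // that merely looks present would decrement counters owned by other values.
    boolean remove(String value) {
        if (!mightContain(value)) {
            return false;
        }
        for (int i = 0; i < K; i++) {
            counters[index(value, i)]--;
        }
        return true;
    }
}
```

The price of deletion is memory: each slot now needs several bits instead of one.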

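4. How to choose the number of hash functions and the filter length

The filter must not be too short: with a short bit array, nearly every bit quickly becomes 1 and every query reports that the data exists. A filter that is too long, on the other hand, wastes memory and can also slow down queries.

The number of hash functions is also a trade-off. More hash functions set the bits to 1 faster and make each insert and query slower; too few hash functions raise the false positive rate.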

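How do you choose k and m for your workload? The commonly used formulas are:

m = -(n × ln p) / (ln 2)^2

k = (m / n) × ln 2

Here k is the number of hash functions, m is the filter length in bits, n is the number of inserted elements, and p is the false positive rate.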
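As a worked example, the standard sizing formulas m = -(n × ln p) / (ln 2)^2 and k = (m / n) × ln 2 can be evaluated directly. The sketch below assumes n = 1,000,000 elements and a target p of 1%; the class and method names are hypothetical.

```java
class BloomSizing {
    // m = -(n * ln p) / (ln 2)^2, rounded up to a whole number of bits
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = (m / n) * ln 2, rounded to the nearest integer and at least 1
    static int optimalHashCount(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Expected false positive rate for given n, m, k: (1 - e^(-k*n/m))^k
    static double estimatedFalsePositiveRate(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        double p = 0.01;
        long m = optimalBits(n, p);
        int k = optimalHashCount(n, m);
        System.out.printf("m = %d bits (about %.2f MiB), k = %d%n",
                m, m / 8.0 / 1024 / 1024, k);
        System.out.printf("estimated p = %.4f%n", estimatedFalsePositiveRate(n, m, k));
    }
}
```

For these inputs the formulas give roughly 9.6 million bits (a little over 1 MiB) and k = 7, that is, about 9.6 bits per element for a 1% false positive rate.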

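6. The large-value problem with Bloom filters

In production, a very large Bloom filter should be split up. There are many ways to split it, but the key point is this: do not let the hash results for a single key scatter across many small bitmaps on many nodes. Instead, split the filter into multiple small bitmaps and make sure that all hash functions for a given key land on the same small bitmap.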
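One possible way to realise this split is sketched below: the key first selects a single shard, and all of its bit positions are then computed inside that shard. Each BitSet stands in for a small bitmap that would live on one node; the class name ShardedBloomSketch, the shard count, and the hash mixing are assumptions for illustration.

```java
import java.util.BitSet;

class ShardedBloomSketch {
    private static final int K = 3; // hash functions per key (assumed)

    private final BitSet[] shards;
    private final int bitsPerShard;

    ShardedBloomSketch(int shardCount, int bitsPerShard) {
        this.bitsPerShard = bitsPerShard;
        this.shards = new BitSet[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new BitSet(bitsPerShard);
        }
    }

    // The shard is chosen once per key.
    int shardOf(String key) {
        return Math.floorMod(key.hashCode(), shards.length);
    }

    // All K positions are computed inside that one shard.
    private int indexInShard(String key, int i) {
        int h1 = key.hashCode() * 0x9E3779B1; // remix so the index is not tied to the shard choice
        int h2 = (h1 >>> 16) | 1;             // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, bitsPerShard);
    }

    void add(String key) {
        BitSet shard = shards[shardOf(key)];
        for (int i = 0; i < K; i++) {
            shard.set(indexInShard(key, i));
        }
    }

    boolean mightContain(String key) {
        BitSet shard = shards[shardOf(key)];
        for (int i = 0; i < K; i++) {
            if (!shard.get(indexInShard(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Number of bits set in one shard, exposed for inspection.
    int shardCardinality(int shard) {
        return shards[shard].cardinality();
    }
}
```

With this layout, adding or checking one key touches exactly one small bitmap, so a lookup needs a single node rather than a fan-out across all of them.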

7. A Bloom filter implementation in Java

import java.util.BitSet;

public class BloomFilterPractice {

    // Number of hash functions (k)
    private static final int HASH_COUNT = 16;

    private final BitSet bitSet;
    // Filter length in bits (m). BitSet.length() cannot serve as the length:
    // it returns the index of the highest set bit plus one, which is 0 for an
    // empty set and would cause a division by zero in the modulo below.
    private final int size;

    public BloomFilterPractice() {
        this(16);
    }

    /**
     * @param size filter length in bits
     */
    public BloomFilterPractice(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
        // Create the underlying bit set
        bitSet = new BitSet(size);
    }

    /** Fraction of bits currently set to 1. */
    private double getUseRate() {
        return (double) bitSet.cardinality() / size;
    }

    // Derives the hashNumber-th index by multiplying one mixed hash. This is a
    // simplification: the resulting functions are not truly independent.
    private int getHashValue(String data, int hashNumber) {
        int hashCode = data.hashCode();
        int hash = hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
        // floorMod keeps the index in [0, size) even if the multiplication overflows
        return Math.floorMod(hash * hashNumber, size);
    }

    /**
     * Checks whether the value may already be present, then adds it.
     *
     * @return true if every bit for the value was already set (possibly a false
     *         positive), false if the value was definitely not present before
     */
    public boolean inputValue(String value) {
        // Assume the value exists until a cleared bit proves otherwise
        boolean exist = true;
        for (int i = 1; i <= HASH_COUNT; i++) {
            int index = getHashValue(value, i);
            if (!bitSet.get(index)) {
                // A cleared bit means the value was never added; set it now
                exist = false;
                bitSet.set(index);
            }
        }
        return exist;
    }

    public static void main(String[] args) {
        // 16 hash functions fill the default 16-bit filter almost immediately,
        // so the demo uses a longer filter.
        BloomFilterPractice bloomFilterPractice = new BloomFilterPractice(1 << 16);
        System.out.println(bloomFilterPractice.inputValue("China Unicom"));
        System.out.println(bloomFilterPractice.inputValue("China Telecom"));
        System.out.println(bloomFilterPractice.inputValue("China Mobile"));
        System.out.println(bloomFilterPractice.inputValue("Who is the best"));
        // Always true: the same value was added by the previous call
        System.out.println(bloomFilterPractice.inputValue("Who is the best"));
        System.out.println(bloomFilterPractice.getUseRate());
    }

}
