Bloom Filter

Original

库伯

2020-07-02 10:25

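1. Overview

Put simply, a Bloom filter answers whether a collection contains a given element. In Java, membership is usually checked with containers such as Map or Set, but with very large data sets these consume too much memory. That is when a Bloom filter is worth considering.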

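2. How It Works

To create a Bloom filter, first allocate a bit array in memory. Let its length be L, with every bit initialized to 0.

When a key is put into the Bloom filter, the key is hashed N times and each hash value is reduced modulo L, giving N index positions. The bits at those positions are all set to 1. L and N are chosen from the expected number of keys and the acceptable error rate, because a Bloom filter cannot guarantee 100% accuracy; more on this below.

To check whether a key exists, hash it the same N times and take each result modulo L. If the bits at all N positions are 1, the key is considered possibly present. Note the word "possibly".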

The following example walks through the process (assuming L = 10 and N = 3).

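Assume:

"zhangsan": the 3 hash-and-modulo results are 0, 2, 4.

"lisi": the 3 hash-and-modulo results are 4, 6, 8.

"wangwu": the 3 hash-and-modulo results are 2, 4, 6.

If the keys "zhangsan" and "lisi" have already been added, then "wangwu" is reported as present even though it was never added, because positions 2, 4, and 6 have already been set by "zhangsan" and "lisi".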
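The false positive in this example can be reproduced with a short sketch. The bit positions are hard-coded from the assumed hash results rather than computed by real hash functions, and the class and method names are illustrative only.

```java
import java.util.BitSet;

// Illustrative sketch: positions are hard-coded from the example,
// not computed by real hash functions.
class FalsePositiveDemo {
    static final int L = 10; // length of the bit array

    // Set every bit that the key maps to.
    static void put(BitSet bits, int[] positions) {
        for (int p : positions) {
            bits.set(p % L);
        }
    }

    // A key "might exist" only if all of its bits are 1.
    static boolean mightContain(BitSet bits, int[] positions) {
        for (int p : positions) {
            if (!bits.get(p % L)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] zhangsan = {0, 2, 4};
        int[] lisi = {4, 6, 8};
        int[] wangwu = {2, 4, 6};

        BitSet bits = new BitSet(L);
        put(bits, zhangsan);
        put(bits, lisi);

        // wangwu was never added, yet bits 2, 4 and 6 are all set.
        System.out.println("wangwu might exist: " + mightContain(bits, wangwu));
    }
}
```

Running it reports that "wangwu" might exist, which is exactly the false positive described in the text.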

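From the above, a Bloom filter has the following properties:

If the algorithm reports that a key does not exist, the key definitely does not exist.
If the algorithm reports that a key exists, it only means the key may exist.
Keys cannot be deleted from a Bloom filter, because bits are shared between keys and clearing them would affect other keys.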
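To see why deletion is unsafe, the sketch below clears the bits of one key and then looks up another. It reuses the assumed positions from the example (0, 2, 4 for "zhangsan" and 4, 6, 8 for "lisi"); the class and helper names are hypothetical.

```java
import java.util.BitSet;

// Illustrative sketch: bit positions are taken from the example, not computed.
class DeleteDemo {
    // Returns true only if every listed bit is set.
    static boolean allSet(BitSet bits, int... positions) {
        for (int p : positions) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(10);
        for (int p : new int[]{0, 2, 4}) bits.set(p); // put "zhangsan"
        for (int p : new int[]{4, 6, 8}) bits.set(p); // put "lisi"

        // Naive "delete" of zhangsan: clear its three bits.
        for (int p : new int[]{0, 2, 4}) bits.clear(p);

        // Bit 4 was shared, so lisi now looks absent: a false negative.
        System.out.println("lisi might exist: " + allSet(bits, 4, 6, 8));
    }
}
```

After the naive delete, "lisi" is reported as absent even though it was added, which breaks the guarantee that a negative answer is always correct.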

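So how can accuracy be improved?

Increase the number of hashes, up to a point (a trade-off between CPU and accuracy; for a fixed array length, too many hashes fill the bits faster and raise the false positive rate again).
Increase the length of the bit array (a trade-off between memory and accuracy).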
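The two knobs above can be estimated from the expected key count n and the acceptable false positive probability p, using the standard sizing formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. The sketch below is a minimal illustration; the class and method names are hypothetical and not taken from the original code.

```java
// Minimal sizing sketch based on the standard Bloom filter formulas.
class BloomSizing {
    // Bits needed for n keys at false positive probability p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Hash count that minimizes false positives for m bits and n keys:
    // k = (m / n) * ln 2, and never less than 1
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // expected number of keys
        double p = 0.01;      // acceptable false positive probability
        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        System.out.println("bits = " + m + ", hashes = " + k);
    }
}
```

For 10 million keys at a 1% error rate this works out to roughly 9.6 bits per key (about 12 MB in total) and 7 hash functions.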

3. Implementation

Hand-written Java implementation

package com.ikuboo.bloomfilter;

import java.util.BitSet;

/**
 * A minimal Bloom filter backed by a BitSet, using three fixed hash functions.
 */
public class MyBloomFilter {

    /** Number of bits in the filter. */
    private final int length;

    /** Backing bit array. */
    private final BitSet bitSet;

    public MyBloomFilter(int length) {
        this.length = length;
        this.bitSet = new BitSet(length);
    }

    /**
     * Adds a key to the filter.
     */
    public void put(String key) {
        int first = hashcode_1(key);
        int second = hashcode_2(key);
        int third = hashcode_3(key);

        bitSet.set(first % length);
        bitSet.set(second % length);
        bitSet.set(third % length);
    }

    /**
     * Checks whether a key may exist.
     *
     * @param key the key to look up
     * @return true: the key may exist; false: the key definitely does not exist
     */
    public boolean exist(String key) {
        int first = hashcode_1(key);
        int second = hashcode_2(key);
        int third = hashcode_3(key);

        boolean firstIndex = bitSet.get(first % length);
        if (!firstIndex) {
            return false;
        }
        boolean secondIndex = bitSet.get(second % length);
        if (!secondIndex) {
            return false;
        }
        boolean thirdIndex = bitSet.get(third % length);
        if (!thirdIndex) {
            return false;
        }
        return true;
    }

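    /**
     * Hash function 1 (multiply by 33 and add each character)
     */
    private int hashcode_1(String key) {
        int hash = 0;
        for (int i = 0; i < key.length(); ++i) {
            hash = 33 * hash + key.charAt(i);
        }
        // Clear the sign bit; Math.abs(Integer.MIN_VALUE) is still negative.
        return hash & Integer.MAX_VALUE;
    }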

    /**
     * Hash function 2 (FNV-style, with extra mixing steps)
     */
    private int hashcode_2(String data) {
        final int p = 16777619;
        int hash = (int) 2166136261L;
        for (int i = 0; i < data.length(); i++) {
            hash = (hash ^ data.charAt(i)) * p;
        }
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;
        // Clear the sign bit; Math.abs(Integer.MIN_VALUE) is still negative.
        return hash & Integer.MAX_VALUE;
    }

    /**
     * Hash function 3 (one-at-a-time style)
     */
    private int hashcode_3(String key) {
        int hash = 0;
        for (int i = 0; i < key.length(); ++i) {
            hash += key.charAt(i);
            hash += (hash << 10);
            hash ^= (hash >> 6);
        }
        hash += (hash << 3);
        hash ^= (hash >> 11);
        hash += (hash << 15);
        // Clear the sign bit; Math.abs(Integer.MIN_VALUE) is still negative.
        return hash & Integer.MAX_VALUE;
    }

}

Test code

public class TestMyBloomFilter {
    public static void main(String[] args) {
        int capacity = 10000000;
        MyBloomFilter bloomFilters = new MyBloomFilter(capacity);
        bloomFilters.put("key1");

        System.out.println("key1 exists: " + bloomFilters.exist("key1"));
        System.out.println("key2 exists: " + bloomFilters.exist("key2"));
    }
}

Implementation with the Guava library

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.nio.charset.StandardCharsets;

public class TestGuavaBloomFilter {
    public static void main(String[] args) {
        // Expected number of insertions
        int capacity = 10000000;
        // Desired false positive probability
        double fpp = 0.01;

        BloomFilter<String> bloomFilters = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), capacity, fpp);

        bloomFilters.put("key1");

        System.out.println("key1 exists: " + bloomFilters.mightContain("key1"));
        System.out.println("key2 exists: " + bloomFilters.mightContain("key2"));
    }
}

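Implementation with Redis
1. Write the code yourself on top of the Redis bitmap data structure. Guava's code for computing the number of hashes and the hash values is a useful reference; simply replace Guava's BitArray with a Redis bitmap.
2. Use a Lua script. See: https://github.com/erikdubbelboer/redis-lua-scaling-bloom-filter
3. Use the module (plugin) mechanism provided by Redis 4.0+. See: https://blog.csdn.net/u013030276/article/details/88350641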
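For option 1, the client-side part is computing which bit offsets to touch. The sketch below is one possible approach, assuming a double-hashing scheme (offset i = h1 + i * h2, modulo m) with two placeholder base hashes; it is not the code from Guava or from the linked projects, and the class and method names are hypothetical. The Redis calls themselves are left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: computes the bit offsets for one key.
class RedisBloomOffsets {
    // Derive k bit offsets in [0, m) from two base hashes (double hashing).
    static long[] offsets(String key, int k, long m) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = key.hashCode(); // placeholder base hash 1
        int h2 = fnv1a(data);    // placeholder base hash 2
        long[] result = new long[k];
        for (int i = 0; i < k; i++) {
            long combined = (long) h1 + (long) i * h2;
            // floorMod keeps the offset non-negative even if combined < 0.
            result[i] = Math.floorMod(combined, m);
        }
        return result;
    }

    // 32-bit FNV-1a over the key bytes.
    static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }
}
```

To add a key, a client would issue SETBIT with value 1 for each returned offset; to look one up, it would issue GETBIT for each offset and treat the key as possibly present only if every bit is 1. Sending the commands in a pipeline avoids one round trip per bit.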

4. Use Cases

Preventing cache penetration
Deduplication, idempotent processing, and similar workflows
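As a rough illustration of the cache penetration case, the sketch below puts a membership check in front of the cache and the database, so that keys the filter has never seen do not reach the backing store. Everything here is hypothetical: the filter is abstracted as a Predicate, the cache is a plain HashMap, and the database is a Function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical read path: names and types are illustrative only.
class GuardedLookup {
    private final Predicate<String> mightContain;    // e.g. a Bloom filter check
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> database; // slow backing store

    GuardedLookup(Predicate<String> mightContain, Function<String, String> database) {
        this.mightContain = mightContain;
        this.database = database;
    }

    String get(String key) {
        // A "no" from the filter is definitive, so skip cache and database.
        if (!mightContain.test(key)) {
            return null;
        }
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        // A "maybe" can still be a false positive; the database decides.
        String value = database.apply(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }
}
```

With a real Bloom filter plugged in as the predicate, a small fraction of unknown keys still reaches the database because of false positives, but the bulk of lookups for nonexistent keys is rejected up front.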
