It is an space-efficient probabilistic DS that is based on the concept of hashing
It is used to efficiently check for the existence of an element in a set
If element does not exist it returns No else it returns Probable Yes (i.e. False Positives are possible)

Properties

  • Unlike a standard hash table, a Bloom filter of a fixed size can represent a set with an arbitrarily large number of elements.
  • Adding an element never fails. However, the false positive rate increases steadily as elements are added until all bits in the filter are set to 1, at which point all queries yield a positive result.
  • Bloom filters never generate false negative result
  • Deleting elements from filter is not possible because, if we delete a single element by clearing bits at indices generated by k hash functions, it might cause deletion of few other elements.

Working

The Hash Functions selected should be fast, should evenly and randomly distribute the data in the set, collisions are okay provided they are rare

Inserting Element

Bloom Filter in an Bit Array of an m bits where all the elements are set to 0 initially
Making use of k Hash functions we calculate the hash values for the inputs
h1("David") = 1
h2("David") = 4
h3("David") = 7
The bits 1, 4 and 7 are set to 1 in the Bit Array. We repeats the same process for all the values added in the Bit Array

Checking Element

For checking the element we perform the same steps but in reverse order
We calculate the hash for the values for the word to be search if the Bit for all the hashes are set to 1 we can say that the element could be present in the set
If even one of the hash values is 0 then with 100% guarantee we can say that the element is not present in the set

INFO

  • We cannot guarantee that the element is present with certainty is due to the fact that it’s possible that there is an word whose hashes sets the same bits that are already set as 1
  • It is a trade off between space used and accuracy

Time and Space Complexity

OperationComplexity
InsertionO(k)
SearchO(k)
SpaceO(m)

Applications

Medium uses bloom filters for recommending post to users by filtering post which have already been seen by user
Quora implemented a shared bloom filter in the feed backend to filter out stories that people have seen before
Google Chrome web browser used to use a Bloom filter to identify malicious URLs
Google Bigtable, Apache HBase and Apache Cassandra, and PostgreSQL use Bloom filters to reduce the disk lookups for non-existent rows or columns

Bloom Filter | Brilliant Math & Science Wiki

Bloom Filters - Introduction and Implementation - GeeksforGeeks