A Bloom filter is a space-efficient data structure used for probabilistic set membership testing. When testing an object for set membership, a Bloom filter may give a false positive. The analysis of the false positive rate is key to understanding the Bloom filter and applications that use it. We show experimentally that the classic analysis for false positive rate is wrong. We formally derive a correct formula using a balls-and-bins model and show how to numerically compute the new, correct formula in a stable manner. We also prove that the new formula always results in a predicted greater false positive rate than the classic formula. This correct formula is numerically compared to the classic formula for relative error for a small Bloom filter the relative error is shown to be significant.
Citation: Information Processing Letters
Pub Type: Journals
data structures, analysis of algorithms, Bloom filters