PARQUET-41: Add Bloom filter #112
Hi @rdblue, I have added a doc here. Could you please help review it?

I had many suggested revisions to the Bloom filter prose, so I thought sending you, @cjjnjust, a pull request would be easier than using GitHub's weak code-review tool.
Grammar and structure tweaking for Bloom filter prose.

ok, patch LGTM. +1

@majetideepak Can you take a look, too?

@jbapple-cloudera sure! I will make a pass by end of today. I have to catch up on the recent updates.

following formula. The output is in bits per distinct element:

```c
-8 / log(1 - pow(p, 1.0 / 8));
```
Can you point me to the source of this formula? I tried substituting 0.5% into this and it did not come out to 11.54. I probably missed something.
This is the classic formula for Bloom Filters. See the Network Applications paper at the bottom for a proof.
It's very sensitive to p, so 0.4% is closer, and 0.39% closer still.

@majetideepak, you're the only committer to review so far. Do you want to merge this PR?

This reverts commit 28b84d8.
* PARQUET-41: Add Bloom filter
* Grammar and structure tweaking for Bloom filter prose.
Move original PR to here and add doc.