PARQUET-41: Add Bloom filter#112

Merged: majetideepak merged 3 commits into apache:bloom-filter from chenjunjiedada:bloom-filter on Oct 12, 2018

@chenjunjiedada
Contributor

Moved the original PR here and added documentation.

@chenjunjiedada
Contributor Author

Hi @rdblue,

I have added a doc here. Could you please help review it?

@jbapple-cloudera

I had many suggested revisions to the Bloom filter prose, so I thought sending you, @cjjnjust, a pull request would be easier than using GitHub's weak code-review tool.

chenjunjiedada#1

Grammar and structure tweaking for Bloom filter prose.
@jbapple-cloudera

ok, patch LGTM. +1

@jbapple-cloudera

@majetideepak Can you take a look, too?

@majetideepak

@jbapple-cloudera sure! I will make a pass by end of today. I have to catch up on the recent updates.

Comment thread on BloomFilter.md:
following formula. The output is in bits per distinct element:

```c
-8 / log(1 - pow(p, 1.0 / 8));
```

Can you point me to the source of this formula? I tried substituting 0.5% into this and it did not come out to 11.54. I probably missed something.


This is the classic formula for Bloom filters. See the Network Applications paper at the bottom for a proof.
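For reference, the quoted expression follows from the standard Bloom filter approximation. The algebra below is a sketch that reads the constant 8 as the number of hash functions k, with m bits, n distinct elements, target false-positive probability p, and log as the natural logarithm (as C's log is), treating the approximation as an equality when solving for m/n:

$$
\begin{aligned}
p &\approx \left(1 - e^{-kn/m}\right)^{k} \\
p^{1/k} &= 1 - e^{-kn/m} \\
\ln\left(1 - p^{1/k}\right) &= -\frac{kn}{m} \\
\frac{m}{n} &= \frac{-k}{\ln\left(1 - p^{1/k}\right)} = \frac{-8}{\ln\left(1 - p^{1/8}\right)} \quad (k = 8)
\end{aligned}
$$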

It's very sensitive to p, so 0.4% is closer, and 0.39% closer still.


@majetideepak majetideepak left a comment


+1 LGTM

@jbapple-cloudera

jbapple-cloudera commented Oct 11, 2018

@majetideepak, you're the only committer who has reviewed this so far. Do you want to merge this PR?

@majetideepak majetideepak merged commit 28b84d8 into apache:bloom-filter Oct 12, 2018
majetideepak added a commit that referenced this pull request Oct 12, 2018
majetideepak pushed a commit that referenced this pull request Oct 12, 2018
* PARQUET-41: Add Bloom filter

* Grammar and structure tweaking for Bloom filter prose.

3 participants