feat/perf: optimize AddRange in RoaringPositionBitmap by Baunsgaard · Pull Request #608 · apache/iceberg-cpp

Baunsgaard · 2026-03-29T21:25:56Z

Replace the per-position loop in RoaringPositionBitmap::AddRange with direct calls to roaring::Roaring::addRange, matching the optimization in the Java implementation (apache/iceberg#15791).

Added tests for single-position ranges, large contiguous ranges, three-key spanning ranges, and invalid inputs.

Replace the per-position loop in AddRange with direct roaring::Roaring::addRange calls. For ranges within a single 32-bit key, a single addRange call suffices. For ranges spanning multiple keys, the range is split into the first partial key, full middle keys, and the last partial key.

AddRange() now creates run-length encoded containers directly, leaving nothing for Optimize() to compress. Use individual Add() calls to produce array containers that Optimize() can convert.

evindj · 2026-03-30T14:03:52Z

src/iceberg/deletes/roaring_position_bitmap.h

+  /// If pos_start >= pos_end, this method does nothing.
  /// \note Invalid positions are silently ignored


Rephrase: maybe state that pos_start >= pos_end is considered invalid.

I have made pos_start > pos_end invalid. Technically, I can imagine cases where pos_start == pos_end, so I keep that as a no-op.

Let me know what you think.

src/iceberg/deletes/roaring_position_bitmap.cc

src/iceberg/test/roaring_position_bitmap_test.cc

emkornfield · 2026-03-30T16:09:58Z

src/iceberg/deletes/roaring_position_bitmap.cc

+  if (pos_start >= pos_end) {
+    return;
+  }
+  if (pos_start < 0 || pos_end - 1 > kMaxPosition) {


It's not clear to me this is correct; don't we need to fill in the valid values?

+1

We need to ignore invalid positions but set values for valid ones.

Okay, I follow the style of the normal operation: ignore invalid positions and clamp the inputs to valid values.

However, this is inconsistent with the behaviour of core Java, where positions are validated and an error is raised if they are out of range.

src/iceberg/deletes/roaring_position_bitmap.cc

wgtmac · 2026-03-31T09:12:48Z

src/iceberg/deletes/roaring_position_bitmap.cc

+1

We need to ignore invalid positions but set values for valid ones.

wgtmac · 2026-03-31T09:14:20Z

src/iceberg/deletes/roaring_position_bitmap.h

  void Add(int64_t pos);

  /// \brief Sets a range of positions [pos_start, pos_end).
+  /// If pos_start >= pos_end, this method does nothing.


This contradicts the note sentence below.

Combine the empty-range check and the bounds check into one conditional to reduce nesting and improve readability. No behavior change.

When the requested range extends beyond the valid bounds [0, kMaxPosition], clamp it to the valid portion and set those positions. This is consistent with Add(), which silently ignores individual invalid positions: a partially valid range should still set its valid subset. Update tests to verify clamping behavior for negative start and beyond-max-position end values.

Replace the branching if/else (single-key vs multi-key) with a unified loop from start_key to end_key. The boundary conditions for the first and last key are handled inline with ternary expressions. This is easier to maintain and reason about.

The previous doc said "does nothing" for empty ranges while the note said "Invalid positions are silently ignored", which was contradictory. Rewrite to document the clamping parameters and clarify that out-of-bound positions are silently ignored.

Replace individual TEST() functions for reversed-range and equal-start-end cases with a parameterized test suite. Add additional edge cases (zero-length at zero, both-negative) for broader coverage.

A reversed range (start > end) indicates a bug in the caller and should fail fast with std::invalid_argument rather than silently succeed. An empty range (start == end) remains a valid no-op. This makes AddRange behavior consistent across the Java, C++, and Rust Iceberg implementations.

Adjust line breaks and alignment to satisfy clang-format.

Baunsgaard · 2026-03-31T12:10:10Z

Thanks for the reviews! I have addressed all feedback; let me know if I missed anything or if any further changes are required.

As a reminder the related PRs are:

Java: Core: Optimize RoaringPositionBitmap.setRange with native range API iceberg#15791
Rust: feat/perf: add insert_range and contains to DeleteVector iceberg-rust#2292

Benchmark results

I was asked for benchmark results, so I thought I would post some on each PR.

AddRange() (native CRoaring addRange) vs. a loop of individual Add() calls (Release build, GCC 14, 5 iterations after 3 warmup iterations):

Scenario	Before (Add loop)	After (AddRange)	Speedup
100 positions	1.1 µs	73.6 ns	15x
10k positions	91.1 µs	70.6 ns	1,290x
200k positions	1.60 ms	188.2 ns	8,475x
1M positions	8.80 ms	634.2 ns	13,880x
10k cross-key boundary	97.3 µs	102.4 ns	951x

The cross-key boundary scenario inserts 10k positions spanning the 32-bit key boundary (e.g., (1LL << 32) - 5000 to (1LL << 32) + 5000). This exercises the key-splitting logic where the range is divided across two internal 32-bit Roaring bitmaps.

The old test called AddRange near kMaxPosition (0x7FFFFFFE80000000), which required allocating ~2 billion internal bitmap slots and took ~180 seconds. CI killed the util_test binary before it finished. Replace it with a range entirely beyond kMaxPosition, so clamping makes the range empty: no allocation is needed and the test completes in < 1 ms.

The pos_start == pos_end early return is unnecessary because the post-clamp guard (pos_start >= pos_end) already handles it. After the throw for reversed ranges, clamping can only shrink the range, so the only empty case is equality.

wgtmac · 2026-03-31T14:12:19Z

src/iceberg/deletes/roaring_position_bitmap.cc

-  for (int64_t pos = pos_start; pos < pos_end; ++pos) {
-    Add(pos);
+  if (pos_start > pos_end) {
+    throw std::invalid_argument("AddRange requires pos_start <= pos_end, got [" +


Let's return a Status instead of throwing an exception if you really think this is an invalid case. IMHO, we need to be consistent with line 123 and line 140, which return on invalid inputs.

Well, it is an invalid input; however, for consistency with other methods in the C++ implementation, both versions are valid.

Before my changes in the Java PR, the Java implementation also did not throw any exceptions, since it used the same for-loop definition. To keep the existing behaviour, I have removed the exception throwing.

Replace the std::invalid_argument throw with a silent return on reversed ranges (pos_start > pos_end), consistent with how Add() handles invalid positions. The clamping already reduces the range to empty when start > end after clamping, so the explicit throw is removed entirely.

Baunsgaard added 2 commits March 29, 2026 13:15

Fix TestOptimize to use Add() instead of AddRange()

520a868

evindj reviewed Mar 30, 2026

emkornfield reviewed Mar 30, 2026

src/iceberg/deletes/roaring_position_bitmap.cc (outdated)

wgtmac reviewed Mar 31, 2026

Baunsgaard added 7 commits March 31, 2026 09:34

Merge AddRange guard clauses into a single if block

1f70485

Parameterize AddRange no-op tests

ab7e189

Fix clang-format violations

3c55027

Baunsgaard mentioned this pull request Mar 31, 2026

Core: Optimize RoaringPositionBitmap.setRange with native range API apache/iceberg#15791

Open

Baunsgaard requested review from emkornfield, evindj and wgtmac March 31, 2026 12:22

wgtmac reviewed Mar 31, 2026

Baunsgaard mentioned this pull request Mar 31, 2026

feat/perf: add insert_range and contains to DeleteVector apache/iceberg-rust#2292

Open

Baunsgaard added 2 commits March 31, 2026 17:32

Extract pos_last variable to avoid repeated pos_end - 1

4421d7d

Baunsgaard requested a review from wgtmac March 31, 2026 17:44

Baunsgaard wants to merge 13 commits into apache:main from Baunsgaard:optimize-addrange-roaring-bitmap

4 participants
