perf: use ISO_8859_1 for ASCII fast path in ClassFile.utf() #66

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into main from brian.marks/opt-classfile
Apr 9, 2026

@bm1549 bm1549 commented Apr 9, 2026

What Does This Do

Use ISO_8859_1 instead of US_ASCII for the ASCII fast path in ClassFile.utf() to improve string decoding performance.

new String(byte[], offset, length, ISO_8859_1) is faster than US_ASCII on all Java versions:

  • Java 8: The ISO_8859_1 decoder is a simple 1:1 byte-to-char copy, whereas the US_ASCII decoder validates each byte against the 0x00–0x7F range before copying.
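  • Java 9+: With compact strings, the constructor can additionally copy the bytes straight into the string's Latin-1 backing array, skipping the validation pass that US_ASCII decoding requires. The bytes are still copied; the source array is not adopted.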
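
The difference between the two decoders can be seen with a small standalone check. This is an illustrative sketch, not code from the PR; the class and method names are hypothetical. It relies only on the documented behavior of the `String(byte[], int, int, Charset)` constructor, which replaces malformed input with the charset's replacement character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of the PR.
final class CharsetComparison {
    static String decodeAscii(byte[] b) {
        return new String(b, 0, b.length, StandardCharsets.US_ASCII);
    }

    static String decodeLatin1(byte[] b) {
        return new String(b, 0, b.length, StandardCharsets.ISO_8859_1);
    }

    /** Returns true if both charsets decode every byte in 0x00..0x7F to the same string. */
    static boolean identicalForAsciiRange() {
        byte[] all = new byte[128];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return decodeAscii(all).equals(decodeLatin1(all));
    }

    public static void main(String[] args) {
        System.out.println(identicalForAsciiRange()); // true

        // Outside the ASCII range the decoders diverge.
        byte[] nonAscii = {(byte) 0xE9};
        // ISO_8859_1 maps the byte 1:1 to U+00E9.
        System.out.println(Integer.toHexString(decodeLatin1(nonAscii).charAt(0))); // e9
        // US_ASCII treats the byte as malformed and substitutes U+FFFD.
        System.out.println(Integer.toHexString(decodeAscii(nonAscii).charAt(0))); // fffd
    }
}
```

The second half shows why the swap is only safe behind an ASCII check: for bytes above 0x7F the two charsets return different characters.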

Since the code already confirms all bytes are ASCII before reaching this fast path, using ISO_8859_1 is safe and produces identical results for bytes 0x00–0x7F.
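
A minimal sketch of the shape such a fast path might take is shown below. It is not the actual `ClassFile.utf()` implementation: the class name, signature, and scan loop are assumptions, and the non-ASCII fallback uses standard `UTF_8` as a stand-in, whereas class files use modified UTF-8, which differs for U+0000 and supplementary characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the real ClassFile.utf() may be structured differently.
final class UtfFastPathSketch {
    /** Decodes length bytes of buf starting at offset. */
    static String utf(byte[] buf, int offset, int length) {
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            if (buf[i] < 0) {
                // High bit set: not pure ASCII, so take the slow path.
                // Assumption: standard UTF-8 stands in for modified UTF-8 here.
                return new String(buf, offset, length, StandardCharsets.UTF_8);
            }
        }
        // Every byte is in 0x00..0x7F, so ISO_8859_1 and US_ASCII give the same result.
        return new String(buf, offset, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] name = "java/lang/Object".getBytes(StandardCharsets.US_ASCII);
        System.out.println(utf(name, 0, name.length)); // java/lang/Object
        System.out.println(utf(name, 5, 4));           // lang
    }
}
```

Because the scan has already rejected any byte with the high bit set, the charset chosen for the final `new String(...)` call cannot change the decoded result, only how the JDK gets there.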

Motivation

Benchmarking ClassFile.header() and ClassFile.outline() against spring-web.jar (~700 classes) showed an improvement in both averages, although only the outline result falls clearly outside the reported error bounds:

| Benchmark | Baseline (us/op) | After (us/op) | Change |
| --- | --- | --- | --- |
| testClassHeader | 782.7 ± 30.8 | 764.2 ± 97.9 | -2.4% |
| testClassOutline | 1989.4 ± 34.2 | 1879.5 ± 40.9 | -5.5% |

The outline improvement is larger because utf() is called for every method name, field name, and descriptor — not just class/super/interface names.

ASM baselines (unchanged code) confirmed system conditions were comparable across runs.
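
The Change column and the error bounds reported for the two benchmarks can be checked with a few lines of arithmetic. The sketch below uses the full-precision figures from the benchmark output; the class and method names are hypothetical, and the interval-overlap test is a rough heuristic rather than a formal significance test.

```java
import java.util.Locale;

// Hypothetical helper for checking the reported benchmark deltas.
final class BenchmarkDelta {
    /** Relative change from baseline to after, in percent (negative means faster). */
    static double percentChange(double baseline, double after) {
        return (after - baseline) / baseline * 100.0;
    }

    /** True if the intervals [m1 - e1, m1 + e1] and [m2 - e2, m2 + e2] overlap. */
    static boolean intervalsOverlap(double m1, double e1, double m2, double e2) {
        return m1 - e1 <= m2 + e2 && m2 - e2 <= m1 + e1;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percentChange(782.666, 764.186)));   // -2.4%
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percentChange(1989.423, 1879.470))); // -5.5%

        // Header: the error intervals overlap, so the change is within the noise.
        System.out.println(intervalsOverlap(782.666, 30.761, 764.186, 97.926));   // true
        // Outline: the intervals do not overlap.
        System.out.println(intervalsOverlap(1989.423, 34.237, 1879.470, 40.856)); // false
    }
}
```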

Additional Notes

  • Only the utf() fast path is changed (1 line)
  • Non-ASCII fallback path is unchanged
  • All existing tests pass

Contributor Checklist

Jira ticket: N/A — performance optimization, no ticket

🤖 Generated with Claude Code

On Java 9+, new String(byte[], offset, length, ISO_8859_1) allows the
JVM to copy the bytes directly into the compact (Latin-1) string
representation, avoiding the validation step that US_ASCII requires.

Since class-file UTF8 entries are almost always pure ASCII, and the code
already confirms this with a scan before reaching the fast path, using
ISO_8859_1 is safe and equivalent — both charsets produce identical
results for bytes 0x00–0x7F.

Benchmark results (class-match module, spring-web.jar dataset):

Baseline:
  ClassFileBenchmark.testClassHeader   avgt  5   782.666 ± 30.761  us/op
  ClassFileBenchmark.testClassOutline  avgt  5  1989.423 ± 34.237  us/op

After:
  ClassFileBenchmark.testClassHeader   avgt  5   764.186 ± 97.926  us/op
  ClassFileBenchmark.testClassOutline  avgt  5  1879.470 ± 40.856  us/op

~2.4% header improvement, ~5.5% outline improvement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bm1549 bm1549 marked this pull request as ready for review April 9, 2026 15:16
@bm1549 bm1549 requested a review from a team as a code owner April 9, 2026 15:16
@bm1549 bm1549 requested a review from mhlidd April 9, 2026 15:16
@mcculls mcculls self-requested a review April 9, 2026 15:17

@mcculls mcculls left a comment

Nice!

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d bot merged commit 23a05b2 into main Apr 9, 2026
9 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d bot deleted the brian.marks/opt-classfile branch April 9, 2026 15:31
@github-actions github-actions bot added this to the 0.1.0 milestone Apr 9, 2026