Skip to content

Optimize nettrace-to-TraceLog Conversion#2403

Open
brianrob wants to merge 7 commits intomicrosoft:mainfrom
brianrob:brianrob/universal-to-tracelog-perf
Open

Optimize nettrace-to-TraceLog Conversion#2403
brianrob wants to merge 7 commits intomicrosoft:mainfrom
brianrob:brianrob/universal-to-tracelog-perf

Conversation

@brianrob
Copy link
Copy Markdown
Member

@brianrob brianrob commented Apr 1, 2026

Summary

Optimizes the nettrace-to-TraceLog/ETLX conversion pipeline, reducing conversion time for a 1.68 GB nettrace file from ~22 minutes to ~40 seconds (34x speedup).

Problem

Converting large nettrace files (e.g., from OneCollect Linux traces with many threads) to TraceLog/ETLX format was extremely slow. Profiling revealed three bottlenecks:

  1. EventCache.SortAndDispatch consumed ~75% of CPU — O(N×T) linear scan over all thread queues per event block, plus per-call LINQ List allocations.
  2. ParsedSymbolMetadata consumed ~10% of CPU — JSON deserialization repeated on every property access.
  3. FileName string allocations consumed ~12% of CPU — GetShortUTF8StringAt() called 3 times per mapping event.

Changes

EventCache min-heap (EventCache.cs)

  • Replaced the O(N×T) linear scan in SortAndDispatch with an O(N×log T) min-heap merge.
  • Introduced a generic MinHeap<TValue> class with full unit test coverage (13 tests).
  • Added _activeThreadQueues HashSet to track only threads with pending events, avoiding iteration over the full thread dictionary.

Cache ParsedSymbolMetadata (UniversalSystemTraceEventParser.cs)

  • Cached the result of ProcessMappingSymbolMetadataParser.TryParse() to avoid repeated JSON deserialization.
  • Reset cache in Dispatch() to prevent stale data when TraceEvent objects are reused.
  • Added Clone() override to preserve cached values in cloned events.

Reduce FileName allocations (NettraceUniversalConverter.cs)

  • Read data.FileName once into a local variable instead of accessing the property 3 times (each access allocates a new string via GetShortUTF8StringAt).
  • Call UniversalMapping(string, ...) overload directly, passing the cached string.

Performance Results

Measured with a 1.68 GB nettrace file (8.6M events, 11.5K processes) using Release builds:

Metric Before After
Conversion time 22:26 (1346s) 00:40 (40s)
Speedup 34×
Events 8,583,800 8,583,800 ✓
Processes 11,509 11,509 ✓
Unique code addresses 174,476 174,476 ✓
Unique stacks 206,467 206,467 ✓
ETLX file size 3,051,574,460 3,051,574,460 ✓

All semantic statistics match. ETLX files are the same size but not byte-identical due to different tie-breaking order for same-timestamp events (min-heap vs linear scan).

Testing

  • All existing tests pass (2,274 on net8.0 + 2,289 on net462).
  • 13 new MinHeap unit tests added.

brianrob and others added 7 commits March 31, 2026 20:33
Replace the O(N*T) linear-scan merge in SortAndDispatch with an O(N*log(T))
min-heap merge, where N is the number of events and T is the number of
threads. The previous implementation rebuilt a List from LINQ on every call
and linearly scanned all thread queues for the minimum timestamp per event.

The new implementation uses an array-backed binary min-heap keyed by
timestamp. After extracting the minimum, only a single O(log T) sift-down
is needed to restore the heap property. The heap list is reused across
calls to avoid per-call allocations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Maintain a HashSet of thread queues that have pending events instead of
iterating all threads in the dictionary on every SortAndDispatch call.
Queues are added to the active set when their first event is enqueued
and removed when drained. This eliminates the Dictionary.Values
enumeration which was ~28% of CPU during nettrace-to-TraceLog conversion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cache the result of ProcessMappingSymbolMetadataParser.TryParse() in
ProcessMappingMetadataTraceData so that repeated accesses to the
ParsedSymbolMetadata property do not re-invoke JSON deserialization.
The property is accessed twice per mapping event (for PE and ELF
metadata checks), and the metadata objects are shared across multiple
mappings via MetadataId. This was ~10% of CPU during conversion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Read ProcessMappingTraceData.FileName once into a local variable and pass
it directly to UniversalMapping(string, ...) instead of going through the
UniversalMapping(ProcessMappingTraceData, ...) overload. Previously,
FileName was accessed 3 times per mapping event (IsNullOrEmpty check,
StartsWith check, and inside UniversalMapping), each time allocating a
new string via GetShortUTF8StringAt(). String allocation was ~12% of CPU
in Release-mode profiling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TraceEvent objects are reused across callbacks. The cached
ParsedSymbolMetadata fields were never cleared between dispatches,
which could return metadata from a previous event if the property
was accessed on the template object rather than a clone.

Reset _parsedSymbolMetadataCached and _parsedSymbolMetadata at the
start of Dispatch() before invoking the callback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refactor the min-heap helpers into a self-contained private MinHeap
class with XML doc comments on all public methods. Add comments
explaining the binary heap child index formulas (2i+1, 2i+2). Use
C# tuple swap syntax instead of a temp variable.

Add Clone() override to ProcessMappingMetadataTraceData to explicitly
copy the cached ParsedSymbolMetadata fields into the clone. Strings
are immutable so a shallow copy of the reference is sufficient.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make MinHeap generic (MinHeap<TValue>) and internal so it can be
tested from the test project. Add 13 unit tests covering: empty heap,
single element, ascending/descending/random input, duplicate keys,
ReplaceRoot, RemoveRoot, Clear, and mixed operations. Add a comment
to Build() explaining why iteration starts at Count/2-1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@brianrob brianrob marked this pull request as ready for review April 1, 2026 21:06
@brianrob brianrob requested a review from a team as a code owner April 1, 2026 21:06
@brianrob brianrob changed the title Optimize nettrace-to-TraceLog conversion (34x speedup) Optimize nettrace-to-TraceLog Conversion Apr 1, 2026
@brianrob
Copy link
Copy Markdown
Member Author

brianrob commented Apr 1, 2026

cc: @zachcmadsen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants