Optimize nettrace-to-TraceLog Conversion #2403
Open
brianrob wants to merge 7 commits into microsoft:main from
Replace the O(N*T) linear-scan merge in SortAndDispatch with an O(N*log(T)) min-heap merge, where N is the number of events and T is the number of threads. The previous implementation rebuilt a List from LINQ on every call and linearly scanned all thread queues for the minimum timestamp per event. The new implementation uses an array-backed binary min-heap keyed by timestamp. After extracting the minimum, only a single O(log T) sift-down is needed to restore the heap property. The heap list is reused across calls to avoid per-call allocations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
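The merge described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `HeapMergeSketch`, `Merge`, and the use of `Queue<long>` as a stand-in for a per-thread event queue (holding only timestamps) are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: each Queue<long> stands in for one thread's queue,
// holding timestamps that are already sorted within that queue.
public static class HeapMergeSketch
{
    // Merges T sorted queues into one sorted list in O(N log T).
    // Note: the input queues are drained as a side effect.
    public static List<long> Merge(IReadOnlyList<Queue<long>> threadQueues)
    {
        // Heap entries are non-empty queues; the key is the front timestamp.
        var heap = new List<Queue<long>>();
        foreach (var queue in threadQueues)
        {
            if (queue.Count > 0) heap.Add(queue);
        }

        // Bottom-up heap construction, starting at the last node with a child.
        for (int i = heap.Count / 2 - 1; i >= 0; i--) SiftDown(heap, i);

        var result = new List<long>();
        while (heap.Count > 0)
        {
            Queue<long> min = heap[0];
            result.Add(min.Dequeue());
            if (min.Count == 0)
            {
                // Queue drained: move the last entry to the root and shrink.
                heap[0] = heap[heap.Count - 1];
                heap.RemoveAt(heap.Count - 1);
            }
            // The root key changed or the root was replaced:
            // a single O(log T) sift-down restores the heap property.
            if (heap.Count > 0) SiftDown(heap, 0);
        }
        return result;
    }

    private static void SiftDown(List<Queue<long>> heap, int i)
    {
        while (true)
        {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < heap.Count && heap[left].Peek() < heap[smallest].Peek()) smallest = left;
            if (right < heap.Count && heap[right].Peek() < heap[smallest].Peek()) smallest = right;
            if (smallest == i) return;
            (heap[i], heap[smallest]) = (heap[smallest], heap[i]);
            i = smallest;
        }
    }
}
```

Because the heap compares only timestamps, events with equal timestamps from different queues may come out in a different relative order than a linear scan would produce.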
Maintain a HashSet of thread queues that have pending events instead of iterating all threads in the dictionary on every SortAndDispatch call. Queues are added to the active set when their first event is enqueued and removed when drained. This eliminates the Dictionary.Values enumeration which was ~28% of CPU during nettrace-to-TraceLog conversion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
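A rough sketch of the active-set bookkeeping, under assumed names: `ActiveQueueTracker`, `Enqueue`, and `DrainPending` are hypothetical, and the queues hold bare timestamps rather than events.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of tracking only the queues that have pending events.
public sealed class ActiveQueueTracker
{
    private readonly Dictionary<int, Queue<long>> _queuesByThread = new Dictionary<int, Queue<long>>();
    private readonly HashSet<Queue<long>> _active = new HashSet<Queue<long>>();

    public int ActiveCount => _active.Count;

    public void Enqueue(int threadId, long timestamp)
    {
        if (!_queuesByThread.TryGetValue(threadId, out Queue<long> queue))
        {
            queue = new Queue<long>();
            _queuesByThread[threadId] = queue;
        }
        queue.Enqueue(timestamp);
        // Transition from empty to non-empty: the queue becomes active.
        if (queue.Count == 1) _active.Add(queue);
    }

    // Visits only queues with pending events, never the full dictionary,
    // and returns how many events were pending.
    public int DrainPending()
    {
        int dispatched = 0;
        foreach (Queue<long> queue in _active)
        {
            dispatched += queue.Count;
            queue.Clear();
        }
        _active.Clear();
        return dispatched;
    }
}
```

The cost of a drain then scales with the number of threads that actually produced events since the last call, not with the total number of threads ever seen.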
Cache the result of ProcessMappingSymbolMetadataParser.TryParse() in ProcessMappingMetadataTraceData so that repeated accesses to the ParsedSymbolMetadata property do not re-invoke JSON deserialization. The property is accessed twice per mapping event (for PE and ELF metadata checks), and the metadata objects are shared across multiple mappings via MetadataId. This was ~10% of CPU during conversion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Read ProcessMappingTraceData.FileName once into a local variable and pass it directly to UniversalMapping(string, ...) instead of going through the UniversalMapping(ProcessMappingTraceData, ...) overload. Previously, FileName was accessed 3 times per mapping event (IsNullOrEmpty check, StartsWith check, and inside UniversalMapping), each time allocating a new string via GetShortUTF8StringAt(). String allocation was ~12% of CPU in Release-mode profiling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
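To illustrate why reading the property once matters, here is a small sketch with a property that decodes a new string on every access. `MappingEventSketch`, `MappingFilter`, the `Decodes` counter, and the `"/"` prefix check are all hypothetical; the actual conditions in the converter are not shown in this description.

```csharp
using System;
using System.Text;

// Hypothetical event type whose FileName getter allocates on every access.
public sealed class MappingEventSketch
{
    private readonly byte[] _payload;

    public MappingEventSketch(string fileName)
    {
        _payload = Encoding.UTF8.GetBytes(fileName);
    }

    // Counts how many times the payload was decoded into a string.
    public int Decodes { get; private set; }

    public string FileName
    {
        get
        {
            Decodes++;
            return Encoding.UTF8.GetString(_payload); // new string each time
        }
    }
}

public static class MappingFilter
{
    // Reads FileName once and reuses the local for both checks.
    // The "/" prefix is an assumed condition chosen for the example.
    public static bool ShouldMap(MappingEventSketch data)
    {
        string fileName = data.FileName; // single decode and allocation
        return !string.IsNullOrEmpty(fileName)
            && fileName.StartsWith("/", StringComparison.Ordinal);
    }
}
```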
TraceEvent objects are reused across callbacks. The cached ParsedSymbolMetadata fields were never cleared between dispatches, which could return metadata from a previous event if the property was accessed on the template object rather than a clone. Reset _parsedSymbolMetadataCached and _parsedSymbolMetadata at the start of Dispatch() before invoking the callback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
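The caching and reset behavior can be sketched roughly as below. This is an assumption-laden illustration: `CachedMetadataEvent`, `Parsed`, `ExpensiveParse`, and `ParseCount` are hypothetical names, and a `Trim()` call stands in for JSON deserialization.

```csharp
using System;

// Hypothetical reusable event object with a lazily cached parsed value.
public sealed class CachedMetadataEvent
{
    private string _rawPayload = "";
    private bool _parsedCached;
    private string _parsed = "";

    // Counts how often the expensive parse actually ran.
    public int ParseCount { get; private set; }

    public string Parsed
    {
        get
        {
            if (!_parsedCached)
            {
                _parsed = ExpensiveParse(_rawPayload);
                _parsedCached = true;
            }
            return _parsed;
        }
    }

    // The same object is reused for every event, so the cache is cleared
    // before the callback runs; otherwise the previous event's value leaks.
    public void Dispatch(string rawPayload, Action<CachedMetadataEvent> callback)
    {
        _parsedCached = false;
        _parsed = "";
        _rawPayload = rawPayload;
        callback(this);
    }

    private string ExpensiveParse(string payload)
    {
        ParseCount++;
        return payload.Trim(); // stand-in for JSON deserialization
    }
}
```

Within one callback, repeated reads of `Parsed` hit the cache; across dispatches, each new payload is parsed once.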
Refactor the min-heap helpers into a self-contained private MinHeap class with XML doc comments on all public methods. Add comments explaining the binary heap child index formulas (2i+1, 2i+2). Use C# tuple swap syntax instead of a temp variable. Add Clone() override to ProcessMappingMetadataTraceData to explicitly copy the cached ParsedSymbolMetadata fields into the clone. Strings are immutable so a shallow copy of the reference is sufficient. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make MinHeap generic (MinHeap<TValue>) and internal so it can be tested from the test project. Add 13 unit tests covering: empty heap, single element, ascending/descending/random input, duplicate keys, ReplaceRoot, RemoveRoot, Clear, and mixed operations. Add a comment to Build() explaining why iteration starts at Count/2-1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
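The reasoning behind starting at Count/2-1 can be checked with a small sketch. `HeapBuildSketch`, `Build`, and `IsMinHeap` here are hypothetical helpers over plain `long` keys, not the `MinHeap<TValue>` API from this PR.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of bottom-up heap construction over plain keys.
public static class HeapBuildSketch
{
    public static void Build(List<long> keys)
    {
        // A node at index i has a left child at 2i+1 and a right child at 2i+2.
        // For i >= Count/2 we have 2i+1 >= Count, so those nodes are leaves
        // and already valid heaps. Count/2-1 is therefore the last parent.
        for (int i = keys.Count / 2 - 1; i >= 0; i--)
        {
            SiftDown(keys, i);
        }
    }

    // Verifies that every child is no smaller than its parent.
    public static bool IsMinHeap(IReadOnlyList<long> keys)
    {
        for (int child = 1; child < keys.Count; child++)
        {
            int parent = (child - 1) / 2;
            if (keys[parent] > keys[child]) return false;
        }
        return true;
    }

    private static void SiftDown(List<long> keys, int i)
    {
        while (true)
        {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < keys.Count && keys[left] < keys[smallest]) smallest = left;
            if (right < keys.Count && keys[right] < keys[smallest]) smallest = right;
            if (smallest == i) return;
            (keys[i], keys[smallest]) = (keys[smallest], keys[i]);
            i = smallest;
        }
    }
}
```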
cc: @zachcmadsen
leculver approved these changes on Apr 2, 2026
Summary
Optimizes the nettrace-to-TraceLog/ETLX conversion pipeline, reducing conversion time for a 1.68 GB nettrace file from ~22 minutes to ~40 seconds (34x speedup).
Problem
Converting large nettrace files (e.g., from OneCollect Linux traces with many threads) to TraceLog/ETLX format was extremely slow. Profiling revealed three bottlenecks:
- EventCache.SortAndDispatch consumed ~75% of CPU: an O(N×T) linear scan over all thread queues per event block, plus per-call LINQ List allocations.
- ParsedSymbolMetadata consumed ~10% of CPU: JSON deserialization was repeated on every property access.
- FileName string allocations consumed ~12% of CPU: GetShortUTF8StringAt() was called 3 times per mapping event.
Changes
EventCache min-heap (EventCache.cs)
- Replace the O(N×T) linear scan in SortAndDispatch with an O(N×log T) min-heap merge.
- Add a generic MinHeap<TValue> class with full unit test coverage (13 tests).
- Maintain an _activeThreadQueues HashSet to track only threads with pending events, avoiding iteration over the full thread dictionary.
Cache ParsedSymbolMetadata (UniversalSystemTraceEventParser.cs)
- Cache the result of ProcessMappingSymbolMetadataParser.TryParse() to avoid repeated JSON deserialization.
- Reset the cached fields at the start of Dispatch() to prevent stale data when TraceEvent objects are reused.
- Add a Clone() override to preserve cached values in cloned events.
Reduce FileName allocations (NettraceUniversalConverter.cs)
- Read data.FileName once into a local variable instead of accessing the property 3 times (each access allocates a new string via GetShortUTF8StringAt).
- Call the UniversalMapping(string, ...) overload directly, passing the cached string.
Performance Results
Measured with a 1.68 GB nettrace file (8.6M events, 11.5K processes) using Release builds:
All semantic statistics match. ETLX files are the same size but not byte-identical due to different tie-breaking order for same-timestamp events (min-heap vs linear scan).
Testing