Preload scope classes to prevent virtual thread deadlock#11111
Preload scope classes to prevent virtual thread deadlock#11111PerfectSlayer wants to merge 2 commits intomasterfrom
Conversation
3cba921 to
7b8e62f
Compare
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 57 metrics, 14 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~5115362274, baseline=1.62.0-SNAPSHOT~1ee99fb92c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.057 s) : 0, 1057234
Total [baseline] (11.106 s) : 0, 11106302
Agent [candidate] (1.065 s) : 0, 1064806
Total [candidate] (11.118 s) : 0, 11117921
section appsec
Agent [baseline] (1.258 s) : 0, 1257895
Total [baseline] (11.165 s) : 0, 11164907
Agent [candidate] (1.248 s) : 0, 1247836
Total [candidate] (11.183 s) : 0, 11183135
section iast
Agent [baseline] (1.221 s) : 0, 1221313
Total [baseline] (11.237 s) : 0, 11237305
Agent [candidate] (1.221 s) : 0, 1221296
Total [candidate] (11.348 s) : 0, 11348041
section profiling
Agent [baseline] (1.184 s) : 0, 1184472
Total [baseline] (11.088 s) : 0, 11087789
Agent [candidate] (1.191 s) : 0, 1191108
Total [candidate] (11.074 s) : 0, 11073912
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~5115362274, baseline=1.62.0-SNAPSHOT~1ee99fb92c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.24 ms) : 0, 1240
crashtracking [candidate] (1.225 ms) : 0, 1225
BytebuddyAgent [baseline] (633.618 ms) : 0, 633618
BytebuddyAgent [candidate] (638.547 ms) : 0, 638547
AgentMeter [baseline] (29.338 ms) : 0, 29338
AgentMeter [candidate] (29.536 ms) : 0, 29536
GlobalTracer [baseline] (248.413 ms) : 0, 248413
GlobalTracer [candidate] (250.356 ms) : 0, 250356
AppSec [baseline] (32.4 ms) : 0, 32400
AppSec [candidate] (32.491 ms) : 0, 32491
Debugger [baseline] (59.952 ms) : 0, 59952
Debugger [candidate] (60.234 ms) : 0, 60234
Remote Config [baseline] (598.337 µs) : 0, 598
Remote Config [candidate] (591.829 µs) : 0, 592
Telemetry [baseline] (8.077 ms) : 0, 8077
Telemetry [candidate] (8.112 ms) : 0, 8112
Flare Poller [baseline] (7.509 ms) : 0, 7509
Flare Poller [candidate] (7.415 ms) : 0, 7415
section appsec
crashtracking [baseline] (1.229 ms) : 0, 1229
crashtracking [candidate] (1.221 ms) : 0, 1221
BytebuddyAgent [baseline] (667.176 ms) : 0, 667176
BytebuddyAgent [candidate] (661.663 ms) : 0, 661663
AgentMeter [baseline] (12.19 ms) : 0, 12190
AgentMeter [candidate] (12.08 ms) : 0, 12080
GlobalTracer [baseline] (250.636 ms) : 0, 250636
GlobalTracer [candidate] (248.736 ms) : 0, 248736
IAST [baseline] (24.711 ms) : 0, 24711
IAST [candidate] (24.504 ms) : 0, 24504
AppSec [baseline] (186.341 ms) : 0, 186341
AppSec [candidate] (185.002 ms) : 0, 185002
Debugger [baseline] (66.35 ms) : 0, 66350
Debugger [candidate] (65.779 ms) : 0, 65779
Remote Config [baseline] (625.737 µs) : 0, 626
Remote Config [candidate] (612.583 µs) : 0, 613
Telemetry [baseline] (8.488 ms) : 0, 8488
Telemetry [candidate] (8.422 ms) : 0, 8422
Flare Poller [baseline] (3.574 ms) : 0, 3574
Flare Poller [candidate] (3.557 ms) : 0, 3557
section iast
crashtracking [baseline] (1.214 ms) : 0, 1214
crashtracking [candidate] (1.21 ms) : 0, 1210
BytebuddyAgent [baseline] (799.638 ms) : 0, 799638
BytebuddyAgent [candidate] (799.086 ms) : 0, 799086
AgentMeter [baseline] (11.342 ms) : 0, 11342
AgentMeter [candidate] (11.346 ms) : 0, 11346
GlobalTracer [baseline] (238.441 ms) : 0, 238441
GlobalTracer [candidate] (238.52 ms) : 0, 238520
IAST [baseline] (26.564 ms) : 0, 26564
IAST [candidate] (25.712 ms) : 0, 25712
AppSec [baseline] (31.084 ms) : 0, 31084
AppSec [candidate] (31.962 ms) : 0, 31962
Debugger [baseline] (59.527 ms) : 0, 59527
Debugger [candidate] (61.194 ms) : 0, 61194
Remote Config [baseline] (528.081 µs) : 0, 528
Remote Config [candidate] (1.155 ms) : 0, 1155
Telemetry [baseline] (13.482 ms) : 0, 13482
Telemetry [candidate] (11.606 ms) : 0, 11606
Flare Poller [baseline] (3.419 ms) : 0, 3419
Flare Poller [candidate] (3.505 ms) : 0, 3505
section profiling
crashtracking [baseline] (1.185 ms) : 0, 1185
crashtracking [candidate] (1.18 ms) : 0, 1180
BytebuddyAgent [baseline] (691.411 ms) : 0, 691411
BytebuddyAgent [candidate] (696.12 ms) : 0, 696120
AgentMeter [baseline] (9.085 ms) : 0, 9085
AgentMeter [candidate] (9.12 ms) : 0, 9120
GlobalTracer [baseline] (207.332 ms) : 0, 207332
GlobalTracer [candidate] (208.488 ms) : 0, 208488
AppSec [baseline] (32.78 ms) : 0, 32780
AppSec [candidate] (33.046 ms) : 0, 33046
Debugger [baseline] (65.641 ms) : 0, 65641
Debugger [candidate] (65.683 ms) : 0, 65683
Remote Config [baseline] (576.368 µs) : 0, 576
Remote Config [candidate] (602.44 µs) : 0, 602
Telemetry [baseline] (7.839 ms) : 0, 7839
Telemetry [candidate] (7.925 ms) : 0, 7925
Flare Poller [baseline] (3.575 ms) : 0, 3575
Flare Poller [candidate] (3.567 ms) : 0, 3567
ProfilingAgent [baseline] (93.87 ms) : 0, 93870
ProfilingAgent [candidate] (93.964 ms) : 0, 93964
Profiling [baseline] (94.448 ms) : 0, 94448
Profiling [candidate] (94.531 ms) : 0, 94531
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~5115362274, baseline=1.62.0-SNAPSHOT~1ee99fb92c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.066 s) : 0, 1065798
Total [baseline] (8.84 s) : 0, 8839836
Agent [candidate] (1.055 s) : 0, 1054685
Total [candidate] (8.82 s) : 0, 8819996
section iast
Agent [baseline] (1.223 s) : 0, 1223472
Total [baseline] (9.558 s) : 0, 9558415
Agent [candidate] (1.222 s) : 0, 1222271
Total [candidate] (9.569 s) : 0, 9568659
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~5115362274, baseline=1.62.0-SNAPSHOT~1ee99fb92c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.237 ms) : 0, 1237
crashtracking [candidate] (1.22 ms) : 0, 1220
BytebuddyAgent [baseline] (638.694 ms) : 0, 638694
BytebuddyAgent [candidate] (631.822 ms) : 0, 631822
AgentMeter [baseline] (29.621 ms) : 0, 29621
AgentMeter [candidate] (29.289 ms) : 0, 29289
GlobalTracer [baseline] (250.256 ms) : 0, 250256
GlobalTracer [candidate] (248.375 ms) : 0, 248375
AppSec [baseline] (32.828 ms) : 0, 32828
AppSec [candidate] (32.297 ms) : 0, 32297
Debugger [baseline] (59.871 ms) : 0, 59871
Debugger [candidate] (58.796 ms) : 0, 58796
Remote Config [baseline] (596.571 µs) : 0, 597
Remote Config [candidate] (585.179 µs) : 0, 585
Telemetry [baseline] (8.177 ms) : 0, 8177
Telemetry [candidate] (8.023 ms) : 0, 8023
Flare Poller [baseline] (8.288 ms) : 0, 8288
Flare Poller [candidate] (8.239 ms) : 0, 8239
section iast
crashtracking [baseline] (1.232 ms) : 0, 1232
crashtracking [candidate] (1.242 ms) : 0, 1242
BytebuddyAgent [baseline] (801.43 ms) : 0, 801430
BytebuddyAgent [candidate] (800.287 ms) : 0, 800287
AgentMeter [baseline] (11.455 ms) : 0, 11455
AgentMeter [candidate] (11.392 ms) : 0, 11392
GlobalTracer [baseline] (238.611 ms) : 0, 238611
GlobalTracer [candidate] (238.805 ms) : 0, 238805
AppSec [baseline] (31.393 ms) : 0, 31393
AppSec [candidate] (31.243 ms) : 0, 31243
Debugger [baseline] (59.625 ms) : 0, 59625
Debugger [candidate] (61.677 ms) : 0, 61677
Remote Config [baseline] (533.334 µs) : 0, 533
Remote Config [candidate] (544.283 µs) : 0, 544
Telemetry [baseline] (13.047 ms) : 0, 13047
Telemetry [candidate] (11.79 ms) : 0, 11790
Flare Poller [baseline] (3.435 ms) : 0, 3435
Flare Poller [candidate] (3.385 ms) : 0, 3385
IAST [baseline] (26.683 ms) : 0, 26683
IAST [candidate] (25.769 ms) : 0, 25769
LoadParameters
See matching parameters
SummaryFound 1 performance improvements and 0 performance regressions! Performance is the same for 19 metrics, 16 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~5115362274, baseline=1.62.0-SNAPSHOT~1ee99fb92c
dateFormat X
axisFormat %s
section baseline
no_agent (1.245 ms) : 1232, 1257
. : milestone, 1245,
iast (3.318 ms) : 3271, 3365
. : milestone, 3318,
iast_FULL (6.159 ms) : 6096, 6222
. : milestone, 6159,
iast_GLOBAL (3.776 ms) : 3713, 3840
. : milestone, 3776,
profiling (2.216 ms) : 2196, 2237
. : milestone, 2216,
tracing (1.916 ms) : 1900, 1932
. : milestone, 1916,
section candidate
no_agent (1.237 ms) : 1225, 1250
. : milestone, 1237,
iast (3.272 ms) : 3228, 3316
. : milestone, 3272,
iast_FULL (6.226 ms) : 6161, 6291
. : milestone, 6226,
iast_GLOBAL (3.698 ms) : 3635, 3761
. : milestone, 3698,
profiling (2.191 ms) : 2172, 2209
. : milestone, 2191,
tracing (1.883 ms) : 1866, 1899
. : milestone, 1883,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~5115362274, baseline=1.62.0-SNAPSHOT~1ee99fb92c
dateFormat X
axisFormat %s
section baseline
no_agent (18.46 ms) : 18274, 18646
. : milestone, 18460,
appsec (18.633 ms) : 18443, 18823
. : milestone, 18633,
code_origins (18.033 ms) : 17854, 18211
. : milestone, 18033,
iast (18.25 ms) : 18065, 18434
. : milestone, 18250,
profiling (19.16 ms) : 18963, 19356
. : milestone, 19160,
tracing (17.949 ms) : 17775, 18123
. : milestone, 17949,
section candidate
no_agent (18.207 ms) : 18021, 18392
. : milestone, 18207,
appsec (18.986 ms) : 18794, 19178
. : milestone, 18986,
code_origins (18.088 ms) : 17909, 18267
. : milestone, 18088,
iast (18.151 ms) : 17972, 18330
. : milestone, 18151,
profiling (18.455 ms) : 18273, 18636
. : milestone, 18455,
tracing (17.728 ms) : 17555, 17900
. : milestone, 17728,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~5115362274, baseline=1.62.0-SNAPSHOT~1ee99fb92c
dateFormat X
axisFormat %s
section baseline
no_agent (15.005 s) : 15005000, 15005000
. : milestone, 15005000,
appsec (14.592 s) : 14592000, 14592000
. : milestone, 14592000,
iast (18.337 s) : 18337000, 18337000
. : milestone, 18337000,
iast_GLOBAL (18.354 s) : 18354000, 18354000
. : milestone, 18354000,
profiling (14.858 s) : 14858000, 14858000
. : milestone, 14858000,
tracing (15.14 s) : 15140000, 15140000
. : milestone, 15140000,
section candidate
no_agent (14.799 s) : 14799000, 14799000
. : milestone, 14799000,
appsec (14.8 s) : 14800000, 14800000
. : milestone, 14800000,
iast (18.372 s) : 18372000, 18372000
. : milestone, 18372000,
iast_GLOBAL (18.283 s) : 18283000, 18283000
. : milestone, 18283000,
profiling (14.594 s) : 14594000, 14594000
. : milestone, 14594000,
tracing (15.097 s) : 15097000, 15097000
. : milestone, 15097000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~5115362274, baseline=1.62.0-SNAPSHOT~1ee99fb92c
dateFormat X
axisFormat %s
section baseline
no_agent (1.483 ms) : 1472, 1495
. : milestone, 1483,
appsec (3.841 ms) : 3616, 4065
. : milestone, 3841,
iast (2.288 ms) : 2218, 2358
. : milestone, 2288,
iast_GLOBAL (2.321 ms) : 2250, 2392
. : milestone, 2321,
profiling (2.104 ms) : 2049, 2160
. : milestone, 2104,
tracing (2.093 ms) : 2038, 2148
. : milestone, 2093,
section candidate
no_agent (1.483 ms) : 1472, 1495
. : milestone, 1483,
appsec (3.838 ms) : 3614, 4061
. : milestone, 3838,
iast (2.273 ms) : 2203, 2343
. : milestone, 2273,
iast_GLOBAL (2.331 ms) : 2260, 2402
. : milestone, 2331,
profiling (2.124 ms) : 2066, 2181
. : milestone, 2124,
tracing (2.079 ms) : 2024, 2133
. : milestone, 2079,
|
|
|
||
| // Preload classes used by swap() to avoid class loading on the virtual thread mount path. | ||
| // DatadogClassLoader loads these from a JarFile using synchronized I/O, which pins | ||
| // virtual thread carrier threads and can deadlock the application. | ||
| ScopeContext.class.getName(); | ||
| ScopeStack.class.getName(); |
There was a problem hiding this comment.
Would it make sense to also have a test reproducer for such deadlock as part of PR?
There was a problem hiding this comment.
I have a fork test that checks the loaded classes before and after context swap but that’s very brittle.
I would like to figure out if this change fully fixes the issue first before trying to build testing for.
But yes, we need it to avoid the regression from #10273 (which does not have test 😞 )
There was a problem hiding this comment.
I'm wondering if we should have a more general preload mechanism. Although, even if we do, I think we should leave that for another PR.
There was a problem hiding this comment.
Also, ScopeStack and ScopeContext (a wrapper for ScopeStack) should be removed at some point when we will move away from scope stack to context pointers...
So I don’t think this fix will be future proof and testing is definitely needed as it might break later with @mcculls changes on the scope manager.
There was a problem hiding this comment.
EDIT: currently this is specific to IAST instrumentations, but we can make this a generic feature...
You can list classes to be preloaded in the instrumentation class:
@Override
public String[] getClassNamesToBePreloaded() {
return SOME_CLASSES_TO_BE_PRELOADED;
}
at the moment you'd need to maintain this list manually, but we could add a build plugin that scanned all the classes used by the instrumentation (and their dependent classes) and construct the list at build time - much like we're thinking of building the list of helper classes.
There was a problem hiding this comment.
I was coming to comment about the IAST support only.
If you’re telling me that moving it to a more generic feature (for InstrumenterModule), I can do the refactoring and base the patch on the feature.
There was a problem hiding this comment.
Yes, it should just be a matter of moving that feature up to InstrumenterModule from InstrumenterModule.Iast
(This could be done as a follow-up PR if you want to prioritise merging this fix)
There was a problem hiding this comment.
Alright, on it!
Done. Rewrote the history as the original fix implementation was scrapped.
There was a problem hiding this comment.
Opened for review. About testing, I'm still waiting for a reproducer so I could integrate it as part of our testing suite.
7b8e62f to
353b58e
Compare
…deadlock Preload ScopeContext and ScopeStack classes in ContinuableScopeManager constructor to avoid class loading on virtual thread mount path. DatadogClassLoader's synchronized I/O from JarFile can pin carrier threads and deadlock the application.
353b58e to
5115362
Compare
AlexeyKuznetsov-DD
left a comment
There was a problem hiding this comment.
LGTM, left very minor comments.
Hope it will be possible to cover with tests.
| } | ||
|
|
||
| public List<Instrumenter> typeInstrumentations() { | ||
| preloadClassNames(); |
There was a problem hiding this comment.
nit: I think method name should be preloadClasses()? As preloadClassNames() sounds like - preload some list of class names to me.
There was a problem hiding this comment.
makes sense to improve the name while refactoring - preloadClasses is a good choice
| } | ||
|
|
||
| /** Get classes to force load */ | ||
| protected String[] getClassNamesToBePreloaded() { |
There was a problem hiding this comment.
nit: same here, I think classesToPreload() or similar would be a better name and a bit more consistent with already existing names in this class (like adviceShading() for example).
| // Preload classes used by Context.swap() to avoid class loading on the virtual thread mount path. | ||
| // DatadogClassLoader loads these from a JarFile using synchronized I/O, which pins | ||
| // virtual thread carrier threads and can deadlock the application. | ||
| private static final String[] PRELOAD_CLASSES = { |
There was a problem hiding this comment.
nit: CLASSES_TO_PRELOAD ?
There was a problem hiding this comment.
The original name was FORCE_LOADING from
I will redo a pass for better naming / consistency then.
mcculls
left a comment
There was a problem hiding this comment.
Similar comments to Alexey around improving on the old names.
Wrt. testing - I'm ok with this being merged based on dogfooding feedback (working on a test can continue in parallel)
Great, I will update it then. I did not want to change it 1. as part of a refactoring for a fix, 2. not as the original author. |
What Does This Do
This PR preloads ScopeContext and ScopeStack classes in ContinuableScopeManager constructor to avoid class loading on virtual thread mount path. DatadogClassLoader's synchronized I/O from JarFile can pin carrier threads and deadlock the application.
Motivation
Related to #10273 but the API (so the classes loaded on hot path) changed when re-implementing the virtual thread instrumentation in #11009
Additional Notes
Sharing the CI dev build to help with investigation
Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: APMS-19222 [APMS-19230]
GitHub ticket: #11103
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.