
fix(server): prevent segfault when server JS wrapper is finalized during request handling#28246

Open
robobun wants to merge 3 commits into main from claude/fix-22809-server-segfault-on-finalized-js-value

Conversation


@robobun robobun commented Mar 18, 2026

Summary

  • Fix null pointer dereference (segfault at address 0x0) when HTTP requests arrive after server.stop() + GC collects the server's JS wrapper
  • Replace unsafe jsValueAssertAlive() (which used .? unwrap — UB when null in release builds) with safe tryGetJSValue() returning ?JSValue
  • All 8 call sites now handle the null case gracefully: uWS callbacks return 503, the fetch() path returns a rejected promise
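The safe-accessor pattern the summary describes can be sketched as a minimal TypeScript model. This is an illustration only, not the actual Zig implementation: ServerHandle and handleRequest are hypothetical names, and GC collection of the wrapper is simulated by dropping the reference, since the real race involves the JavaScriptCore GC.

```typescript
// Model of the fix: a nullable accessor instead of an asserting unwrap.
type JSValue = { kind: "server" };

class ServerHandle {
  private wrapper: JSValue | null = { kind: "server" };

  // stop() downgrades strong -> weak; model the later GC collection
  // by simply dropping the reference.
  stop(): void {
    this.wrapper = null;
  }

  // Safe accessor: returns null instead of asserting liveness.
  tryGetJSValue(): JSValue | null {
    return this.wrapper;
  }

  // uWS-callback-style handler: fail with 503 when the wrapper is gone,
  // instead of dereferencing null.
  handleRequest(): { status: number } {
    const js = this.tryGetJSValue();
    if (js === null) return { status: 503 };
    return { status: 200 };
  }
}
```

Before `stop()` the handler proceeds normally; after `stop()` plus (simulated) GC, the same request path degrades to a 503 rather than crashing.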

Test plan

  • Added regression test in test/regression/issue/22809.test.ts
  • Existing server stop tests pass (bun bd test -t "should be able to await server.stop", bun bd test -t "should be able to abruptly stop")
  • Debug build compiles successfully

Fixes #22809

🤖 Generated with Claude Code

fix(server): prevent segfault when server JS wrapper is finalized during request handling

After server.stop() downgrades the JS reference from strong to weak, GC
can collect the wrapper while uWS route handlers are still registered.
The old jsValueAssertAlive() used `.?` to unwrap the nullable value,
which is undefined behavior in release builds when null — causing a
segfault at address 0x0.

Replace jsValueAssertAlive() with tryGetJSValue() returning ?JSValue,
and handle the null case at all call sites by returning 503 Service
Unavailable (for uWS callbacks) or a rejected promise (for fetch()).

Fixes #22809

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

robobun commented Mar 18, 2026

Updated 10:35 AM PT - Mar 18th, 2026

❌ Your commit 410be9be has 2 failures in Build #40000 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 28246

That installs a local version of the PR into your bun-28246 executable, so you can run:

bun-28246 --bun


coderabbitai bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 15e5b4fb-9bf8-4995-a8ce-fcf115645fcc

📥 Commits

Reviewing files that changed from the base of the PR and between a7e9abe and 410be9b.

📒 Files selected for processing (2)
  • src/bun.js/api/server.zig
  • test/regression/issue/22809.test.ts

Walkthrough

Modified server.zig to replace the jsValueAssertAlive function with tryGetJSValue, which returns an optional value and handles finalized servers by returning 503 Service Unavailable. Added a regression test validating server behavior after shutdown and garbage collection.

Changes

  • Server API Changes (src/bun.js/api/server.zig): Replaced the non-null jsValueAssertAlive getter with the optional-return tryGetJSValue function. Updated all call sites to handle null cases, returning 503 Service Unavailable when the server is finalized. Affects request handling, upgrade paths, route calls, and plugin flows.
  • Regression Test (test/regression/issue/22809.test.ts): Added a test for issue #22809 that verifies a subprocess-based scenario where a Bun server is stopped and garbage collection is triggered. Validates that subsequent requests after shutdown and GC do not cause segmentation faults, accepting 503, 200, or connection-refused responses.
🚥 Pre-merge checks: ✅ 4 passed
  • Title check (✅ Passed): The title clearly and concisely describes the main fix: preventing a segfault when the server's JS wrapper is finalized during request handling.
  • Description check (✅ Passed): The description covers what the PR does, test verification, and includes a fixes reference, though it could be more detailed on the "How did you verify" template section.
  • Linked Issues check (✅ Passed): The PR addresses issue #22809 by replacing the unsafe jsValueAssertAlive() with tryGetJSValue() and handling null cases gracefully at all call sites.
  • Out of Scope Changes check (✅ Passed): All changes are directly related to fixing the segfault: modifications to server.zig for safer JS wrapper handling and a regression test for issue #22809.



Comment on lines +2143 to +2148
    const server_js = this.tryGetJSValue() orelse {
        // Server JS wrapper was finalized; reject with 503.
        resp.writeStatus("503 Service Unavailable");
        resp.endWithoutBody(true);
        return;
    };

🔴 Bug: In onSavedRequest, the tryGetJSValue() null check (line 2143) is placed AFTER prepareJsRequestContext has already been called (line 2133), which increments pending_requests, allocates a RequestContext from the pool, creates an AbortSignal, and creates a Request object. The early 503 return at line 2147 leaks all of these resources and never decrements pending_requests, which could prevent server shutdown. Additionally, the fetch() path at line 1254 leaks a heap-allocated Request created at line 1251. Fix: move tryGetJSValue() checks before the resource allocations, matching the pattern used in onRequest (line 2103), onUserRouteRequest, and the upgrade handlers.

Extended reasoning...

Resource Leak in onSavedRequest

In onSavedRequest (line 2124), the .stack case calls prepareJsRequestContext at line 2133 to allocate request-handling resources. Looking at prepareJsRequestContext (line 2215), this function performs several allocations before returning:

  1. Increments pending_requests via onPendingRequest() at line 2252
  2. Allocates a RequestContext from the pool at line 2272
  3. Creates an AbortSignal with pendingActivityRef() at lines 2278-2280
  4. Creates a Request object at lines 2282-2288

The tryGetJSValue() null check at line 2143 happens AFTER all these allocations. When it returns null (server JS wrapper was GC-collected after stop()), the early return at line 2147 sends a 503 response but skips all cleanup that normally occurs later in the function: ctx.deinit() (line 2171), detachRequest() (line 2159), onResponse() (line 2164), and the pending_requests decrement.

Contrast with Correct Handlers

All other handlers in this PR correctly place the tryGetJSValue() check BEFORE prepareJsRequestContext. For example, onRequest at line 2103 checks tryGetJSValue() first, then calls prepareJsRequestContext at line 2110. Similarly, onUserRouteRequest (line 2055) and the upgrade handlers (line 2343) follow the same safe ordering. Only onSavedRequest has the check in the wrong position.

Step-by-Step Proof

  1. A bake DevServer triggers onSavedRequest with a .stack request
  2. prepareJsRequestContext is called at line 2133, which increments pending_requests (now N+1), allocates a RequestContext from the pool, creates an AbortSignal with a pending activity ref, and creates a Request object
  3. At line 2143, tryGetJSValue() returns null because the server JS wrapper was already GC-collected after server.stop()
  4. The function returns early at line 2147 after writing 503
  5. pending_requests is still N+1 and is never decremented back to N
  6. The RequestContext pool allocation is never returned
  7. The AbortSignal pendingActivityRef is never released
  8. request_object.request_context.detachRequest() is never called

Request Leak in fetch() Path

A secondary issue exists in the fetch() code path: Request.new(existing_request) at line 1251 heap-allocates a *Request via bun.new(). On the normal path, this is transferred to JS GC ownership via request.toJS() at line 1260. But when tryGetJSValue() returns null at line 1254, the *Request pointer is dropped without being freed. The Request struct contains allocated resources (URL string, headers, body) that are also leaked. This path is less likely to be reached in practice since fetch() is a JS prototype method requiring a live reference, but the ordering is still incorrect.

Impact and Fix

The pending_requests leak is the most impactful issue: it can prevent server.stop() from resolving because deinitIfWeCan checks that pending_requests reaches 0. The pool and signal leaks waste memory. The fix is straightforward: move the tryGetJSValue() check to before the switch statement at line 2132 (before prepareJsRequestContext is called for the .stack case). For the fetch() path, move the check to before Request.new() at line 1251. This matches the pattern used in all other correctly-ordered handlers.
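The suggested reordering can be sketched as a small TypeScript model. All names here (ServerModel, onSavedRequest, pendingRequests) are illustrative stand-ins for the Zig internals, not Bun's actual API; the point is that the liveness check runs before any allocation, so the 503 path has nothing to unwind.

```typescript
// Model of the corrected ordering: check liveness BEFORE incrementing
// pending_requests or allocating request-handling resources, so the
// early 503 return leaks nothing.
class ServerModel {
  pendingRequests = 0;
  wrapper: object | null = null; // null models a GC-finalized JS wrapper

  onSavedRequest(): number {
    // Liveness check first: nothing has been allocated yet.
    if (this.wrapper === null) return 503;

    // prepareJsRequestContext-equivalent: acquire resources.
    this.pendingRequests += 1;
    try {
      return 200; // normal request handling
    } finally {
      this.pendingRequests -= 1; // cleanup always balances the counter
    }
  }
}
```

With the check in front, the finalized-wrapper path returns 503 while `pendingRequests` stays at 0, so shutdown accounting is unaffected.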

Comment on lines +634 to 641
    /// Returns the server's JS wrapper value, or null if it has been
    /// finalized (e.g. after stop() + GC). Callers that originate from
    /// uWS callbacks must handle the null case gracefully (e.g. 503).
    pub fn tryGetJSValue(server: *ThisServer) ?jsc.JSValue {
        return server.js_value.tryGet();
    }

    pub fn requestIP(this: *ThisServer, request: *jsc.WebCore.Request) bun.JSError!jsc.JSValue {

🟡 Nit: onNodeHTTPRequestWithUpgradeCtx (line 1886) still uses this.js_value.tryGet() orelse .js_undefined instead of the new tryGetJSValue() + 503 pattern used by every other uWS callback in this PR. While this won't segfault, passing undefined as thisObject to the JS callback during the shutdown race produces confusing JS errors instead of a clean 503. Note that onPendingRequest() is called before this line, so a fix would also need to decrement pending_requests on the early-return path.

Extended reasoning...

What the bug is

The PR introduces tryGetJSValue() and updates all uWS callback paths to return a 503 response when the server's JS wrapper has been finalized during shutdown. However, onNodeHTTPRequestWithUpgradeCtx at line 1886 still uses the old pattern:

    const thisObject: JSValue = this.js_value.tryGet() orelse .js_undefined;

instead of:

const server_js = this.tryGetJSValue() orelse {
    resp.writeStatus("503 Service Unavailable");
    resp.endWithoutBody(true);
    return;
};

Code path that triggers it

This function is called from two places: onNodeHTTPRequest (line 2018) and the upgrade handler (line 2368). Both are uWS callbacks that can fire after server.stop() + GC collects the JS wrapper. When this.js_value.tryGet() returns null, thisObject becomes .js_undefined, which is then passed as the thisObject argument to the Node HTTP request handler JS function at line 1899-1912.

Why this doesn't crash but is inconsistent

The code was already safe from segfault before this PR — it used tryGet() orelse .js_undefined rather than the dangerous .? unwrap that jsValueAssertAlive() used. The JS callback at line 1912 catches exceptions via globalThis.takeException(), so the undefined thisObject will likely produce a caught JS error. However, this is inconsistent with every other handler in the PR, which returns a clean 503 to the client.

Complication with pending_requests

Note that onPendingRequest() is called at line 1873 (incrementing this.pending_requests) before the js_value check at line 1886. Compare this with the handler at line 2378-2383 where tryGetJSValue() is checked before pending_requests += 1. If this call site were updated to return 503 early, the fix would also need to either: (a) move the onPendingRequest() call after the check, or (b) decrement pending_requests on the early-return path. This is likely why the call site was not updated — it requires slightly more restructuring than the others.

Step-by-step proof

  1. User creates a Node HTTP server with Bun.serve() using the node:http compatibility path (onNodeHTTPRequest callback).
  2. User calls server.stop(), which downgrades the JS reference to weak.
  3. GC runs and collects the JS wrapper, making this.js_value.tryGet() return null.
  4. A new HTTP request arrives; uWS dispatches to onNodeHTTPRequest, which calls onNodeHTTPRequestWithUpgradeCtx.
  5. Line 1873: pending_requests is incremented.
  6. Line 1886: this.js_value.tryGet() returns null, thisObject becomes .js_undefined.
  7. Line 1899-1912: The Node HTTP handler is called with undefined as thisObject, producing a confusing JS error instead of a 503.
  8. All other handlers in the PR would have returned 503 at step 6.

Addressing the refutation

One verifier correctly notes this function was not modified by the PR and was never using jsValueAssertAlive(). This is true — the code was already null-safe. However, the PR's stated goal is "All 8 call sites now handle the null case gracefully," and this is a 9th call site facing the identical race condition that the PR addresses everywhere else. The severity is appropriately a nit since it's pre-existing and won't crash.
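Option (b) above can likewise be sketched in TypeScript. The names (NodeHttpServerModel, onNodeHTTPRequest) are hypothetical stand-ins for the Zig internals: when the increment must happen before the liveness check, the early-return path undoes it before bailing out.

```typescript
// Model of option (b): the pending-request increment precedes the
// liveness check, so the 503 early-return must rebalance the counter.
class NodeHttpServerModel {
  pendingRequests = 0;
  wrapper: object | null = null; // null models a GC-finalized JS wrapper

  onNodeHTTPRequest(): number {
    this.pendingRequests += 1; // onPendingRequest() fires first here
    if (this.wrapper === null) {
      this.pendingRequests -= 1; // rebalance before the 503 bail-out
      return 503;
    }
    try {
      return 200; // normal Node-compat request handling
    } finally {
      this.pendingRequests -= 1;
    }
  }
}
```

Either restructuring (moving the increment after the check, or decrementing on the early return) keeps the counter balanced; the sketch shows the second, less invasive variant.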



Development

Successfully merging this pull request may close these issues:

  • Segmentation fault at address 0x00000000
