Connection._receive_loop EOF does not reject pending requests, causing infinite hang #85
Summary
When the remote ACP subprocess crashes or closes its stdout, Connection._receive_loop exits cleanly on EOF without rejecting pending outgoing requests. Any in-flight send_request() futures hang forever.
This means a subprocess crash during initialize(), new_session(), or prompt() is silently converted into an infinite hang instead of raising an error.
Reproduction
- Spawn an ACP subprocess that crashes immediately on startup (e.g., claude-agent-acp on Node.js < 20, which crashes with SyntaxError).
- Call conn.initialize(). It sends the JSON-RPC request and awaits the response future.
- The subprocess exits with code 1 and its stdout closes.
- _filter_jsonrpc_lines reads EOF → feeds EOF to filtered_reader.
- _receive_loop reads an empty line → breaks normally (no exception).
- TaskSupervisor._on_done calls task.result() → returns None (clean exit).
- _on_receive_error is never called (it only fires on exceptions).
- _state.reject_all_outgoing() is never called.
- The initialize() future hangs forever.
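The sequence above can be reproduced without Node.js or the SDK. The following is a minimal sketch that assumes a simplified model of the connection: MiniConnection, send_request, _reject_all_outgoing and run are hypothetical stand-ins written for this illustration, not agent-client-protocol APIs. Calling feed_eof() plays the role of the crashed subprocess closing its stdout, and the reject_on_eof flag toggles the behavior proposed under Suggested Fix.

```python
import asyncio
from typing import Any


class MiniConnection:
    """Hypothetical stand-in for Connection: one reader plus a table of pending futures."""

    def __init__(self, reader: asyncio.StreamReader, reject_on_eof: bool) -> None:
        self._reader = reader
        self._reject_on_eof = reject_on_eof
        self._pending: dict[int, asyncio.Future[Any]] = {}
        self._next_id = 0

    def send_request(self) -> asyncio.Future[Any]:
        # Register a future for the response (the real SDK would also write JSON-RPC here).
        self._next_id += 1
        fut = asyncio.get_running_loop().create_future()
        self._pending[self._next_id] = fut
        return fut

    def _reject_all_outgoing(self, exc: Exception) -> None:
        # Fail every in-flight request so awaiting callers see an exception.
        for fut in self._pending.values():
            if not fut.done():
                fut.set_exception(exc)
        self._pending.clear()

    async def _receive_loop(self) -> None:
        while True:
            line = await self._reader.readline()
            if not line:
                break  # EOF: the loop returns normally, no exception is raised
            # (response dispatch omitted in this sketch)
        if self._reject_on_eof:
            self._reject_all_outgoing(ConnectionError("Connection closed: remote end sent EOF"))


async def run(reject_on_eof: bool) -> str:
    reader = asyncio.StreamReader()
    conn = MiniConnection(reader, reject_on_eof)
    fut = conn.send_request()  # like conn.initialize() awaiting its response
    task = asyncio.create_task(conn._receive_loop())
    reader.feed_eof()  # simulate the subprocess exiting and stdout closing
    await task  # the receive loop finishes cleanly
    try:
        await asyncio.wait_for(fut, timeout=0.2)  # bounded wait instead of hanging forever
        return "resolved"
    except ConnectionError:
        return "rejected"
    except asyncio.TimeoutError:
        return "hung"
```

With this model, asyncio.run(run(False)) returns "hung" because nothing ever completes the pending future, while asyncio.run(run(True)) returns "rejected". The 0.2 second timeout exists only so the demonstration terminates.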
Root Cause
In connection.py:
```python
async def _receive_loop(self) -> None:
    try:
        while True:
            line = await self._reader.readline()
            if not line:
                break  # EOF: exits cleanly, no exception raised
            ...
    except asyncio.CancelledError:
        return
```

The _on_receive_error callback is registered via TaskSupervisor.create(..., on_error=self._on_receive_error), but _on_done only calls on_error when task.result() raises an exception. A clean EOF exit does not raise, so reject_all_outgoing is never invoked.
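This follows from standard asyncio done-callback semantics: task.result() returns normally for a coroutine that simply returned, so an error handler that only runs inside an except clause never sees a clean EOF. The sketch below assumes a simplified supervisor callback. make_on_done and the two toy coroutines are hypothetical and only approximate the behavior described above for TaskSupervisor._on_done; they are not the SDK's code.

```python
import asyncio
from typing import Callable


def make_on_done(on_error: Callable[[BaseException], None]) -> Callable[[asyncio.Task[None]], None]:
    """Hypothetical simplification of a supervisor's done callback."""

    def _on_done(task: asyncio.Task[None]) -> None:
        if task.cancelled():
            return  # result() would raise CancelledError here
        try:
            task.result()  # returns None on a clean exit, re-raises otherwise
        except Exception as exc:
            on_error(exc)  # only reached when the task raised

    return _on_done


async def main() -> list[str]:
    calls: list[str] = []

    async def clean_eof() -> None:
        return  # mirrors `break` on EOF: a normal return

    async def crashed() -> None:
        raise RuntimeError("reader failed")

    for name, coro_fn in (("clean_eof", clean_eof), ("crashed", crashed)):
        task = asyncio.create_task(coro_fn())
        # Record which tasks actually trigger the error handler.
        task.add_done_callback(make_on_done(lambda exc, n=name: calls.append(n)))
        await asyncio.wait({task})
        await asyncio.sleep(0)  # give the scheduled done callback a chance to run

    return calls
```

Running asyncio.run(main()) yields ["crashed"]: the task that returned cleanly never reaches on_error, which is the same gap that lets pending requests survive an EOF.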
Suggested Fix
Reject all pending requests when the receive loop exits on EOF:
```python
async def _receive_loop(self) -> None:
    try:
        while True:
            line = await self._reader.readline()
            if not line:
                break
            ...
    except asyncio.CancelledError:
        return
    # EOF: remote end closed. Reject any in-flight requests so callers
    # get an exception instead of hanging forever.
    self._state.reject_all_outgoing(
        ConnectionError("Connection closed: remote end sent EOF")
    )
```

Impact
We discovered this while debugging why ACP evals (SWE-bench Multimodal) hang indefinitely for certain repos. The ACP subprocess crashed on startup due to incompatible Node.js versions, but instead of getting an error, the SDK hung forever at conn.initialize(). This affected ~65% of eval instances.
Environment
- agent-client-protocol version: 0.8.1
- Python: 3.12/3.13
- OS: Linux (K8s pods)