Connection._receive_loop EOF does not reject pending requests, causing infinite hang #85

@simonrosenberg

Description

Summary

When the remote ACP subprocess crashes or closes its stdout, Connection._receive_loop exits cleanly on EOF without rejecting pending outgoing requests. Any in-flight send_request() futures are never resolved, so their callers hang forever.

This means a subprocess crash during initialize(), new_session(), or prompt() is silently converted into an infinite hang instead of raising an error.

Reproduction

  1. Spawn an ACP subprocess that crashes immediately on startup (e.g., claude-agent-acp on Node.js < 20, which crashes with a SyntaxError)
  2. Call conn.initialize(), which sends the JSON-RPC request and awaits the response future
  3. The subprocess exits with code 1 and its stdout closes
  4. _filter_jsonrpc_lines reads EOF → feeds EOF to filtered_reader
  5. _receive_loop reads an empty line → breaks out of the loop normally (no exception)
  6. TaskSupervisor._on_done calls task.result() → returns None (clean exit)
  7. _on_receive_error is never called (only fires on exceptions)
  8. _state.reject_all_outgoing() is never called
  9. The initialize() future hangs forever
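
The steps above can be sketched without the SDK or a real subprocess. This is a minimal stand-in, not the library's code: receive_loop, demo_hang, and the bare future are hypothetical names standing in for Connection._receive_loop and the future that send_request() awaits, and feed_eof() simulates the subprocess closing its stdout.

```python
import asyncio


async def receive_loop(reader: asyncio.StreamReader) -> None:
    # Stand-in for Connection._receive_loop: returns normally on EOF.
    while True:
        line = await reader.readline()
        if not line:
            break


async def demo_hang(timeout: float = 0.2) -> str:
    reader = asyncio.StreamReader()
    # Stand-in for the response future that send_request() awaits.
    pending = asyncio.get_running_loop().create_future()

    task = asyncio.create_task(receive_loop(reader))
    reader.feed_eof()  # simulate the subprocess exiting and stdout closing
    await task         # the loop finishes without raising

    try:
        # shield() keeps the timeout from cancelling the future itself.
        await asyncio.wait_for(asyncio.shield(pending), timeout)
    except asyncio.TimeoutError:
        return "resolved" if pending.done() else "still pending"
    return "resolved"


if __name__ == "__main__":
    print(asyncio.run(demo_hang()))
```

Only the artificial timeout lets the sketch return. Without it, the await on the pending future would never complete, which matches step 9.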

Root Cause

In connection.py:

async def _receive_loop(self) -> None:
    try:
        while True:
            line = await self._reader.readline()
            if not line:
                break  # EOF — exits cleanly, no exception raised
            ...
    except asyncio.CancelledError:
        return

The _on_receive_error callback is registered via TaskSupervisor.create(..., on_error=self._on_receive_error), but _on_done only calls on_error when task.result() raises an exception. A clean EOF exit does not raise, so reject_all_outgoing is never invoked.
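
The behaviour described above can be illustrated with a small stand-in for the done-callback logic. This is an assumption about the shape of TaskSupervisor._on_done based only on the description in this issue, not a copy of the SDK source. The names on_done, clean_exit, crash, and observe are hypothetical.

```python
import asyncio
from typing import Callable, List


def on_done(task: asyncio.Task, on_error: Callable[[BaseException], None]) -> None:
    # Hypothetical stand-in: on_error fires only if task.result() raises.
    if task.cancelled():
        return
    try:
        task.result()
    except Exception as exc:
        on_error(exc)


async def clean_exit() -> None:
    # Mirrors the receive loop breaking on EOF: returns None, no exception.
    return None


async def crash() -> None:
    raise RuntimeError("boom")


async def observe() -> List[str]:
    seen: List[str] = []
    for coro in (clean_exit(), crash()):
        task = asyncio.create_task(coro)
        await asyncio.wait([task])
        on_done(task, lambda exc: seen.append(type(exc).__name__))
    return seen


if __name__ == "__main__":
    print(asyncio.run(observe()))
```

In this sketch only the crashing task reaches on_error. The clean exit is indistinguishable from success, which is why the EOF path never triggers rejection.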

Suggested Fix

Reject all pending requests when the receive loop exits on EOF:

async def _receive_loop(self) -> None:
    try:
        while True:
            line = await self._reader.readline()
            if not line:
                break
            ...
    except asyncio.CancelledError:
        return
    # EOF: remote end closed. Reject any in-flight requests so callers
    # get an exception instead of hanging forever.
    self._state.reject_all_outgoing(
        ConnectionError("Connection closed: remote end sent EOF")
    )
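
A self-contained sketch of how the suggested fix would behave, assuming reject_all_outgoing sets the given exception on every pending future. PendingRequests, register, and demo_fix are hypothetical stand-ins for the SDK's internal state, not its actual API.

```python
import asyncio
from typing import Dict


class PendingRequests:
    # Hypothetical stand-in for the connection's outgoing-request state.
    def __init__(self) -> None:
        self._futures: Dict[int, asyncio.Future] = {}

    def register(self, request_id: int) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._futures[request_id] = fut
        return fut

    def reject_all_outgoing(self, exc: Exception) -> None:
        # Fail every request that has not been answered yet.
        for fut in self._futures.values():
            if not fut.done():
                fut.set_exception(exc)
        self._futures.clear()


async def receive_loop(reader: asyncio.StreamReader, state: PendingRequests) -> None:
    try:
        while True:
            line = await reader.readline()
            if not line:
                break
    except asyncio.CancelledError:
        return
    # EOF path: surface the closed connection to waiting callers.
    state.reject_all_outgoing(
        ConnectionError("Connection closed: remote end sent EOF")
    )


async def demo_fix() -> str:
    reader = asyncio.StreamReader()
    state = PendingRequests()
    pending = state.register(1)
    task = asyncio.create_task(receive_loop(reader, state))
    reader.feed_eof()  # simulate the subprocess closing stdout
    await task
    try:
        await asyncio.wait_for(pending, timeout=1.0)
    except ConnectionError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(demo_fix()))
```

With the rejection in place, the awaiting caller receives a ConnectionError as soon as the loop sees EOF instead of waiting indefinitely.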

Impact

We discovered this while debugging why ACP evals (SWE-bench Multimodal) hang indefinitely for certain repos. The ACP subprocess crashed on startup due to incompatible Node.js versions, but instead of getting an error, the SDK hung forever at conn.initialize(). This affected ~65% of eval instances.

Environment

  • agent-client-protocol version: 0.8.1
  • Python: 3.12/3.13
  • OS: Linux (K8s pods)
