AI coding agents are fundamentally non-deterministic. When you run an agent like Claude Code, Cursor, or Hermes, standard application logs are where diagnostic context goes to die. A traditional software crash prints a stack trace, exits, and points directly to the line of failure. An AI agent, on the other hand, can fail silently: it might happily report a `Success` code while caught in an infinite tool-use loop, hallucinating an API call, or dropping critical error details.
To debug a failed agent run, you don't just want a post-facto summary. You need a flight recorder. You need to reconstruct the precise terminal state, command-line arguments, environment parameters, and stream outputs step-by-step.
Most developers attempt this by parsing logs retrospectively (passive log-reading). But passive parsing is a brittle post-mortem strategy. It fails when processes crash abruptly, log-writing buffers delay stdout, or logs contain unredacted production secrets.
To solve this, we upgraded Agent Blackbox from a retrospective gateway log parser to an active subprocess stream-tapping wrapper.
Passive Log-Reading vs. Active Stream-Tapping
Inside the Tapping Architecture: Thread-Safe Dual-Stream Redirection
The core breakthrough of the active recorder lies in its execution wrapper (`agent-doctor record`). Instead of launching an agent shell directly, you run it through the Blackbox:

    agent-doctor record --name "test-agent-run" -- python run_agent.py

Under the hood, Agent Blackbox uses Python's `subprocess.Popen` to launch the agent as a child process and starts two background threads that tap its `stdout` and `stderr` pipes. The main thread stays free to wait on the process while each line is relayed as soon as it arrives.
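For readers wondering how a command line of that shape can be parsed, here is a minimal sketch using `argparse`. It is not the actual Agent Blackbox CLI definition: the `--out-dir` flag name, its `recordings` default, and the helper names `build_parser` and `wrapped_command` are assumptions made for illustration. Only `--name`, `cmd_args`, and `out_dir` are implied by the article.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI wiring for `agent-doctor record`."""
    parser = argparse.ArgumentParser(prog="agent-doctor")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    record = subparsers.add_parser("record", help="run a command and record its output")
    record.add_argument("--name", default=None, help="label stored in the flight record")
    # Assumed flag name and default; the article only implies an out_dir value exists.
    record.add_argument("--out-dir", dest="out_dir", default="recordings")
    # REMAINDER collects everything after the options, which may include the "--" itself.
    record.add_argument("cmd_args", nargs=argparse.REMAINDER)
    return parser


def wrapped_command(args: argparse.Namespace) -> list[str]:
    """Return the command to run, without a leading "--" separator."""
    cmd = list(args.cmd_args or [])
    if cmd and cmd[0] == "--":
        cmd = cmd[1:]
    return cmd


# Parse an explicit argument list so the sketch does not depend on sys.argv.
demo_args = build_parser().parse_args(
    ["record", "--name", "test-agent-run", "--", "python", "run_agent.py"]
)
print(demo_args.name, wrapped_command(demo_args))
```

Stripping the separator in a helper, rather than relying on `argparse` to do it, keeps the behaviour the same whether or not the parser leaves `--` in the remainder.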
Thread-Safe In-Memory Accumulation
To capture stdout and stderr concurrently without inter-stream corruption, we spawn one worker thread per stream. Each thread reads line by line using `readline` and sends every line to both the terminal (for live user feedback) and a shared, lock-protected buffer.
Here is the simplified Python implementation running inside the active flight recorder:

    import argparse
    import datetime as dt
    import json
    import os
    import subprocess
    import sys
    import threading
    import uuid
    from pathlib import Path

    # sanitize() is the redaction pipeline described in the next section.


    def cmd_record(args: argparse.Namespace) -> int:
        # Drop the leading "--" separator, if argparse left it in place.
        cmd_args = list(args.cmd_args or [])
        if cmd_args and cmd_args[0] == "--":
            cmd_args = cmd_args[1:]
        if not cmd_args:
            print("record requires a command to run", file=sys.stderr)
            return 2

        start = dt.datetime.now(dt.timezone.utc)
        captured: list[str] = []
        captured_lock = threading.Lock()

        # On Windows, pass a single command line and run it through the shell.
        popen_cmd: list[str] | str = cmd_args
        use_shell = False
        if os.name == "nt":
            popen_cmd = subprocess.list2cmdline(cmd_args)
            use_shell = True

        proc = subprocess.Popen(
            popen_cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            encoding="utf-8",
            errors="replace",
            bufsize=1,  # line-buffered on the parent side of the pipes
            shell=use_shell,
        )

        def stream_output(stream, terminal) -> None:
            # Echo each line to the live terminal, then append it to the shared buffer.
            try:
                for line in iter(stream.readline, ""):
                    terminal.write(line)
                    terminal.flush()
                    with captured_lock:
                        captured.append(line)
            finally:
                stream.close()

        # One reader thread per pipe, so neither stream can fill up and block the child.
        threads = [
            threading.Thread(target=stream_output, args=(proc.stdout, sys.stdout), daemon=True),
            threading.Thread(target=stream_output, args=(proc.stderr, sys.stderr), daemon=True),
        ]
        for thread in threads:
            thread.start()

        exit_code = proc.wait()
        # Let both readers drain whatever is still in the pipes.
        for thread in threads:
            thread.join()

        end = dt.datetime.now(dt.timezone.utc)
        duration_seconds = (end - start).total_seconds()
        # stdout and stderr lines are interleaved in the order they were captured.
        raw_output = "".join(captured)
        sanitized_output = sanitize(raw_output)

        run_id = f"{start.strftime('%Y%m%dT%H%M%S')}_{uuid.uuid4().hex[:8]}"
        out_dir = Path(args.out_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        out_path = out_dir / f"run_{run_id}.json"

        record = {
            "id": run_id,
            "name": args.name or " ".join(cmd_args),
            "command": cmd_args,
            "start_time": start.isoformat(),
            "end_time": end.isoformat(),
            "duration_seconds": duration_seconds,
            "exit_code": exit_code,
            # Unredacted copy: strip this field before sharing the file.
            "raw_output": raw_output,
            "sanitized_output": sanitized_output,
        }
        out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")

        print(f"[FLIGHT RECORDER] Command finished with exit code {exit_code} in {duration_seconds:.2f}s.")
        print(f"[FLIGHT RECORDER] Recording saved to: {out_path}")
        return exit_code

In-Flight Credential Sanitization
Observability cannot come at the expense of security. Because agents regularly manipulate environment variables, interact with LLM providers, and output bearer tokens, our stream-tapping wrapper redacts these strings in memory, so the `sanitized_output` written to the shareable record never contains them.
We run a dedicated sanitization pipeline applying regex filters to redact sensitive patterns:
- API Keys / Secrets: Filters matching `(?i)(api[_-]?key|token|secret|password|authorization)[:=]\s*[^\s,;]+`
- Bearer Credentials: Neutralizing HTTP Auth headers matching `(?i)bearer\s+[a-z0-9._\-]+`
- Private Cryptographic Keys: Catching blocks matching `-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----`
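To make the three filters concrete, here is a minimal sketch of what a `sanitize` function built from them could look like. It is an illustration under stated assumptions, not the project's actual implementation: the pass order, the `re.DOTALL` flag, the extra capture group around the separator, and the choice to keep the key name are all mine. The placeholder `[REDACTED_SECRET]` is the one the article uses.

```python
import re

REDACTION = "[REDACTED_SECRET]"

# Patterns taken from the list above. DOTALL is an assumption: without it,
# "." stops at newlines and a multi-line key block would never match.
PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)
BEARER = re.compile(r"(?i)bearer\s+[a-z0-9._\-]+")
# Same key/value pattern, with a second group added so the separator can be kept.
KEY_VALUE = re.compile(
    r"(?i)(api[_-]?key|token|secret|password|authorization)([:=]\s*)[^\s,;]+"
)


def sanitize(text: str) -> str:
    """Redact private keys, bearer tokens, and key/value secrets, in that order."""
    text = PRIVATE_KEY.sub(REDACTION, text)
    # Bearer runs before the key/value pass. In the reverse order,
    # "Authorization: Bearer <token>" would lose only the word "Bearer"
    # and the token itself would survive.
    text = BEARER.sub(REDACTION, text)
    return KEY_VALUE.sub(lambda m: m.group(1) + m.group(2) + REDACTION, text)


print(sanitize("api_key=sk-12345 done"))
print(sanitize("Authorization: Bearer abc.def-123"))
```

The ordering matters because the key/value pattern stops at the first whitespace, so it treats the word `Bearer` as the secret value.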
The stream-tapping pipeline produces two outputs:
- Live Terminal View: Unaltered output, so the local developer sees exactly what the agent printed.
- JSON Flight Record: A fully sanitized, shareable snapshot in the `sanitized_output` field, with sensitive elements replaced by `[REDACTED_SECRET]`.
The Structured Flight Record JSON Ledger
Upon completion, the recorder automatically bundles metadata and outputs into a portable JSON ledger. This JSON acts as a deterministic timeline for downstream diagnostic engines, enabling instant post-mortems.
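As one illustration of a downstream consumer, the sketch below scans a directory of records and lists the failed runs. It relies only on the `run_*.json` file naming and the field names written by the recorder. The function names `failed_runs` and `summarize` are hypothetical and not part of Agent Blackbox.

```python
import json
from pathlib import Path


def failed_runs(record_dir: str) -> list[dict]:
    """Load every flight record in record_dir and keep those with a non-zero exit code.

    Files are visited in name order; names begin with the run's start timestamp.
    """
    failures = []
    for path in sorted(Path(record_dir).glob("run_*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        if record.get("exit_code") != 0:
            failures.append(record)
    return failures


def summarize(record: dict, tail_lines: int = 5) -> str:
    """Build a one-line header plus the last few lines of sanitized output."""
    tail = record.get("sanitized_output", "").splitlines()[-tail_lines:]
    header = (
        f"{record['id']} {record['name']!r} "
        f"exit={record['exit_code']} ({record['duration_seconds']:.2f}s)"
    )
    return "\n".join([header, *tail])
```

Reading only `sanitized_output` keeps the triage output safe to paste into an issue or chat. A complete record has the following shape: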

    {
      "id": "20260621T182415_a7f92b1d",
      "name": "agent-flight-deployment",
      "command": ["python", "deploy_agent.py", "--prod"],
      "start_time": "2026-06-21T18:24:15.102482+00:00",
      "end_time": "2026-06-21T18:24:22.459102+00:00",
      "duration_seconds": 7.35662,
      "exit_code": 1,
      "raw_output": "...",
      "sanitized_output": "[FLIGHT INFO] Deploying agent...\n[ERROR] Connection failed: authorization=Bearer [REDACTED_SECRET]"
    }

The Automation Architect's Verdict
To run high-velocity, human-out-of-the-loop automation, you cannot fly blind. If you do not have a dedicated, local-first flight recorder, you cannot debug flaky agent logic without wasting thousands of tokens re-running broken states.
Active subprocess stream-tapping with real-time sanitization changes the equation. It turns non-deterministic agent runs into fixed, secure, and inspectable records.

