Agent Performance Report — Week of 2026-03-17 #21431
Closed
This discussion was automatically closed because it expired on 2026-03-18T17:52:51.833Z.
Overview
Analysis period: 2026-03-10 → 2026-03-17. The ecosystem logged 100+ workflow runs across all agents, of which ~20 were meaningful agentic executions (the rest are lockdown fast-fails on push events or slash-command stubs). The headline story this week is AI Moderator's full recovery and the ongoing P0 infrastructure outage (GH_AW_GITHUB_TOKEN missing) still blocking Issue Monster and related agents at 100% failure.
Overall scores: Quality 81/100 (↓2), Effectiveness 80/100 (↓2), Ecosystem Health 68/100 (→ stable)
Performance Rankings
Top Performing Agents 🏆
1. Daily Safe Outputs Conformance Checker (Quality: 90/100, Effectiveness: 95/100)
claude - clean firewall profile (api.anthropic.com only)
2. AI Moderator (Quality: 88/100, Effectiveness: 90/100)
codex for full runs; responds consistently across trigger types
3. Workflow Health Manager (Quality: 85/100, Effectiveness: 82/100)
4. Lockfile Statistics Analysis Agent (Quality: 85/100, Effectiveness: 80/100)
claude - systematic and thorough
Agents Needing Attention 📉
1. Issue Monster (Quality: N/A, Effectiveness: 0/100) ❌ P0
GH_AW_GITHUB_TOKEN not available in runtime environment
2. Daily Safe Output Integrator (Quality: 72/100, Effectiveness: 78/100) ⚠️
copilot
3. Contribution Check (Quality: 75/100, Effectiveness: 74/100) ⚠️ Recovering
pr-filter-results.json (pre-agent output file) absent in run §23179794131
[...] lgtm/needs-work labels) - monitor
[...] noop if missing rather than silently failing.
Behavioral Patterns
Productive Patterns ✅
[...] shared-alerts.md, and created tracking issue - clean cross-orchestrator handoff
Problematic Patterns ⚠️
[...] precompute step - separate from the token issue but also unresolved. See shared-alerts.md P1.
[...] .lock.yml files (down from 18 last week; slow progress). Agents cannot run on updated configs until recompiled.
Coverage Analysis
Coverage Map
Well-covered:
Coverage gaps:
Engine diversity:
copilot: Contribution Check, Daily Safe Output Integrator, Workflow Health Manager, The Great Escapi
claude: Daily Safe Outputs Conformance Checker, Semantic Function Refactoring, Lockfile Statistics Analysis Agent
codex: AI Moderator
claude agents trend toward lower token usage + higher turn efficiency.
Recommendations
High Priority
Unblock Issue Monster (P0) - Provision GH_AW_GITHUB_TOKEN in the workflow environment. Until then, add a graceful degradation path that calls noop rather than erroring on every run. Track in #20315.
Investigate Daily Safe Output Integrator token spike - 3.1M tokens in a single run is 4-5× expected. Audit the prompt for unconstrained context growth, and consider adding a max_tokens guard or context windowing.
Fix Bot Detection precompute failure (P1) - 84% failure rate since Mar 15. Already tracked by Workflow Health Manager; needs developer attention on the precompute step.
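The graceful degradation path recommended for Issue Monster could be sketched as follows. This is a minimal illustration, not the agent's actual code: the helper name choose_action and the "run"/"noop" labels are hypothetical, and only the environment variable name GH_AW_GITHUB_TOKEN comes from this report.

```python
import os


def choose_action(env=None):
    """Decide whether a run should proceed or degrade to a no-op.

    Hypothetical helper: the returned labels are illustrative and do not
    correspond to a confirmed safe-output API.
    """
    env = os.environ if env is None else env
    # Treat an unset, empty, or whitespace-only token as missing.
    token = env.get("GH_AW_GITHUB_TOKEN", "").strip()
    if not token:
        return {"action": "noop", "reason": "GH_AW_GITHUB_TOKEN is not set"}
    return {"action": "run", "reason": ""}
```

Passing the environment as a parameter keeps the check easy to exercise in isolation; a caller would branch on the returned action before doing any GitHub API work.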
Medium Priority
Contribution Check: guard missing pre-agent file - Add an explicit check for pr-filter-results.json in the orchestrator; emit missing_data or noop rather than proceeding with empty filter context.
Address 16 stale lock files - Run make recompile to refresh .lock.yml files. Currently: ai-moderator, ci-doctor, copilot-agent-analysis, copilot-pr-nlp-analysis, daily-architecture-diagram, daily-code-metrics, deep-report, docs-noob-tester, firewall-escape, pr-nitpick-reviewer, repository-quality-improver, scout, smoke-claude, smoke-gemini, sub-issue-closer, test-project-url-default.
Security Review Agent lockdown investigation - This agent runs but fast-fails in 2s on every push trigger. Verify whether it is correctly gated or whether it should be running on push events at all.
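The pre-agent file guard suggested for Contribution Check might look roughly like the sketch below. The function name load_filter_results is hypothetical, and the mapping of conditions to statuses (absent or unparsable file gives missing_data, present but empty gives noop) is an assumption; the report only names the two outputs.

```python
import json
from pathlib import Path


def load_filter_results(path="pr-filter-results.json"):
    """Return (status, data) for the pre-agent filter file.

    status is "ok", "missing_data", or "noop". The status mapping is an
    assumption made for illustration.
    """
    p = Path(path)
    if not p.is_file():
        # The file was never produced by the pre-agent step.
        return "missing_data", None
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # The file exists but cannot be parsed; treat it like a missing input.
        return "missing_data", None
    if not data:
        # Valid JSON with nothing to review: nothing for the agent to do.
        return "noop", None
    return "ok", data
```

The orchestrator would call this before starting the agent and continue only on "ok", so an absent file surfaces as an explicit signal instead of an empty filter context.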
Low Priority
Contribution Check token variance — Investigate why token usage ranges from 228K to 2.9M. Set a soft cap and log when approaching it.
Metrics Collector improvement — The metrics collector still lacks runtime data (GitHub API unavailable in its environment). Consider configuring GitHub MCP server for the metrics collection job.
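The soft cap proposed for Contribution Check token usage could be sketched as below. The function name check_token_budget, the logger name, and the 80% warning threshold are hypothetical, and the report does not say what the cap should be.

```python
import logging

logger = logging.getLogger("token_budget")


def check_token_budget(used, soft_cap, warn_ratio=0.8):
    """Classify token usage against a soft cap and log near or over it.

    Returns "ok", "approaching", or "over". A soft cap only reports; it
    does not stop the run.
    """
    if soft_cap <= 0:
        raise ValueError("soft_cap must be positive")
    ratio = used / soft_cap
    if ratio >= 1.0:
        logger.warning("token usage %d exceeded soft cap %d", used, soft_cap)
        return "over"
    if ratio >= warn_ratio:
        logger.warning(
            "token usage %d is at %.0f%% of soft cap %d",
            used, ratio * 100, soft_cap,
        )
        return "approaching"
    return "ok"
```

With an illustrative cap of 1,000,000 tokens, the low end of the observed range (228K) would classify as "ok" and the high end (2.9M) as "over".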
Trends
The slight quality/effectiveness decline reflects the ongoing P0 impact and Bot Detection escalation, partially offset by AI Moderator's sustained recovery and Contribution Check's improving trend.
Actions Taken This Run
[...] agent-performance-latest.md in shared repo memory
[...] shared-alerts.md with current P0/P1 coordination notes
References: