Agent Performance Report — Week of 2026-03-17 #21431

2026-03-17T17:52:52Z

github-actions[bot]
bot Mar 17, 2026

Overview

Analysis period: 2026-03-10 → 2026-03-17. The ecosystem logged 100+ workflow runs across all agents, of which ~20 were meaningful agentic executions (the rest are lockdown fast-fails on push events or slash-command stubs). The headline story this week is AI Moderator's full recovery and the ongoing P0 infrastructure outage (GH_AW_GITHUB_TOKEN missing) still blocking Issue Monster and related agents at 100% failure.

Overall scores: Quality 81/100 (↓2), Effectiveness 80/100 (↓2), Ecosystem Health 68/100 (↓2, from 70 last week)

Performance Rankings

Top Performing Agents 🏆

1. Daily Safe Outputs Conformance Checker (Quality: 90/100, Effectiveness: 95/100)

Perfect 6/6 scheduled run success rate this week
Highly efficient: 220K tokens, ~7 min average runtime (fastest of the analysis agents)
Engine: claude — clean firewall profile (api.anthropic.com only)
Produced 7 safe output items (issues/noops) accurately scoped to violations found
Example run: §23206571446

2. AI Moderator (Quality: 88/100, Effectiveness: 90/100)

FULLY RECOVERED — 5/5 successful runs (4× issue_comment triggers + 1× issue trigger)
Runtime: 47s–10.6m depending on context; no errors or missing tools
Engine: codex for full runs; responds consistently across trigger types
Recovery confirmed across two consecutive weekly assessments — this pattern holds ✅

3. Workflow Health Manager (Quality: 85/100, Effectiveness: 82/100)

3/4 scheduled runs succeeded; 1 run had a recoverable error but still produced 3 safe items
7 total safe items (issues + comments) — good output density
2.25M tokens / 37 turns in 7 days — appropriately sized for a meta-orchestrator
Correctly escalated Bot Detection from P2 → P1; updated shared-alerts.md ✅
Example run: §23183314863

4. Lockfile Statistics Analysis Agent (Quality: 85/100, Effectiveness: 80/100)

1/1 success, 10.2 min, 18 turns, 787K tokens
Engine: claude — systematic and thorough
Example run: §23207104501+related

Agents Needing Attention 📉

1. Issue Monster (Quality: N/A, Effectiveness: 0/100) ❌ P0

50/50 failures (7-day). 46 hard errors. 0 tokens. ~25s per run (all pre-activation failures)
Root cause: GH_AW_GITHUB_TOKEN not available in runtime environment
Tracked: #20315 — no change since March 15
Impact: Zero triage work completed on the issue backlog. This continues to compound debt.
Recommendation: Escalate token provisioning to human maintainers. Consider a "degraded mode" that skips GitHub App token dependency when token is unavailable.
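One way the suggested "degraded mode" could look is a pre-activation check that returns a noop decision instead of raising when the token is absent. This is a minimal sketch under assumptions: the helper name `preflight` and the shape of the returned dictionary are hypothetical, and only the variable name GH_AW_GITHUB_TOKEN comes from this report.

```python
import os
from typing import Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> dict:
    """Decide whether a run should proceed or degrade to a noop.

    Hypothetical helper: the return shape is an illustration only and is
    not the workflow's actual safe-output schema.
    """
    env = os.environ if env is None else env
    token = env.get("GH_AW_GITHUB_TOKEN", "").strip()
    if not token:
        # Token missing or blank: report a noop with a reason instead of erroring.
        return {
            "action": "noop",
            "reason": "GH_AW_GITHUB_TOKEN is not available; skipping run",
        }
    return {"action": "run", "reason": "token present"}
```

Calling `preflight({})` yields a noop decision, while passing a mapping that contains a non-empty GH_AW_GITHUB_TOKEN yields a run decision. A check like this would let scheduled runs end cleanly while the token is still being provisioned.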

2. Daily Safe Output Integrator (Quality: 72/100, Effectiveness: 78/100) ⚠️

1 run this week: success, 11.7 min, but 3.12M tokens — highest token consumption of any agent this period
Token spike is abnormal; previous week's baseline for similar agents: ~500K–800K
No turns data captured, suggesting the agent may be using a different collection path
Engine: copilot
Recommendation: Audit prompt for unbounded context expansion. Consider adding a token budget guard or splitting into smaller scoped runs.
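A token budget guard of the kind recommended here might be sketched as follows. The class name `TokenBudget` and both limits are assumptions for illustration: the soft limit borrows the upper end of the ~500K–800K baseline quoted above, and the hard limit is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks cumulative token use against a soft and a hard limit.

    Both default limits are illustrative values, not settings from the report.
    """
    soft_limit: int = 800_000
    hard_limit: int = 1_500_000
    used: int = 0

    def record(self, tokens: int) -> str:
        """Add the tokens from one turn and return 'ok', 'warn', or 'stop'."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        self.used += tokens
        if self.used >= self.hard_limit:
            return "stop"  # caller should end the run or split the work
        if self.used >= self.soft_limit:
            return "warn"  # caller should log that the budget is nearly spent
        return "ok"
```

The caller would invoke `record` after each turn and stop or split the run on "stop". Separating the soft and hard thresholds gives a logged warning before any run is cut short.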

3. Contribution Check (Quality: 75/100, Effectiveness: 74/100) ⚠️ Recovering

7-day: ~8/10 scheduled runs succeeded; 2 failures with errors
Token usage wildly variable: 228K to 2.9M per run (this run: 2.9M)
1 missing data incident: pr-filter-results.json (pre-agent output file) absent in run §23179794131
Tool usage healthy: 36 safe items across 30 runs (good output density)
Recovering from label issue (applying non-existent lgtm/needs-work labels) — monitor
Recommendation: Add guard on pre-agent output file existence; log warning + noop if missing rather than silently failing.
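The guard recommended above could be sketched roughly like this. The function name `load_filter_results`, the default file location, and the assumption that the file holds JSON are all hypothetical; only the file name pr-filter-results.json is taken from the report.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("contribution_check")


def load_filter_results(path: str = "pr-filter-results.json"):
    """Return (status, data), where status is 'ok' or 'noop'.

    Hypothetical helper: the file's location and JSON shape are assumed.
    """
    file = Path(path)
    if not file.is_file():
        # Missing pre-agent output: warn and signal a noop instead of failing silently.
        logger.warning("pre-agent output %s is missing; emitting noop", file)
        return "noop", None
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        logger.warning("pre-agent output %s is not valid JSON: %s", file, exc)
        return "noop", None
    return "ok", data
```

With this shape the orchestrator can branch on the status value before handing any filter context to the agent.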

Behavioral Patterns

Productive Patterns ✅

Workflow Health Manager → Alert system: Successfully escalated Bot Detection from P2 → P1, updated shared-alerts.md, and created tracking issue — clean cross-orchestrator handoff
Daily Safe Outputs Conformance Checker: Consistently runs fast, scopes outputs precisely to violations, no noise — a model for other daily checkers
AI Moderator: Handling multiple event types (issues + comments) efficiently, appropriate noop behavior when no action needed
Push-triggered lockdown pattern: 80+ runs that fast-fail in <3s on push events — correct behavior, minimal resource waste

Problematic Patterns ⚠️

Token explosion (Contribution Check + Daily Safe Output Integrator): Two agents had single runs at roughly 3M tokens (2.9M and 3.12M respectively) without clear justification. Pattern suggests unbounded context accumulation.
Persistent P0 paralysis: Issue Monster has been at 100% failure for 72+ hours with no resolution path visible. The agent runs ~8× per day (every 3h), consuming workflow quota with zero output.
Bot Detection regression: 84% failure rate (11/13 since Mar 15) in precompute step — separate from the token issue but also unresolved. See shared-alerts.md P1.
Stale lock files: 16 workflows with stale .lock.yml files (down from 18 last week — slow progress). Agents cannot run on updated configs until recompiled.

Coverage Analysis

Coverage Map

Well-covered:

PR quality gates (Contribution Check, PR Nitpick)
Safe output compliance (Daily Safe Outputs Conformance Checker)
Workflow health monitoring (Workflow Health Manager)
Code moderation (AI Moderator)
Issue management (Auto-Triage Issues — working; Issue Monster — blocked)

Coverage gaps:

Issue backlog triage: Issue Monster P0 outage creates a ~72h+ triage gap
Performance regression detection: no dedicated agent observing runtime metrics
Security scanning: Security Review Agent runs but fast-fails (lockdown pattern — needs investigation)

Engine diversity:

copilot: Contribution Check, Daily Safe Output Integrator, Workflow Health Manager, The Great Escapi
claude: Daily Safe Outputs Conformance Checker, Semantic Function Refactoring, Lockfile Statistics Analysis Agent
codex: AI Moderator
Healthy diversity. claude agents trend toward lower token usage + higher turn efficiency.

Recommendations

High Priority

Unblock Issue Monster (P0) — Provision GH_AW_GITHUB_TOKEN in the workflow environment. Until then, add a graceful degradation path that calls noop rather than erroring on every run. Track in #20315.
Investigate Daily Safe Output Integrator token spike — 3.12M tokens in a single run is roughly 4–6× the ~500K–800K baseline for similar agents. Audit the prompt for unconstrained context growth, and consider adding a max_tokens guard or context windowing.
Fix Bot Detection precompute failure (P1) — 84% failure rate since Mar 15. Already tracked by Workflow Health Manager — needs developer attention on the precompute step.

Medium Priority

Contribution Check: guard missing pre-agent file — Add explicit check for pr-filter-results.json in the orchestrator; emit missing_data or noop rather than proceeding with empty filter context.
Address 16 stale lock files — Run make recompile to refresh .lock.yml files. Currently: ai-moderator, ci-doctor, copilot-agent-analysis, copilot-pr-nlp-analysis, daily-architecture-diagram, daily-code-metrics, deep-report, docs-noob-tester, firewall-escape, pr-nitpick-reviewer, repository-quality-improver, scout, smoke-claude, smoke-gemini, sub-issue-closer, test-project-url-default.
Security Review Agent lockdown investigation — This agent runs but fast-fails in 2s on every push trigger. Verify whether it is correctly gated or whether it should actually be running on push events.
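For the stale lock file item in this list, a rough local check before running make recompile might look like the sketch below. It assumes each `<name>.lock.yml` sits next to a `<name>.md` source file and treats a lock file as stale when its source has a newer modification time. Both the layout and the mtime heuristic are assumptions, not something this report states, and modification times are unreliable on a fresh checkout.

```python
from pathlib import Path


def find_stale_locks(workflow_dir: str) -> list[str]:
    """List workflow names whose .lock.yml is older than its source file.

    Assumes each `<name>.lock.yml` has a sibling `<name>.md`; that layout
    is an assumption made for this sketch.
    """
    stale = []
    for lock in sorted(Path(workflow_dir).glob("*.lock.yml")):
        name = lock.name[: -len(".lock.yml")]
        source = lock.with_name(name + ".md")
        # Stale when the source was modified after the compiled lock file.
        if source.is_file() and source.stat().st_mtime > lock.stat().st_mtime:
            stale.append(name)
    return stale
```

Comparing content hashes recorded at compile time would be more robust than timestamps, if such hashes are available.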

Low Priority

Contribution Check token variance — Investigate why token usage ranges from 228K to 2.9M. Set a soft cap and log when approaching it.
Metrics Collector improvement — The metrics collector still lacks runtime data (GitHub API unavailable in its environment). Consider configuring GitHub MCP server for the metrics collection job.

Trends

Metric	This Week	Last Week	Δ
Overall Quality	81/100	83/100	↓2
Effectiveness	80/100	82/100	↓2
Ecosystem Health	68/100	70/100	↓2
AI Moderator	✅ Recovered	✅ Recovered	→
Issue Monster success rate	0%	0%	→ P0
Contribution Check schedule success	~80%	~65%	↑
Stale lock files	16	18	↓2

The slight quality/effectiveness decline reflects the ongoing P0 impact and Bot Detection escalation, partially offset by AI Moderator's sustained recovery and Contribution Check's improving trend.
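The Δ column for the numeric rows can be reproduced from the two weekly values. The sketch below uses only figures from the table; the helper name `delta` is hypothetical.

```python
def delta(this_week: int, last_week: int) -> str:
    """Format a week-over-week change in the style of the table's Δ column."""
    diff = this_week - last_week
    if diff == 0:
        return "→"
    arrow = "↑" if diff > 0 else "↓"
    return f"{arrow}{abs(diff)}"


# (this week, last week) pairs copied from the Trends table.
scores = {
    "Overall Quality": (81, 83),
    "Effectiveness": (80, 82),
    "Ecosystem Health": (68, 70),
    "Stale lock files": (16, 18),
}
deltas = {name: delta(now, prev) for name, (now, prev) in scores.items()}
```

All four rows come out as a decrease of 2. For the stale lock file count a decrease is an improvement, while for the three scores it is a decline.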

Actions Taken This Run

Generated this performance report discussion
Updated agent-performance-latest.md in shared repo memory
Updated shared-alerts.md with current P0/P1 coordination notes
No new improvement issues created (existing issues cover active P0/P1 items)

Analysis period: 2026-03-10 to 2026-03-17 | Run: §23207992326
Next report: 2026-03-24

References:

§23207992326 — This run (Agent Performance Analyzer)
§23183314863 — Workflow Health Manager (latest)
§23206571446 — Daily Safe Outputs Conformance Checker (latest)

AI generated by Agent Performance Analyzer - Meta-Orchestrator

expires on Mar 18, 2026, 5:52 PM UTC

2026-03-18T18:58:38Z

github-actions[bot]
bot Mar 18, 2026
Author

This discussion was automatically closed because it expired on 2026-03-18T17:52:51.833Z.

Closed by Workflow
