fix: stop clearing MILADY_API_TOKEN/ELIZA_API_TOKEN in Docker provider#423
…port (31337)
- Change -p ${bridgePort}:DEFAULT_BRIDGE_PORT (→31337) to map to MILADY_PORT (2138)
- Both bridgePort and webUiPort now map to the same internal port 2138,
which is the only port the milady agent actually listens on.
- Also set MILADY_API_BIND=0.0.0.0 alongside ELIZA_API_BIND so the agent
binds to all interfaces (not just loopback) — required for Docker port
mapping to work from outside the container.
- DEFAULT_BRIDGE_PORT (31337) constant remains for env-var compat but is
no longer used as the docker container-side port.
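The port mapping described above can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: the helper name `buildPortAndBindArgs` is hypothetical, and only the port number (2138) and the two bind variables come from the commit message.

```typescript
// Container-side port the milady agent listens on (2138, per the commit message).
const MILADY_PORT = 2138;

// Hypothetical helper: returns the `docker run` flags for port publishing and bind addresses.
function buildPortAndBindArgs(bridgePort: number, webUiPort: number): string[] {
  return [
    // Both host ports publish to the single container port the agent listens on.
    "-p", `${bridgePort}:${MILADY_PORT}`,
    "-p", `${webUiPort}:${MILADY_PORT}`,
    // Bind to all interfaces so the published ports are reachable from outside the container.
    "-e", "MILADY_API_BIND=0.0.0.0",
    "-e", "ELIZA_API_BIND=0.0.0.0",
  ];
}
```

If the agent bound only to loopback inside the container, the published ports would accept connections on the host but nothing would answer them, which is why the bind variables travel with the port flags here.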
fix: bridge port maps to wrong internal port (31337 → 2138)
- vercel.json: remove container-lifecycle crons (process-provisioning-jobs, health-check, deployment-monitor); these are now exclusively owned by milady-provisioning-worker on the VPS
- deploy-backend.yml: add restart of milady-provisioning-worker after eliza-cloud restart so worker picks up new code on each deploy
- INFRASTRUCTURE.md: document Vercel vs VPS ownership, docker nodes, Neon DB, Redis, and missing GitHub Actions secrets
VPS .env.local updated separately (out of band):
- MILADY_DOCKER_IMAGE bumped to v2.0.0-steward-8
- ELIZA_CLOUD_AGENT_BASE_DOMAIN changed from waifu.fun to milady.ai
- Added: MILADY_SANDBOX_PROVIDER, MILADY_BRIDGE_INTERNAL_PORT, STEWARD_CONTAINER_URL, REDIS_URL, KV_URL, MILADY_SSH_KEY (base64)
…uting
- Add WalletProvider interface (packages/lib/services/wallet-provider.ts)
- Add StewardClient singleton wrapper (packages/lib/services/steward-client.ts)
- Add wallet migration feature flags (packages/lib/config/wallet-provider-flags.ts)
- Update agent_server_wallets schema: wallet_provider, steward_agent_id, steward_tenant_id columns
- Make privy_wallet_id nullable for Steward-only wallets
- Add DB migration 0058 with CHECK constraint ensuring exactly one provider ID
- Update server-wallets.ts: dual-provider routing for provisionServerWallet() and executeServerWalletRpc()
- New wallets routed to Steward when USE_STEWARD_FOR_NEW_WALLETS=true
- Existing Privy wallets continue working unchanged
- Install @stwd/sdk@0.3.0
Rollback: set USE_STEWARD_FOR_NEW_WALLETS=false to revert to Privy for all new wallets.
- Add steward-client.ts: lightweight client for querying Steward agent/wallet info
- Add GET /api/v1/milady/agents/[agentId]/wallet endpoint for detailed wallet info
- Update agent detail API to include walletAddress, walletProvider, walletStatus
- Update compat API (toCompatAgent) to include wallet_address, wallet_provider
- Update admin docker-containers route to show wallet provider and address
- Update managed env to inject STEWARD_API_URL and STEWARD_AGENT_ID
- Pass sandboxId through to prepareManagedMiladyEnvironment
Docker-backed agents query Steward for wallet data; Privy agents fall back to DB. All Steward calls are best-effort with timeouts, so the API degrades gracefully.
The check-types-split.ts script was scanning 'lib/', 'db/', and 'components/' — none of which exist at the project root. The actual source lives at packages/lib, packages/db, and packages/ui/src/components. This meant packages/lib (including feature-gate.ts, use-feature-flags.ts, and feature-flags.ts) was never actually type-checked in CI. The split type-checker silently skipped all those files. Fix: update getDirectoriesToCheck() to use the correct package paths.
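One way to keep this class of bug from recurring is to fail loudly when a configured directory does not exist. The sketch below is an assumption layered on top of the described fix: the existence guard and the injected `exists` predicate are hypothetical, and only the three package paths come from the commit message.

```typescript
// Source directories named in the commit message.
const TYPE_CHECK_DIRS = ["packages/lib", "packages/db", "packages/ui/src/components"];

// Hypothetical variant of getDirectoriesToCheck(): `exists` is injected so the
// sketch stays self-contained; a real script would likely use fs.existsSync.
function getDirectoriesToCheck(exists: (dir: string) => boolean): string[] {
  const missing = TYPE_CHECK_DIRS.filter((dir) => !exists(dir));
  if (missing.length > 0) {
    // Fail loudly instead of silently skipping directories that do not exist.
    throw new Error(`Type-check directories not found: ${missing.join(", ")}`);
  }
  return [...TYPE_CHECK_DIRS];
}
```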
…ess fallback
- CloudFormation: default-deny direct container port; opt-in via DirectContainerPortCidr param
- milady-web-ui: getClientSafeMiladyAgentWebUiUrl returns null instead of falling back to headscale direct access
- Dashboard pages: remove webUiUrl gating on connect button (always use pairing flow for running agents)
- agent-actions/sandboxes-table: drop getConnectUrl and webUiUrl prop threading
- Add cloudformation-template unit test
- Gitignore env backup files
…e-slop UI
Infrastructure:
- Add wallet proxy route (/api/v1/milady/agents/[id]/api/wallet/[...path])
  Proxies wallet/steward requests to agent's REST API with proper auth
- Switch Neon provisioning from projects to branches (fixes 100-project limit)
  New agents get branches within shared parent project
- Add cleanup-stuck-provisioning cron (resets agents stuck >10min)
- Remove process-provisioning-jobs Vercel cron (VPS worker handles this)
- Add milady.ai to redirect allowlists for Stripe checkout
Dashboard UI:
- Agent detail: add Wallet, Transactions, Policies tabs
- Billing: replace credit pack cards with custom amount + card/crypto
- Agent cards: deterministic character images instead of identical fallbacks
- De-slop text across all dashboard pages
- Create dialog: cleaner copy, Deploy button
- Pricing: tighter descriptions
Replace the broken custom billing page with the working BillingTab component from Settings. Same Stripe + crypto flow, invoices, and balance display.
…ution
Root cause: bun test --max-concurrency=1 runs all test files in a single
process. When a test file calls mock.module("@/db/repositories", ...) with
only a partial set of exports, the mock persists and breaks all subsequent
test files that import named exports not included in the mock.
Fixes:
1. privy-sync: change @/db/repositories mock to use specific sub-module
paths (organization-invites, users) so the full repositories index is
never replaced globally.
2. admin-service-pricing route+test: import servicePricingRepository from
its specific sub-module (service-pricing) instead of the full index, and
update the test mock accordingly.
3. Add missing InsufficientCreditsError export to @/lib/services/credits
mocks in four test files that omitted it, preventing mcp-tools.test.ts
from failing when it transitively imports app/api/mcp/tools/memory.ts.
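The failure mode can be reproduced with a toy model. The code below is a plain TypeScript simulation of a process-wide module registry, not Bun's implementation; `mockModule`, `importNamed`, the sub-module path `@/db/repositories/users`, and the placeholder export values are all hypothetical.

```typescript
type ModuleExports = Record<string, unknown>;

// Toy stand-in for a module registry shared by every test file in one process.
const registry = new Map<string, ModuleExports>([
  ["@/db/repositories", { usersRepository: "real users", servicePricingRepository: "real pricing" }],
  ["@/db/repositories/users", { usersRepository: "real users" }],
]);

// Toy equivalent of replacing a module: the replacement is global and persists.
function mockModule(path: string, factory: () => ModuleExports): void {
  registry.set(path, factory());
}

// Toy equivalent of a named import: fails if the export is absent.
function importNamed(path: string, name: string): unknown {
  const mod = registry.get(path);
  if (!mod || !(name in mod)) {
    throw new Error(`Export '${name}' not found in '${path}'`);
  }
  return mod[name];
}
```

In this model, mocking `@/db/repositories/users` leaves the index entry untouched, while a partial mock of `@/db/repositories` makes any later `importNamed("@/db/repositories", "servicePricingRepository")` throw, which mirrors the cross-file breakage described above.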
…ry whitelist, typed token access, no hardcoded fallback
- Wallet proxy: whitelist allowed wallet sub-paths (prevents path traversal)
- Wallet proxy: whitelist allowed query params (limit, offset, cursor, type, status)
- Wallet proxy: typed environment_vars access instead of unsafe casts
- Wallet proxy: POST body size limit (1MB) + Content-Type validation
- Wallet proxy: reject multi-segment paths
- Neon: remove hardcoded project ID fallback, warn if env var missing
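The path and query allowlisting listed above might look something like the sketch below. The function names and the wallet sub-path names are hypothetical (the source does not list the allowed sub-paths); the query parameter names are the ones given in the commit message.

```typescript
// Hypothetical sub-path allowlist; the real names are not given in the source.
const ALLOWED_WALLET_PATHS = new Set(["balance", "transactions", "policies"]);
// Query parameters named in the commit message.
const ALLOWED_QUERY_PARAMS = new Set(["limit", "offset", "cursor", "type", "status"]);

// Returns the validated sub-path, or null if the request should be rejected.
function validateWalletPath(segments: string[]): string | null {
  // Reject multi-segment paths outright, which also rules out "../" traversal.
  if (segments.length !== 1) return null;
  const segment = segments[0];
  if (segment === undefined || !ALLOWED_WALLET_PATHS.has(segment)) return null;
  return segment;
}

// Copies only allowlisted query parameters into a fresh URLSearchParams.
function filterQuery(params: URLSearchParams): URLSearchParams {
  const filtered = new URLSearchParams();
  params.forEach((value, key) => {
    if (ALLOWED_QUERY_PARAMS.has(key)) filtered.append(key, value);
  });
  return filtered;
}
```

Note that this sketch only filters parameter names; it does not validate their values.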
…async job queue The VPS worker handles SSH/Docker operations. Sync provisioning in Vercel serverless functions can't do SSH and times out. Force async path so the VPS worker is always the one deploying containers.
The VPS worker writes bridge_url and status to the primary DB. The wallet proxy reads with findRunningSandbox which was using the read replica (dbRead). Replica lag caused 503 'not running' errors. Switched to dbWrite (primary) for consistent reads.
…internal IPs
Vercel serverless functions can't reach Hetzner internal Docker IPs.
Route wallet proxy through the agent's public domain ({agentId}.waifu.fun)
which is accessible from anywhere via nginx/cloudflare routing.
The cron was still processing jobs despite being removed from vercel.json. Replace the route with a no-op that returns immediately. Provisioning is handled exclusively by the standalone VPS worker.
Neon plan limits (100 projects, 10 branches/project) block new agent provisioning with BRANCHES_LIMIT_EXCEEDED. ElizaOS plugin-sql tables already scope data by agent UUID, so all agents can safely share one DB.
Changes:
- provisionNeon() now returns process.env.DATABASE_URL instead of calling neon.createProject()
- cleanupNeon() accepts null/undefined and no-ops in shared-DB mode
- Delete path already guarded by if (rec.neon_project_id) check
- neon-client.ts preserved for future use / legacy project cleanup
- Existing neon_project_id/neon_branch_id columns unchanged in schema
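A simplified, synchronous sketch of the shared-DB behaviour follows. The signatures are hypothetical: the real functions are presumably async and read `process.env` directly, whereas here the environment and the delete callback are passed in to keep the example self-contained.

```typescript
interface NeonProvisionResult {
  connectionString: string;
  neonProjectId: null; // shared-DB mode never creates a per-agent project
}

// Returns the shared connection string instead of creating a Neon project.
function provisionNeon(env: Record<string, string | undefined>): NeonProvisionResult {
  const url = env["DATABASE_URL"];
  if (!url) throw new Error("DATABASE_URL is not set");
  return { connectionString: url, neonProjectId: null };
}

// Deletes a legacy per-agent project if one exists; otherwise does nothing.
// Returns true only when a delete was actually issued.
function cleanupNeon(
  projectId: string | null | undefined,
  deleteProject: (id: string) => void,
): boolean {
  if (!projectId) return false; // shared-DB mode: nothing to clean up
  deleteProject(projectId);
  return true;
}
```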
…nto fix/consolidated-cloud-fixes
…ard wallets + dashboard deslop + shared DB
…ning, concurrency limits, correctness fixes
- Delete INFRASTRUCTURE.md (VPS IPs in public repo)
- Delete TRIAGE_NOTES.md, CLAUDE.md (stray merge artifacts)
- Delete duplicate lib/milady-web-ui.ts + its test (packages/lib version is canonical)
- Add GET handler to process-provisioning-jobs stub (Vercel cron uses GET)
- Fix wallet_provider inference: null when not fetched, not inferred from node_id
- Remove hardcoded Neon project ID fallback
- Add STEWARD_TENANT_API_KEY missing warning in docker-sandbox-provider and steward-client
- Add concurrency limiter (max 5) for Steward enrichment in admin containers endpoint
- Remove unused sanitizeProjectNameSegment function
…38) for v2.0.4 agent
The DB-generated API token from managed-milady-env.ts is the canonical inbound auth credential. Clearing it to empty caused cloud containers to start without a token, which (combined with the agent-side auth gate in v2.0.4+) resulted in 401 on every request. The token now flows through to the container so the pair flow can hand browsers the correct key via the nginx cookie auth bridge.
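The intended token flow can be illustrated with a small sketch. The helper names and the merge order are assumptions, not the actual docker-sandbox-provider.ts code; the only facts taken from the source are the two token variable names and that they must no longer be overridden with empty strings.

```typescript
type EnvMap = Record<string, string>;

// Hypothetical helper: merges provider-level settings with the managed env.
function buildContainerEnv(managedEnv: EnvMap, providerEnv: EnvMap): EnvMap {
  // Managed env is spread last so the DB-generated tokens are never overridden.
  // The bug was equivalent to appending MILADY_API_TOKEN: "" and
  // ELIZA_API_TOKEN: "" after this merge.
  return { ...providerEnv, ...managedEnv };
}

// Turns the env map into `docker run -e KEY=VALUE` flags.
function toDockerEnvFlags(env: EnvMap): string[] {
  const flags: string[] = [];
  for (const key of Object.keys(env)) {
    flags.push("-e", `${key}=${env[key]}`);
  }
  return flags;
}
```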
Code Review
Overall this is a well-structured PR with clear motivation. The core fix (removing the two empty-string overrides) is correct and the Steward wallet integration is architecturally sound. A few issues worth addressing before merge:
🔴 Critical
This means all agent containers receive the cloud platform's connection string. If an ElizaOS plugin has a bug, a misconfigured agent writes to the wrong tables, or a container is compromised, it has access to the shared cloud DB, including other tenants' data, the jobs table, billing records, etc. The comment says "ElizaOS plugin-sql tables scope all data by agent UUID", but that relies on every plugin being correct and well-behaved. A single
Recommendation: Strongly consider using Neon branches off
🟡 High
The file was deleted entirely. It contained important developer guidance: migration rules (no
This is a new package (
🟠 Medium
// packages/lib/services/steward-client.ts
if (!response.ok) {
if (response.status === 404) return null;
logger.warn(`[steward-client] ${path} returned ${response.status}`);
return null; // 401, 403, 500 all silently return null
}
A misconfigured or rotated
Both
`-p ${bridgePort}:${allEnv.PORT || DEFAULT_AGENT_PORT}`,
`-p ${webUiPort}:${allEnv.PORT || DEFAULT_AGENT_PORT}`,
Two separate host ports are allocated for the same container port
// packages/lib/milady-web-ui.ts
- return getMiladyAgentDirectWebUiUrl(sandbox, options);
+ return null;
The previous fallback returned the direct URL when no canonical URL was set. Any caller that used this for display (e.g. showing users a link to their agent's web UI) will now silently get nothing.
Module-level
Both
🟢 Low / Nits
Query parameter values not validated in wallet proxy
// provision/route.ts
const sync = false;
The variable is assigned but the comment says the sync path is for local dev only. If
Double blank line at end of
+
+// Steward wallet migration flags live in wallet-provider-flags.ts
There's a spurious blank line before the comment.
Journal index gap
The
✅ What looks good
PR #421 fixed milady-sandbox.ts to use the shared DATABASE_URL but missed user-database.ts which still called neonClient.createProject(). The ai-app-builder calls this path, hitting BRANCHES_LIMIT_EXCEEDED. Same pattern: use process.env.DATABASE_URL, encrypt before storing, mark as ready. Legacy per-app projects with existing project IDs still get cleaned up correctly via cleanupDatabase().
Code Review
This PR introduces Phase 1 of a Privy → Steward wallet provider migration plus several fixes. The core Docker token fix is correct and the DB migration is safe. A few issues worth addressing before merge:
Bugs / Correctness
Hard-coded
const parsed = JSON.parse(typedData);
Missing Steward 409 handling in
Security
Missing startup guard for missing
Reliability
No timeout on Steward SDK calls in
Minor
CLAUDE.md deletion: This PR removes
Test Coverage Gaps
The core fix (token flow-through in
…hrow
The validateRateLimitConfig() function set hasValidatedConfig=true BEFORE
throwing its error. This meant:
- 1st request to /api/auth/pair → flag set → throw → HTTP 500 ("Pairing failed")
- 2nd request → flag already set → early return → HTTP 200 ✓
Root cause confirmed via nginx access.log:
POST /api/auth/pair → 500 (1st try)
POST /api/auth/pair → 200 (2nd try, 4 seconds later)
And journalctl showing the exact error text thrown:
'🚨 SECURITY: Redis rate limiting is required in production...'
Fix: convert the throw to a logger.warn so in-memory rate limiting is allowed
on single-server VPS deployments. Multi-instance deployments should still
configure Redis (REDIS_RATE_LIMITING=true) to share rate-limit state across
processes, but the single-server case is safe and should never 500.
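The "fail once, then succeed" behaviour and the fix can be reproduced with a small model. This is a sketch under stated assumptions: `makeValidator`, the `RateLimitEnv` shape, and the warning text are hypothetical, and the `mode` switch exists only to show the old and new behaviour side by side.

```typescript
interface RateLimitEnv {
  isProduction: boolean;
  redisEnabled: boolean;
}

// Hypothetical factory so each validator owns its flag; `mode` selects the
// old behaviour ("throw") or the fixed behaviour ("warn").
function makeValidator(mode: "throw" | "warn", warn: (msg: string) => void) {
  let hasValidatedConfig = false;
  return function validateRateLimitConfig(env: RateLimitEnv): void {
    if (hasValidatedConfig) return;
    hasValidatedConfig = true; // set before the check, as in the original code
    if (env.isProduction && !env.redisEnabled) {
      const msg = "Redis rate limiting is not configured; using in-memory rate limiting";
      // Old behaviour: the first call fails, every later call returns early.
      if (mode === "throw") throw new Error(msg);
      // Fixed behaviour: warn once and let the request proceed.
      warn(msg);
    }
  };
}
```

With `mode` set to `"throw"`, the first call raises and the second returns silently, matching the 500-then-200 pattern in the access log. With `"warn"`, neither call raises and the warning is logged once.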
Code Review: PR #423
Overall the fix to stop clearing
🔴 Critical
Shared
If the goal is to avoid Neon project limits, the safer path is per-agent Neon branches on the shared project (the
🟡 Important
Migration journal gap
Rate-limiting guard downgraded from
🟠 Minor
These should use
Both host ports mapped to the same container port
Two different host ports bound to the same container port is valid and intentional per the comment, but it means
Noop change in compat agents list route
// before
agents.map(toCompatAgent)
// after
agents.map((a) => toCompatAgent(a))
These are functionally identical. If this was meant to pass
✅ Good
PR Review
This PR bundles a targeted auth fix with a substantial Steward wallet integration (Phase 1). The core fix is correct and well-explained. Several issues in the surrounding changes worth addressing:
Bugs / Correctness
Both host ports map to the same container port
Two external ports publishing to the same container port is valid Docker but means nginx must route by host header or path; if it routes by port number, one of these will always be wrong. The comment says "nginx can reach /api/* via bridge_url and the UI via web_ui_port" but both now point at the same listener. Worth confirming the nginx config handles this.
const sync = false;
The comment says "The sync path remains in code for local dev only" but there's no way to enable it without editing source. If local dev still uses the sync path, this breaks it. A
Security
Wallet proxy path validation gap in route handler
Shared
Code Quality
Module-level
if (!STEWARD_TENANT_API_KEY) {
console.warn("[steward-client] STEWARD_TENANT_API_KEY is not set...");
}
This fires every time the module is imported (e.g. in tests, cold starts). Use
Redundant
- agents.map(toCompatAgent)
+ agents.map((a) => toCompatAgent(a))
These are equivalent; the lambda wrapper adds overhead with no benefit.
Minor / Nits
What's Good
Problem
docker-sandbox-provider.ts cleared MILADY_API_TOKEN and ELIZA_API_TOKEN to empty strings before passing env vars to the container. The DB-generated token (from managed-milady-env.ts) was discarded.
With agent v2.0.4+ (which tightened auth in milady-ai/milady#1434), cloud containers with no token reject all requests with 401.
The pair flow also broke: /api/auth/pair returns the DB token to the browser, but the container didn't have that token, so the browser's auth header didn't match anything.
Fix
Removed the two lines that override MILADY_API_TOKEN and ELIZA_API_TOKEN to empty strings. The DB-generated token now flows through to the container so:
- isAuthorized() has a real token to match against
Testing
- packages/tests/unit/auth-pair-route.test.ts: 7 tests pass
- packages/tests/unit/milaidy-pairing-token-route.test.ts: 4 tests pass
Companion PR: milady-ai/milady#1551 (agent-side safety net)