
fix(server-identity): handle onboarding persistence mismatches #1974

Merged
elibosley merged 5 commits into main from codex/investigate-onboarding-ident-persistence
Mar 30, 2026


@Ajit-Mehrotra commented Mar 27, 2026

Summary

This investigation started from the onboarding bug where server name and server description appeared to update in the UI but were not reliably surviving reboot.

This PR does four things:

  • removes the dead onboarding-only identity update stub from the old onboarding service path
  • makes ServerService.updateServerIdentity() send the same identity payload fields as the webgui Identification form
  • preserves the underlying emcmd error details in GraphQL error extensions
  • only returns a server identity failure after checking /boot/config/ident.cfg and confirming the requested NAME / COMMENT / SYS_MODEL values did not persist

What Changed

Identity update payload

The API identity mutation now mirrors the webgui Identification form payload more closely by always sending:

  • changeNames=Apply
  • server_https
  • server_name
  • server_addr
  • NAME
  • COMMENT
  • SYS_MODEL

This closes the gap between the original API implementation and the working webgui flow.
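As an illustration, the payload above could be assembled into a form-encoded /update body roughly like this. This is a minimal sketch: the helper name, the `ServerContext` shape, and passing `server_https` through verbatim are assumptions, not the PR's actual `buildIdentityUpdateParams()`, and the CSRF token is assumed to be appended separately by the emcmd client.

```typescript
// Hypothetical input shapes; the real ServerService types may differ.
interface IdentityUpdate {
  name: string;
  comment: string;
  sysModel: string;
}

interface ServerContext {
  serverHttps: string; // assumption: copied through verbatim from emhttp/nginx state
  serverName: string;
  serverAddr: string;
}

// Builds an application/x-www-form-urlencoded body containing every field the
// webgui Identification form sends, in the order listed above.
function buildIdentityUpdateBody(identity: IdentityUpdate, server: ServerContext): string {
  const fields: Array<[string, string]> = [
    ['changeNames', 'Apply'],
    ['server_https', server.serverHttps],
    ['server_name', server.serverName],
    ['server_addr', server.serverAddr],
    ['NAME', identity.name],
    ['COMMENT', identity.comment],
    ['SYS_MODEL', identity.sysModel],
  ];
  return fields
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
}

const body = buildIdentityUpdateBody(
  { name: 'Tower', comment: 'Media box', sysModel: 'Custom' },
  { serverHttps: '', serverName: 'tower.local', serverAddr: '192.168.1.10' },
);
console.log(body);
```

The point of the sketch is that the identity fields are never sent alone: the server context fields always travel with them, which is what the webgui form does.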

Persistence-first success check

updateServerIdentity() no longer trusts the emcmd response alone.

Instead it:

  • attempts the emcmd update
  • reads ident.cfg
  • compares the persisted NAME, COMMENT, and SYS_MODEL values to the requested update
  • returns success if the file contains the requested values, even if emcmd reported an error on the response path
  • throws only when ident.cfg is still stale, or when the file cannot be read to verify the result
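The decision step in that list can be sketched as a pure function, which keeps the I/O (calling emcmd, reading ident.cfg) out of the way. The types and names below are hypothetical and only illustrate the rules described above; `persisted` being `null` stands for "ident.cfg could not be read".

```typescript
// Hypothetical types for illustration; not the PR's actual signatures.
interface Identity {
  name: string;
  comment: string;
  sysModel: string;
}

type IdentityOutcome =
  | { ok: true; recoveredFromEmcmdError: boolean }
  | { ok: false; reason: string; cause?: string };

// Decides the mutation result from what is on disk, not from the emcmd response.
// `persisted` is null when ident.cfg could not be read; `emcmdError` is whatever
// the emcmd call threw, or null if it returned normally.
function decideIdentityOutcome(
  requested: Identity,
  persisted: Identity | null,
  emcmdError: Error | null,
): IdentityOutcome {
  const cause = emcmdError ? emcmdError.message : undefined;
  if (persisted === null) {
    return { ok: false, reason: 'ident.cfg could not be read to verify the update', cause };
  }
  const persistedMatches =
    persisted.name === requested.name &&
    persisted.comment === requested.comment &&
    persisted.sysModel === requested.sysModel;
  if (persistedMatches) {
    // Success even if emcmd reported an error on the response path.
    return { ok: true, recoveredFromEmcmdError: emcmdError !== null };
  }
  return { ok: false, reason: 'ident.cfg still contains stale identity values', cause };
}
```

In a resolver, the `ok: false` branch would presumably be turned into a GraphQL error that carries `cause` along, so the original emcmd message is not lost.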

Error detail propagation

When the mutation does fail, the GraphQL error still carries the original low-level message in extensions.cause. Surfacing that message is what made it possible to debug the transport issue.
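A minimal sketch of keeping the low-level message instead of collapsing it into a generic error. The helper names are hypothetical. Recent graphql-js releases (16.x) accept an options object with an `extensions` field as the second argument to `GraphQLError`, so an object like the one built here could be passed there; that wiring is left out to keep the sketch dependency-free.

```typescript
// Hypothetical helper: normalizes whatever emcmd threw into a readable string.
function describeCause(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (typeof err === 'string') return err;
  try {
    const json: string | undefined = JSON.stringify(err);
    return json === undefined ? String(err) : json;
  } catch {
    return String(err);
  }
}

// Builds the options for the GraphQL error, keeping the low-level message
// under extensions.cause instead of discarding it.
function identityErrorOptions(err: unknown): { extensions: { cause: string } } {
  return { extensions: { cause: describeCause(err) } };
}

const errorOptions = identityErrorOptions(new Error('Parse Error: Expected HTTP/, RTSP/ or ICE/'));
console.log(errorOptions.extensions.cause);
```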

Tests

Ran:

  • pnpm --filter ./api exec vitest run src/core/utils/clients/emcmd.spec.ts src/unraid-api/graph/resolvers/servers/server.service.spec.ts
  • pnpm --filter ./api exec vitest run src/unraid-api/graph/resolvers/customization/onboarding.service.spec.ts

Added / updated regression coverage for:

  • full Identification-page payload shape
  • persisted identity success
  • emcmd error but persisted success
  • emcmd success but stale ident.cfg
  • missing ident.cfg during verification

Problems We Found

1. The original API payload was underspecified

The original API updateServerIdentity() call only sent a partial /update payload. It was missing server_https, server_name, and server_addr, while the webgui Identification form sends those fields every time.

That made the API path materially different from the known-good webgui path.

2. The /update response is not cleanly parseable by our Node clients

Once we preserved the original error text in GraphQL, we saw multiple response-shape failures while testing the real mutation against emhttpd:

  • with got: Parse Error: Expected HTTP/, RTSP/ or ICE/
  • with curl and no HTTP/0.9 allowance: curl: (1) Received HTTP/0.9 when not allowed
  • with curl --http0.9: stdout contained a mixed success-ish payload like:
    • <script>replaceOrigin("http://Jeffrey");</script>
    • followed by HTTP/1.1 200 OK headers

So the command side effect can succeed while the response channel still looks malformed or hybrid to a normal Node HTTP client.
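One way to see why a strict client chokes is to look at where the HTTP status line actually starts in the raw bytes. Node's HTTP parser expects the response to begin with a status line, which is consistent with got's `Expected HTTP/` error. The sketch below is a hypothetical diagnostic, not code from the PR, and the sample input is only modeled on the curl --http0.9 output described above.

```typescript
interface RawFraming {
  preamble: string; // text seen before any HTTP/1.x status line
  statusCode: number | null; // null when no status line was found (HTTP/0.9-style)
}

// Diagnostic sketch: locate the first HTTP/1.0 or HTTP/1.1 status line in a
// raw socket response and report anything that precedes it.
function inspectRawResponse(raw: string): RawFraming {
  const match = /HTTP\/1\.[01] (\d{3})/.exec(raw);
  if (match === null) {
    return { preamble: raw, statusCode: null };
  }
  return { preamble: raw.slice(0, match.index), statusCode: Number(match[1]) };
}

// Illustrative input modeled on the hybrid output described above.
const hybrid =
  '<script>replaceOrigin("http://Jeffrey");</script>\r\nHTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n';
console.log(inspectRawResponse(hybrid));
```

A non-empty `preamble` is exactly the shape a conforming HTTP/1.x parser rejects, even though the status code that follows reports success.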

3. We verified this in a few steps

We confirmed the transport issue by:

  • comparing the API code to the webgui Identification form and the webgui internal scripts/emcmd helper
  • surfacing the raw low-level error in GraphQL extensions.cause
  • reproducing the mutation against a real server and checking /boot/config/ident.cfg
  • trying both got and curl over the emhttpd unix socket
  • testing curl --http0.9 to confirm the socket response shape was the blocker, not the identity side effect itself
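The ident.cfg check in those steps comes down to pulling NAME, COMMENT, and SYS_MODEL out of the file. A minimal sketch, assuming ident.cfg uses simple `KEY="value"` lines; the API itself uses an INI parser, and this hand-rolled version is only a stand-in. Reading the file (e.g. with `fs.readFile`) is omitted so the sketch stays dependency-free.

```typescript
// Assumption: ident.cfg uses simple KEY="value" lines. This is a minimal
// stand-in for the INI parser the API actually uses.
function parseCfg(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.charAt(0) === '#' || line.charAt(0) === ';') continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // skip lines without a key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    if (value.length >= 2 && value.charAt(0) === '"' && value.charAt(value.length - 1) === '"') {
      value = value.slice(1, -1); // strip surrounding double quotes
    }
    out[key] = value;
  }
  return out;
}

// Pulls out the three fields the identity mutation compares.
function readIdentityFields(text: string): { name: string; comment: string; sysModel: string } {
  const cfg = parseCfg(text);
  return {
    name: cfg['NAME'] ?? '',
    comment: cfg['COMMENT'] ?? '',
    sysModel: cfg['SYS_MODEL'] ?? '',
  };
}

console.log(readIdentityFields('NAME="Tower"\nCOMMENT="Media box"\nSYS_MODEL="Custom"\n'));
```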

4. The emhttpd / emcmd transport issue is real, but out of scope here

The larger issue is that /update over the emhttpd unix socket does not behave like a clean modern HTTP response for our Node client path.

This PR does not try to solve that transport problem directly.

Instead it makes the mutation resilient to it by trusting the persisted source of truth, ident.cfg, before returning an error.

Next Big Change, Out Of Scope Here

The next substantial fix would be to harden the emhttp socket client itself so it can reliably normalize /update responses without needing downstream mutations to special-case persistence verification.

That likely means one of these:

  • build a dedicated emhttp unix-socket client that tolerates the actual response framing returned by /update
  • or introduce a narrower wrapper around the existing webgui/PHP curl_socket behavior so the API and webgui share the same socket semantics
  • or change the success contract around /update so programmatic callers get a machine-friendly response instead of the current mixed body / header behavior

That is a broader transport-layer change, so I kept it out of this PR and limited the fix to server identity correctness and user-visible behavior.

Notes

There are unrelated local modifications in generated/plugin files in my worktree that are not part of this PR.

Summary by CodeRabbit

  • Tests

    • Added comprehensive test coverage for server identity update operations, including error handling and persistence verification.
    • Expanded test fixtures to validate server identity configuration with multiple field scenarios.
  • Refactor

    • Reorganized server identity management logic with improved error handling and validation.
    • Enhanced server configuration update process with additional field support and verification mechanisms.

- Purpose: align the API server identity mutation with the webgui Identification flow and remove the dead onboarding-only identity path.
- Before: the API sent a partial emcmd payload, kept an unused onboarding stub around, and collapsed the real emcmd failure into a generic GraphQL error.
- Problem: identity updates behaved differently from the webgui path, onboarding still carried a stale implementation, and debugging transport failures was much harder than it needed to be.
- Now: ServerService sends the full Identification-style payload, onboarding no longer carries the unused applyServerIdentity path, and GraphQL preserves the underlying emcmd error message in extensions.cause.
- How: added webgui-style server context fields to updateServerIdentity, expanded regression coverage around payload shape and persistence side effects, and removed the orphaned onboarding helper plus its stale specs.
- Purpose: make emcmd use the same curl-over-unix-socket style transport that the webgui uses for internal /update calls.
- Before: emcmd used got against the emhttp unix socket, which could apply a command successfully and still fail afterward with low-level parse errors like 'Expected HTTP/, RTSP/ or ICE/'.
- Problem: callers such as updateServerIdentity could report a GraphQL error even after emhttp had already applied the setting change.
- Now: emcmd shells out to curl over the unix socket, preserves the existing stdout-is-error contract, and returns a response object with body/stderr metadata for callers.
- How: replaced the got socket POST with execa('curl', ... --unix-socket ... http://localhost/update), added explicit curl exit-code handling, and added direct unit coverage for successful calls, emhttp body errors, socket transport failures, and CSRF token fallback.
- Purpose: revert the curl-based emhttp transport experiment and return emcmd to the original got unix-socket client.
- Before: emcmd posted to /update through curl and needed transport-specific flags like --http0.9 to get past emhttpd response parsing.
- Problem: the curl path changed the transport surface area without actually solving the real bug, which is that emhttp can succeed while returning a non-empty response body that our success detection misclassifies.
- Now: emcmd uses got.post(..., { enableUnixSockets: true }) again, matching the original helper shape while preserving the focused tests around csrf loading and response handling.
- How: swap execa/curl back to got, restore the response object contract, and update the emcmd spec to assert the got unix-socket request path.
- Purpose: make updateServerIdentity decide success from the persisted ident.cfg state instead of trusting the emcmd response alone.
- Before: the mutation failed immediately on emcmd transport or response errors, even in cases where emhttp had already updated ident.cfg and the change would survive reboot.
- Problem: users saw a GraphQL error for an update that actually persisted, while the stale-file case and the persisted-success case were both collapsed into the same generic failure path.
- Now: the resolver always checks ident.cfg after the emcmd attempt, returns success when the requested identity is present on disk, and only throws when persistence verification fails or the file still contains the old values.
- How: read getters.paths().identConfig with ini parsing, compare NAME/COMMENT/SYS_MODEL against the requested identity, preserve the original emcmd error as context when persistence did not happen, and add regression tests for the persisted-success, stale-file, and missing-file cases.

coderabbitai bot commented Mar 27, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 7f04f8e and b5c2d58.

📒 Files selected for processing (5)
  • api/src/core/utils/clients/emcmd.spec.ts
  • api/src/unraid-api/graph/resolvers/customization/onboarding.service.spec.ts
  • api/src/unraid-api/graph/resolvers/customization/onboarding.service.ts
  • api/src/unraid-api/graph/resolvers/servers/server.service.spec.ts
  • api/src/unraid-api/graph/resolvers/servers/server.service.ts
💤 Files with no reviewable changes (1)
  • api/src/unraid-api/graph/resolvers/customization/onboarding.service.ts

Walkthrough

Server identity application logic shifts from onboarding service to server service with enhanced persistence and verification. New emcmd client tests validate HTTP POST behavior, CSRF token handling, and error cases. Server identity updates now include file-based verification against persisted config.

Changes

  • New emcmd Client Tests
    • File: api/src/core/utils/clients/emcmd.spec.ts
    • Summary: Added a comprehensive Vitest suite that mocks external dependencies and tests the HTTP POST via unix socket, form-encoded body, header construction, error handling for non-empty responses and non-200 status codes, and CSRF token fallback loading from /var/local/emhttp/var.ini.
  • Onboarding Service Refactor
    • Files: api/src/unraid-api/graph/resolvers/customization/onboarding.service.ts, api/src/unraid-api/graph/resolvers/customization/onboarding.service.spec.ts
    • Summary: Removed the unused emcmd dependency, eliminated the applyServerIdentity method and identCfg initialization, and deleted all related test coverage, including direct emcmd invocation assertions, sanitization behavior, and SSL-dependent parameter tests.
  • Server Service Identity Handling
    • Files: api/src/unraid-api/graph/resolvers/servers/server.service.ts, api/src/unraid-api/graph/resolvers/servers/server.service.spec.ts
    • Summary: Added readPersistedIdentity() and buildIdentityUpdateParams() helpers; refactored updateServerIdentity() to construct extended params with server_https/server_name/server_addr, capture emcmd exceptions, verify the persisted identity via an ident.cfg read, and throw a GraphQLError on mismatch. Expanded tests with temp file setup, a createEmhttpState helper, positive/negative path coverage for persistence, error conditions, and omitted field preservation.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Service as ServerService
    participant Store as Store (getters)
    participant Emcmd as emcmd Client
    participant FS as File System
    participant Error as Error Handler

    Service->>Store: Read current identity & emhttp state
    Service->>Service: buildIdentityUpdateParams() with<br/>NAME, COMMENT, SYS_MODEL,<br/>server_https, server_name, server_addr
    Service->>Emcmd: POST with extended params & CSRF token

    alt emcmd succeeds
        Service->>FS: readPersistedIdentity() from ident.cfg
        FS-->>Service: Parsed identity (name, comment, sysModel)
        Service->>Service: Compare requested vs persisted

        alt Identity matches
            Service-->>Service: Return updated server response
        else Identity mismatch
            Service->>Error: Throw GraphQLError with mismatch details
        end
    else emcmd fails
        Service->>Service: Capture emcmdError, log warning
        Service->>FS: readPersistedIdentity() for verification
        FS-->>Service: Parsed identity
        Service->>Service: Compare requested vs persisted

        alt Identity matches despite emcmd failure
            Service-->>Service: Log warning, return response
        else Identity mismatch
            Service->>Error: Throw GraphQLError with emcmdError + mismatch
        end
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes introduce new file I/O patterns, error handling with fallback verification logic, and extended parameter construction. Multiple test scenarios cover positive and negative paths with file system interactions. The heterogeneous mix of new method implementations, error capture/verification flow, and expanded test coverage with temporary file setup and helper functions requires substantial individual reasoning per component.

Poem

🐰 We test the emcmd, we verify with care,
Identity persisted in ident.cfg there,
From onboarding's shoulders, the burden now shed,
Server service stands firm with checks well-read!
With params extended and mismatch detection clear,
Our infrastructure's identity rings true and near! 🚀

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: handling persistence mismatches in server-identity updates during onboarding, which aligns with the PR's core objective of verifying identity persistence after emcmd updates.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


- Purpose: make the server identity regression spec pass the full API type-check and package test matrix.
- Before: the spec mocked getters.emhttp() with partial object literals cast directly to the emhttp slice type, which TypeScript rejected during pnpm type-check.
- Problem: the branch-specific tests were green, but the full api/type-check gate still failed, so the package was not actually in a releasable state.
- Now: the spec uses a typed helper that returns a full emhttp slice shape, and the array state cases use the real ArrayState enum values.
- How: add a createEmhttpState() helper in the spec, reuse it across scenarios, and keep the mock data focused while satisfying the store slice contract.
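The typed-helper pattern described in that commit can be sketched as a factory that returns a complete slice and lets each test override only a few fields. Everything below is a hypothetical stand-in: the real emhttp slice has many more fields, and the spec uses the real ArrayState enum rather than the `ArrayStateLike` constant used here.

```typescript
// Stand-ins for the real store types; names and fields here are assumptions.
const ArrayStateLike = { STARTED: 'STARTED', STOPPED: 'STOPPED' } as const;
type ArrayStateLike = (typeof ArrayStateLike)[keyof typeof ArrayStateLike];

interface EmhttpStateLike {
  var: { name: string; comment: string; sysModel: string; mdState: ArrayStateLike };
  nginx: { lanName: string; lanIp: string };
}

// Each top-level section may be partially overridden.
type Overrides = { [K in keyof EmhttpStateLike]?: Partial<EmhttpStateLike[K]> };

// Returns a complete, correctly typed slice, so no partial-object casts are needed.
function createEmhttpState(overrides: Overrides = {}): EmhttpStateLike {
  return {
    var: {
      name: 'Tower',
      comment: '',
      sysModel: '',
      mdState: ArrayStateLike.STOPPED,
      ...overrides.var,
    },
    nginx: {
      lanName: 'tower.local',
      lanIp: '192.168.1.10',
      ...overrides.nginx,
    },
  };
}

const started = createEmhttpState({ var: { mdState: ArrayStateLike.STARTED } });
console.log(started.var.mdState);
```

Because the factory's return type is the full slice, the type-checker validates the mock shape instead of trusting a cast.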
@Ajit-Mehrotra Ajit-Mehrotra self-assigned this Mar 27, 2026

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 95.83333% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.18%. Comparing base (7f04f8e) to head (b5c2d58).
⚠️ Report is 5 commits behind head on main.

Files with missing lines:
  • ...raid-api/graph/resolvers/servers/server.service.ts: 95.83% patch coverage, 3 lines missing ⚠️
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1974      +/-   ##
==========================================
+ Coverage   52.08%   52.18%   +0.09%     
==========================================
  Files        1031     1031              
  Lines       71564    71575      +11     
  Branches     8090     8116      +26     
==========================================
+ Hits        37275    37351      +76     
+ Misses      34164    34099      -65     
  Partials      125      125              
```


@github-actions

This plugin has been deployed to Cloudflare R2 and is available for testing.
Download it at this URL:

https://preview.dl.unraid.net/unraid-api/tag/PR1974/dynamix.unraid.net.plg

@Ajit-Mehrotra Ajit-Mehrotra requested a review from elibosley March 27, 2026 22:46
@Ajit-Mehrotra Ajit-Mehrotra marked this pull request as ready for review March 27, 2026 22:46

@chatgpt-codex-connector bot left a comment


💡 Codex Review

```typescript
const nextComment = comment ?? currentComment;
const nextSysModel = sysModel ?? currentSysModel;
```

P1: Source omitted COMMENT/SYS_MODEL from persisted identity

When callers omit comment or sysModel, these lines fill them from the in-memory emhttp slice and then always send them in the /update payload. If the slice is still unloaded/stale (it initializes empty and can lag disk), a name-only update can unintentionally write empty or outdated COMMENT/SYS_MODEL values and still pass verification because the method now checks against those derived values, not the prior persisted config.
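One possible mitigation for this finding, sketched under assumptions: fill omitted fields from the values already persisted in ident.cfg instead of the in-memory slice, and refuse to guess when the file cannot be read. The function and type names are hypothetical; this is not what the PR merged.

```typescript
interface PersistedIdentity {
  name: string;
  comment: string;
  sysModel: string;
}

interface IdentityPatch {
  name?: string;
  comment?: string;
  sysModel?: string;
}

// Hypothetical mitigation: omitted fields come from ident.cfg (the prior
// persisted state), never from a possibly stale or unloaded in-memory slice.
function resolveNextIdentity(patch: IdentityPatch, persisted: PersistedIdentity | null): PersistedIdentity {
  const pick = (value: string | undefined, field: keyof PersistedIdentity): string => {
    if (value !== undefined) return value;
    if (persisted === null) {
      throw new Error(`cannot fill omitted ${field}: ident.cfg is unavailable`);
    }
    return persisted[field];
  };
  return {
    name: pick(patch.name, 'name'),
    comment: pick(patch.comment, 'comment'),
    sysModel: pick(patch.sysModel, 'sysModel'),
  };
}

const onDisk = { name: 'Tower', comment: 'Media box', sysModel: 'Custom' };
console.log(resolveNextIdentity({ name: 'Vault' }, onDisk));
```

With this approach, verification for a name-only update compares against the prior on-disk COMMENT and SYS_MODEL, so an empty or outdated slice can no longer leak into the payload unnoticed.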


Comment on lines +54 to +55
```typescript
server_name: emhttpState.nginx?.lanName || 'localhost',
server_addr: emhttpState.nginx?.lanIp || '127.0.0.1',
```

P2: Stop falling back to localhost/127.0.0.1 for the name update payload

Defaulting server_name and server_addr to localhost/127.0.0.1 means identity updates can submit synthetic host values whenever nginx state is missing (e.g., early startup or failed state load). Because these fields are now always included in changeNames=Apply, this can overwrite existing management host settings during an otherwise unrelated identity change.
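A sketch of one alternative to the localhost/127.0.0.1 fallback: fail fast when nginx state is missing instead of submitting synthetic host values. The function and type names are hypothetical and this is not the merged code.

```typescript
interface NginxStateLike {
  lanName?: string;
  lanIp?: string;
}

// Hypothetical alternative: refuse to build server_name/server_addr from
// placeholders when nginx state has not been loaded.
function serverContextFields(
  nginx: NginxStateLike | undefined,
): { server_name: string; server_addr: string } {
  const lanName = nginx?.lanName?.trim();
  const lanIp = nginx?.lanIp?.trim();
  if (!lanName || !lanIp) {
    throw new Error('nginx state not loaded; refusing to send placeholder server_name/server_addr');
  }
  return { server_name: lanName, server_addr: lanIp };
}

console.log(serverContextFields({ lanName: 'tower.local', lanIp: '192.168.1.10' }));
```

Failing fast is chosen here over silently dropping the two fields because it is not established how emhttpd treats a changeNames=Apply request that omits them.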


@elibosley elibosley merged commit 8432974 into main Mar 30, 2026
13 checks passed
@elibosley elibosley deleted the codex/investigate-onboarding-ident-persistence branch March 30, 2026 16:12