NE-2572: use add/del haproxy api calls#763
NE-2572: use add/del haproxy api calls#763jcmoraisjr wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
@jcmoraisjr: This pull request references NE-2572 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughRemoved HAProxy dynamic-server template generation and MaxDynamicServers config; replaced template-driven dynamic slots with direct HAProxy runtime commands for server add/update/disable/delete and updated interfaces to pass backend and service context. Changes
Sequence Diagram(s)sequenceDiagram
participant Router
participant ConfigManager
participant HAProxyRuntime
Router->>ConfigManager: Register(id, backend, route)
Router->>ConfigManager: ReplaceRouteEndpoints(id, svc, oldEPs, newEPs, weight)
ConfigManager->>ConfigManager: compute added/modified/deleted endpoints
ConfigManager->>HAProxyRuntime: "add server <backend> <name> addr <ip> [ssl/verify/...]"
HAProxyRuntime-->>ConfigManager: OK / error
ConfigManager->>HAProxyRuntime: "set server <backend>/<name> addr|weight|state ..."
HAProxyRuntime-->>ConfigManager: OK / error
ConfigManager->>HAProxyRuntime: "del server <backend>/<name>"
HAProxyRuntime-->>ConfigManager: OK / error
ConfigManager-->>Router: result (nil or error)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 9 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (9 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 374-376: The ca-file path currently uses string(b.name) (HAProxy
backend name) which doesn't match the on-disk PEM name for route-specific
DestinationCACertificate; change the path construction so it uses the
destination CA certificate ID (the key used in cfg.Certificates / the route's
DestinationCACertificate) instead of string(b.name). Concretely, when choosing
the CA file in the conditional that checks defaultDestinationCA and
cfg.Certificates[cfg.Host+"_pod"], build the path with the destination CA ID
variable (use defaultDestinationCA when that's the chosen CA, or the
route-specific DestinationCACertificate ID otherwise) so the
path.Join(workingDir, "router/cacerts", "<destination-ca-id>.pem") matches the
actual written filename. Ensure references to b.name are removed from that
ca-file path logic.
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 509-519: The code enables health checks for newly added endpoints
but mistakenly re-enables the former single endpoint from entry.activeEndpoints,
which is empty during initial sync; update the block that builds newEPs to
append the former endpoint from oldEndpoints (the pre-scale-up list) instead of
entry.activeEndpoints so the original server gets health checks on a 1→N
scale-up; locate the logic around newEndpoints, addedEndpoints, newEPs and
entry.activeEndpoints in the function in manager.go and replace the use of
entry.activeEndpoints with the appropriate element(s) from oldEndpoints.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 00ae6c70-df1a-4e7b-9f2b-a256506f707a
📒 Files selected for processing (9)
images/router/haproxy/conf/haproxy-config.templatepkg/cmd/infra/router/template.gopkg/router/router_test.gopkg/router/template/configmanager/haproxy/backend.gopkg/router/template/configmanager/haproxy/blueprint_plugin_test.gopkg/router/template/configmanager/haproxy/manager.gopkg/router/template/router.gopkg/router/template/template_helper.gopkg/router/template/types.go
|
@jcmoraisjr: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
DCM currently works by creating blueprint backends and empty backend server slots upfront. These empty slots are updated later with new endpoints, created on scale out operations, without the need to fork-and-reload haproxy. Scale in operations work in a similar way: backend servers are disabled when endpoints are not available anymore. HAProxy has add/del apis since 2.5, which adds the benefit of not having to create empty slots. This means that we don't need to predict the ideal number of empty servers per backend, so backends can be configured with the currently active endpoints only, and it can scale to any number of new backend servers without the need to reload. This update is being made in at least two phases: Phase 1: update config manager (manager.go) and client's backend (backend.go) the minimum to support the new api calls. The priorities are: new approach fully functional and an easier code update to revise. Phase 2: cleanup dead code, eventually combining types into a single, coherent and simplified struct. New phases should happen as a revisit on other parts of the code that can be optimized with the new approach, or the code cleanup, like changing route resources and the handling of the haproxy lookup maps.
74a07f4 to
c005e0f
Compare
|
@jcmoraisjr: This pull request references NE-2572 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
pkg/router/router_test.go (1)
177-179: Typo in connection info path: "dymmy" should be "dummy".While this may be pre-existing, since this line is being modified, it's a good opportunity to fix the typo.
✏️ Suggested fix
DynamicConfigManager: haproxyconfigmanager.NewHAProxyConfigManager(templateplugin.ConfigManagerOptions{ - ConnectionInfo: "unix:///var/lib/dymmy", + ConnectionInfo: "unix:///var/lib/dummy", }),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/router_test.go` around lines 177 - 179, The ConnectionInfo string passed into haproxyconfigmanager.NewHAProxyConfigManager via the ConfigManagerOptions in the DynamicConfigManager setup contains a typo ("unix:///var/lib/dymmy"); update that value to the correct path ("unix:///var/lib/dummy") so the DynamicConfigManager initialization (haproxyconfigmanager.NewHAProxyConfigManager and ConfigManagerOptions.ConnectionInfo) uses the proper connection path.pkg/router/template/configmanager/haproxy/manager.go (1)
562-570: Inconsistent error handling: fail-fast vs aggregated errors.
RemoveRouteEndpointsreturns immediately on the firstDeleteServererror (fail-fast), whileReplaceRouteEndpointsaggregates all errors usingerrors.Join. Consider aligning the error handling patterns for consistency.♻️ Option to align with ReplaceRouteEndpoints pattern
+ var errs []error for _, ep := range endpoints { log.V(4).Info("deleting server for endpoint", "endpoint", ep.ID) if err := backend.DeleteServer(ep); err != nil { - return err + errs = append(errs, fmt.Errorf("error deleting server %s: %w", ep.ID, err)) } } - return nil + return errors.Join(errs...)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/template/configmanager/haproxy/manager.go` around lines 562 - 570, RemoveRouteEndpoints currently fails fast on the first backend.DeleteServer error while ReplaceRouteEndpoints aggregates errors with errors.Join; update RemoveRouteEndpoints to mirror ReplaceRouteEndpoints by collecting each DeleteServer error into a slice (or []error), continue deleting remaining endpoints, log per-endpoint failures with log.V(...) as currently done, and finally return errors.Join(collectedErrs...) (or nil if none) instead of returning immediately on the first error.pkg/router/template/types.go (1)
186-191: Add documentation for new configuration fields.The
WorkingDirandDefaultDestinationCAfields have empty doc comments. These should be documented to clarify their purpose.📝 Suggested documentation
- // + // WorkingDir is the router's working directory containing configuration + // files, certificates, and other router-managed resources. WorkingDir string - // + // DefaultDestinationCA is the path to the default CA certificate file used + // to verify backend server certificates for re-encrypt routes when no + // route-specific destination CA is configured. DefaultDestinationCA string🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/router/template/types.go` around lines 186 - 191, Add descriptive doc comments for the WorkingDir and DefaultDestinationCA struct fields: explain that WorkingDir is the path the router uses as its runtime working directory for temporary files/operations, and DefaultDestinationCA is the PEM-encoded CA certificate (or path/identifier used by your config) applied as the default trust anchor for outbound TLS destination verification when no per-destination CA is configured. Update the comments directly above the WorkingDir and DefaultDestinationCA field declarations in types.go so they succinctly describe expected value format and purpose for maintainers and users.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 495-497: The current early return when errsEndpoints has entries
aborts the subsequent health-check configuration and can leave backends
inconsistent; update the flow in the function that processes endpoints so that
after collecting errsEndpoints you still call the health-check configuration
routine (the existing health check code or method—e.g., configureHealthChecks /
the block immediately following endpoint processing) for successfully processed
backends, collect any health-check errors, and then return a joined error
containing both errsEndpoints and health-check errors (using errors.Join).
Ensure you do not short-circuit before invoking the health-check configuration
and aggregate all errors before returning.
---
Nitpick comments:
In `@pkg/router/router_test.go`:
- Around line 177-179: The ConnectionInfo string passed into
haproxyconfigmanager.NewHAProxyConfigManager via the ConfigManagerOptions in the
DynamicConfigManager setup contains a typo ("unix:///var/lib/dymmy"); update
that value to the correct path ("unix:///var/lib/dummy") so the
DynamicConfigManager initialization
(haproxyconfigmanager.NewHAProxyConfigManager and
ConfigManagerOptions.ConnectionInfo) uses the proper connection path.
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 562-570: RemoveRouteEndpoints currently fails fast on the first
backend.DeleteServer error while ReplaceRouteEndpoints aggregates errors with
errors.Join; update RemoveRouteEndpoints to mirror ReplaceRouteEndpoints by
collecting each DeleteServer error into a slice (or []error), continue deleting
remaining endpoints, log per-endpoint failures with log.V(...) as currently
done, and finally return errors.Join(collectedErrs...) (or nil if none) instead
of returning immediately on the first error.
In `@pkg/router/template/types.go`:
- Around line 186-191: Add descriptive doc comments for the WorkingDir and
DefaultDestinationCA struct fields: explain that WorkingDir is the path the
router uses as its runtime working directory for temporary files/operations, and
DefaultDestinationCA is the PEM-encoded CA certificate (or path/identifier used
by your config) applied as the default trust anchor for outbound TLS destination
verification when no per-destination CA is configured. Update the comments
directly above the WorkingDir and DefaultDestinationCA field declarations in
types.go so they succinctly describe expected value format and purpose for
maintainers and users.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: da89419c-e254-4010-955c-9113979cb278
📒 Files selected for processing (9)
images/router/haproxy/conf/haproxy-config.templatepkg/cmd/infra/router/template.gopkg/router/router_test.gopkg/router/template/configmanager/haproxy/backend.gopkg/router/template/configmanager/haproxy/blueprint_plugin_test.gopkg/router/template/configmanager/haproxy/manager.gopkg/router/template/router.gopkg/router/template/template_helper.gopkg/router/template/types.go
✅ Files skipped from review due to trivial changes (1)
- pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/router/template/template_helper.go
- pkg/cmd/infra/router/template.go
- pkg/router/template/router.go
- pkg/router/template/configmanager/haproxy/backend.go
| if len(errsEndpoints) > 0 { | ||
| return errors.Join(errsEndpoints...) | ||
| } |
There was a problem hiding this comment.
Early return on endpoint errors skips health check configuration.
If any endpoint operation fails (delete/update/add), the function returns immediately without configuring health checks. This could leave backends in an inconsistent state where some servers were added but health checks weren't enabled.
Consider whether health check configuration should still be attempted for successfully processed endpoints, or document this as intentional fail-fast behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/router/template/configmanager/haproxy/manager.go` around lines 495 - 497,
The current early return when errsEndpoints has entries aborts the subsequent
health-check configuration and can leave backends inconsistent; update the flow
in the function that processes endpoints so that after collecting errsEndpoints
you still call the health-check configuration routine (the existing health check
code or method—e.g., configureHealthChecks / the block immediately following
endpoint processing) for successfully processed backends, collect any
health-check errors, and then return a joined error containing both
errsEndpoints and health-check errors (using errors.Join). Ensure you do not
short-circuit before invoking the health-check configuration and aggregate all
errors before returning.
DCM currently works by creating blueprint backends and empty backend server slots upfront. These empty slots are updated later with new endpoints, created on scale out operations, without the need to fork-and-reload haproxy. Scale in operations work in a similar way: backend servers are disabled when endpoints are not available anymore.
HAProxy has add/del apis since 2.5, which adds the benefit of not having to create empty slots. This means that we don't need to predict the ideal number of empty servers per backend, so backends can be configured with the currently active endpoints only, and it can scale to any number of new backend servers without the need to reload.
This update is being made in at least two phases:
Phase 1: update config manager (manager.go) and client's backend (backend.go) the minimum to support the new api calls. The priorities are: new approach fully functional and an easier code update to revise.
Phase 2: cleanup dead code, eventually combining types into a single, coherent and simplified struct.
New phases should happen as a revisit on other parts of the code that can be optimized with the new approach, or the code cleanup, like changing route resources and the handling of the haproxy lookup maps.
Summary by CodeRabbit
Release Notes
New Features
Deprecated
--max-dynamic-serversflag is now deprecated.Changes