feat(db): consolidate REST into shared nico-pg-cluster#3081
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
Summary by CodeRabbit
WalkthroughThis PR provisions NICo REST database prerequisites on the shared PostgreSQL cluster, syncs credentials into the REST namespace, updates the install script to consume them, and aligns REST chart defaults with the new database and user. ChangesNICo REST DB provisioning
Estimated code review effort: 3 (Moderate) | ~25 minutes Sequence Diagram(s)sequenceDiagram
participant SetupScript
participant PostgresCluster
participant ESO
participant K8sSecret
participant HelmInstall
SetupScript->>PostgresCluster: wait for nico-pg-cluster Running
SetupScript->>PostgresCluster: exec CREATE EXTENSION pg_trgm on nico_rest
ESO->>PostgresCluster: read Zalando-generated credentials
ESO->>K8sSecret: write nico-rest-pg-creds
SetupScript->>K8sSecret: wait for nico-rest-pg-creds
SetupScript->>K8sSecret: base64-decode username/password
SetupScript->>HelmInstall: pass decoded dbCreds values
Related Issues: None found. Related PRs: None found. Suggested labels: helm, database, enhancement Suggested reviewers: None found. 🐰 A cluster hums, a secret wakes, 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@ajf given that this is a breaking change can we get this in 2.0 RC? |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-07-02 03:52:23 UTC | Commit: 3240863 |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@helm-prereqs/setup.sh`:
- Around line 384-397: The pg_trgm install in the setup script is race-prone
because it runs once and suppresses errors, so it can miss the window before the
nico_rest database is ready. Update the installation block around the
_PG_PRIMARY lookup and kubectl exec call to use a bounded retry/wait loop
similar to the later nico-rest-pg-creds wait logic, and only proceed when the
CREATE EXTENSION succeeds. Also tighten the fallback message so it distinguishes
“database not ready yet” from “rest.enabled=false” rather than using one
ambiguous message.
- Around line 730-749: The nico-rest setup flow currently waits forever for the
ESO-managed secret and injects DB credentials directly via Helm CLI flags.
Update the credential sync loop in setup.sh around the nico-rest-pg-creds check
to include a bounded timeout and fail fast with a clear error if the secret
never appears, and replace the --set nico-rest-common.secrets.dbCreds.* usage in
NICO_REST_CMD with a safer mechanism such as a temporary values file or direct
Secret reference so secret values are not exposed or broken by shell/Helm
parsing.
In `@helm-prereqs/values.yaml`:
- Around line 128-134: The doc comment in values.yaml names the wrong Secret for
the ESO sync. Update the comment to match the target used by the
ClusterExternalSecret in the ESO template, using the actual secret name
referenced by the corresponding ESO resource (the one intended to avoid the
db-creds collision). Keep the description aligned with the generated credentials
flow and the rest namespace consumer names, and remove the misleading mention of
db-creds.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cd803d4b-d4b6-43d2-9cbf-6a2ff9e41203
📒 Files selected for processing (6)
helm-prereqs/setup.shhelm-prereqs/templates/eso-external-secrets.yamlhelm-prereqs/templates/postgresql.yamlhelm-prereqs/values.yamlhelm/rest/nico-rest/charts/nico-rest-api/values.yamlhelm/rest/nico-rest/charts/nico-rest-db/values.yaml
thossain-nv
left a comment
There was a problem hiding this comment.
Looks good @shayan1995 Existing installations have Values.rest.enabled, correct?
Should we populate the Breaking Changes section in the PR description?
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
helm-prereqs/setup.sh (2)
764-770: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winGuard the temp credentials file with a
trapinstead of relying only on the deploy/skip branches.
_NICO_REST_CREDS_FILE(containing the decoded DB password in plaintext) is only removed at Lines 815 and 817, on the two known exit paths of the deploy decision. Any earlier abnormal exit between file creation (Line 764) and those branches (e.g. a failing command under aset -eshell, or a future refactor that adds a step in between) would leave the plaintext credentials file behind in/tmp.chmod 600limits exposure to the local user but does not clean it up.A
trap ... EXITregistered right aftermktempremoves this class of bug entirely and is idiomatic for temp-secret handling in shell scripts.As per path instructions, review shell scripts for "secret-safe logging" and "idempotency"; guaranteed cleanup on any exit path is part of that guarantee.
🔒 Suggested fix: trap-based cleanup
_NICO_REST_CREDS_FILE="$(mktemp)" chmod 600 "${_NICO_REST_CREDS_FILE}" +trap 'rm -f "${_NICO_REST_CREDS_FILE}"' EXIT printf 'nico-rest-common:\n secrets:\n dbCreds:\n username: "%s"\n password: "%s"\n' \And remove the now-redundant explicit
rm -fcalls at Lines 815/817 (or leave them as a harmless no-op after the trap already fires).Also applies to: 815-817
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@helm-prereqs/setup.sh` around lines 764 - 770, The temp credentials file created in the `_NICO_REST_CREDS_FILE` setup needs guaranteed cleanup on every exit path, not just the deploy/skip branches. Add a `trap` immediately after `mktemp` in `setup.sh` so the file is removed on EXIT even if a later command fails or the script exits early, and then clean up the now-redundant explicit `rm -f` handling in the deploy decision block.Source: Path instructions
398-400: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low valuestderr from
kubectl execis swallowed, blurring "db not ready" from other failures.
2>/dev/nullon theCREATE EXTENSIONattempt means any failure (auth error, transient network blip, syntax issue) is retried and reported identically to "database not yet created by operator." This is a minor diagnostic gap since the loop already retries regardless of cause, and the final summary message at Lines 409-411 gives operators enough to act on.Also applies to: 405-405
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@helm-prereqs/setup.sh` around lines 398 - 400, The kubectl exec check in setup.sh is swallowing stderr during the CREATE EXTENSION pg_trgm probe, which makes all failures look like the database is merely not ready. Remove the stderr redirection from the kubectl exec invocation in the retry block around the CREATE EXTENSION attempt so failures are visible, and keep the existing retry logic and final summary message unchanged; use the kubectl exec / psql probe as the locating symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@helm-prereqs/setup.sh`:
- Around line 764-770: The temp credentials file created in the
`_NICO_REST_CREDS_FILE` setup needs guaranteed cleanup on every exit path, not
just the deploy/skip branches. Add a `trap` immediately after `mktemp` in
`setup.sh` so the file is removed on EXIT even if a later command fails or the
script exits early, and then clean up the now-redundant explicit `rm -f`
handling in the deploy decision block.
- Around line 398-400: The kubectl exec check in setup.sh is swallowing stderr
during the CREATE EXTENSION pg_trgm probe, which makes all failures look like
the database is merely not ready. Remove the stderr redirection from the kubectl
exec invocation in the retry block around the CREATE EXTENSION attempt so
failures are visible, and keep the existing retry logic and final summary
message unchanged; use the kubectl exec / psql probe as the locating symbol.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9d8a1b82-305d-4b9e-8d1c-2593c90b4857
📒 Files selected for processing (2)
helm-prereqs/setup.shhelm-prereqs/values.yaml
✅ Files skipped from review due to trivial changes (1)
- helm-prereqs/values.yaml
|
@thossain-nv |
Move the NICo REST API database from the standalone postgres.postgres StatefulSet onto the Zalando-managed nico-pg-cluster, eliminating the separate REST database instance. Changes: - postgresql.yaml: add nico-rest.nico user and nico_rest database, gated on rest.enabled (default: true) - eso-external-secrets.yaml: add nico-rest-db-eso ClusterExternalSecret syncing Zalando credentials to nico-rest namespace as nico-rest-pg-creds (named to avoid collision with the db-creds hook managed by nico-rest-common) - values.yaml: add rest.enabled / rest.namespace toggles with documentation - setup.sh: extract Zalando credentials from nico-rest-pg-creds and inject into nico-rest-common.secrets.dbCreds.* at install time; create pg_trgm extension on nico_rest after cluster reaches Running state (Zalando preparedDatabases conflicts with the databases section in operator v1.10) - nico-rest-db/values.yaml: update db.host/name/user defaults to nico-pg-cluster.postgres.svc.cluster.local / nico_rest / nico-rest.nico - nico-rest-api/values.yaml: same db defaults plus secrets.dbCreds: db-creds to enable the secret-based password path for live rotation
- setup.sh pg_trgm block: replace single-attempt if with bounded retry loop (24×5s = 120s); re-detect Patroni primary each iteration; distinguish 'nico_rest not yet created' (retrying) from a final timeout message that calls out rest.enabled=false vs true - setup.sh nico-rest-pg-creds wait: add 120s bounded timeout with a clear fail-fast error pointing at the ClusterExternalSecret; write Zalando credentials to a chmod-600 mktemp file and pass -f to helm instead of --set so passwords are not visible in process arguments and are not subject to Helm --set special-char escaping; rm temp file on both deploy and skip paths - values.yaml: correct block and inline comments from 'db-creds' to 'nico-rest-pg-creds' to match the actual ESO target secret name
d172dcf to
e2acc67
Compare
Update nico-site-agent.yaml DB_ADDR and DB_DATABASE to point at the shared nico-pg-cluster instead of the defunct standalone postgres.postgres instance. The site-agent uses the same nico-rest.nico credentials (via db-creds) as nico-rest-api, so no credential changes are needed.
|
@shayan1995 said more simply, in the values file for helm, on already deployed sites needs to say |
Move the NICo REST API database from the standalone
postgres.postgresStatefulSet onto the Zalando-managednico-pg-cluster, eliminating the separate REST database instance.Changes
postgresql.yaml: addnico-rest.nicouser andnico_restdatabase, gated onrest.enabled(default:true)eso-external-secrets.yaml: addnico-rest-db-esoClusterExternalSecret syncing Zalando credentials to thenico-restnamespace asnico-rest-pg-creds— named to avoid collision with thedb-credshook managed bynico-rest-common(hook-delete-policy: before-hook-creationcauses a Helm/ESO race if ESO targetsdb-credsdirectly)values.yaml: addrest.enabled/rest.namespacetoggles with documentationsetup.sh:nico-rest-pg-credsto be synced by ESO; extract Zalando credentials and write to achmod 600temp file; pass via-ftohelm upgrade --installto populatenico-rest-common.secrets.dbCreds.*at install time (avoids--setcredential exposure and Helm special-char escaping issues)pg_trgmextension onnico_restafter the cluster reachesRunningstate, with a 120 s bounded retry loopnico-rest-db/values.yaml: updatedb.host/db.name/db.userdefaults tonico-pg-cluster.postgres.svc.cluster.local/nico_rest/nico-rest.niconico-rest-api/values.yaml: same db defaults plussecrets.dbCreds: db-credsto enable the secret-based password path for live rotationRelated Issues
N/A
Type of Change
Change - Changes in existing functionality
Breaking Changes
This PR contains breaking changes.
rest.enableddefaults totrue. On upgrade,nico-prereqswill create anico-rest.nicoPostgres user, anico_restdatabase onnico-pg-cluster, and anico-rest-db-esoClusterExternalSecret in every site that does not explicitly setrest.enabled: false. Sites that do not runnico-restshould addrest.enabled: falseto their nico-prereqs site values before upgrading to prevent these resources from being created.Additionally,
nico-rest-apiandnico-rest-dbchart defaults now point atnico-pg-cluster(host,name,user). Any site runningnico-restagainst the old standalone instance must either migrate tonico-pg-clusterviasetup.shor override these values explicitly.Testing
Manual end-to-end validation performed on dev6 (rg-forge-dev6, 3-node cluster).
Additional Notes