[Improvement] Separate S3 storage configuration for MLRun and Kubeflow Pipeline#295
[Improvement] Separate S3 storage configuration for MLRun and Kubeflow Pipeline#295GiladShapira94 wants to merge 31 commits into
Conversation
Test CE Workflos
[Fix] Seaweed Change
# Conflicts: # .github/workflows/release.yml # charts/mlrun-ce/Chart.yaml
| storage: | ||
| mode: s3 | ||
| s3: | ||
| mode: local | ||
| # Single source of truth for the in-cluster SeaweedFS. | ||
| # Always used by: SeaweedFS IAM config, bucket-init job, and KFP Pipelines. | ||
| # Also used by MLRun and Jupyter when mode is "local". | ||
| local: | ||
| accessKey: "seaweed" | ||
| secretKey: "seaweed123" | ||
| bucket: "mlrun" | ||
| # External AWS S3 credentials — only applied to MLRun and Jupyter when mode is "s3". | ||
| s3: | ||
| accessKey: "" | ||
| secretKey: "" | ||
| bucket: "" |
There was a problem hiding this comment.
- The PR description says this PR adds
pipelines.storage.s3.{bucket,accessKey,secretKey}underpipelines:so users can route KFP to a separate bucket / IAM, and lists a "Use case B (MLRun → AWS, Pipelines → SeaweedFS)" and "Use case C (both → AWS)".
Neither of those test cases is actually achievable from this values file, there is no pipelines.storage.s3.* block, and the new mlrun-ce.pipelines.s3.* helpers hardcode SeaweedFS.
Either the description needs to be rewritten to match the actual implementation, or the implementation needs to add the missing block. - Renaming
storage.s3.* (which used to hold the SeaweedFS creds seaweed / seaweed123 / mlrun) tostorage.local.*and givingstorage.s3.*brand-new "external AWS only" semantics is a possible silent breaking change. Any existing override file that sets storage.s3.accessKey / secretKey / bucket (which was the only path before) will now be ignored at upgrade because storage.mode defaults to local. If we do have existing usages, you can eitherkeep storage.s3.*as the in-cluster SeaweedFS block (matching prior behavior) and introduce a newstorage.aws.*(or storage.external.) for the AWS-only block, or add a Helm fail/NOTES.txt warning that detects "user set storage.s3. but storage.mode is local" and tells them to migrate the values.
| {{- if eq .Values.storage.mode "local" -}} | ||
| {{- .Values.storage.local.bucket -}} | ||
| {{- else -}} | ||
| {{- coalesce .Values.global.infrastructure.aws.bucketName .Values.storage.s3.bucket "mlrun" -}} |
There was a problem hiding this comment.
The PR description says global.infrastructure.aws.bucketName is kept for backwards compatibility, but storage.s3.bucket is the recommended single source of truth. The current coalesce gives bucketName priority - so if a new user follows the recommendation and sets storage.s3.bucket, an inherited bucketName from an umbrella chart or older values file silently wins. Was that the intended precedence, or should storage.s3.bucket win when explicitly set?
| AWS_ENDPOINT_URL_S3: {{ include "mlrun-ce.s3.service.url" . }} | ||
| {{- end }} | ||
| {{- end }} | ||
| {{- end }} No newline at end of file |
| @@ -1,6 +1,9 @@ | |||
| {{- if and (eq .Values.storage.mode "s3") (not .Values.storage.s3.bucket) }} | |||
There was a problem hiding this comment.
storage.s3.{accessKey,secretKey} default to empty strings but only bucket is validated. Switching to s3 mode without creds will silently produce an unusable Secret. Please also fail-fast when accessKey/secretKey are empty (unless global.infrastructure.aws.s3NonAnonymous is true).
| {{- end }} | ||
| {{- if and (eq .Values.storage.mode "azure-blob") (not .Values.storage.azure.containerName) }} | ||
| {{ fail "storage.mode is set to \"azure-blob\" but storage.azure.containerName is not provided. Please set storage.azure.containerName." }} | ||
| {{- end }} No newline at end of file |
|
|
||
| {{- define "mlrun-ce.pipelines.s3.insecure" -}} | ||
| true | ||
| {{- end -}} |
There was a problem hiding this comment.
Every mlrun-ce.pipelines.s3.* helper just delegates to seaweedfs.s3.* or hardcodes a literal. Either drop the family and call seaweedfs.s3.* directly, or actually wire it to pipelines.storage.s3.* per the PR description.
📝 Description
This PR separates the storage credential configuration into two distinct paths:
storage.local.*for the bundled in-cluster SeaweedFS (always used by SeaweedFS IAM, the bucket-init job, and KFP Pipelines), andstorage.s3.*for external AWS S3 (used only by MLRun and Jupyter whenstorage.mode: s3).The default
storage.modeis changed froms3tolocal, reflecting that the default CE installation uses the bundled SeaweedFS rather than external AWS S3.New dedicated
_helpers.tplpartials (mlrun-ce.seaweedfs.s3.*andmlrun-ce.pipelines.s3.*) ensure Pipelines and SeaweedFS always resolve credentials fromstorage.local.*regardless of the activestorage.mode, eliminating the previous credential cross-contamination when switching modes.🛠️ Changes Made
charts/mlrun-ce/values.yaml:storage.modedefault froms3→localstorage.localblock (accessKey,secretKey,bucket) as the single source of truth for in-cluster SeaweedFS credentialsstorage.s3.accessKey/secretKey/bucketdefaults (now empty strings; only meaningful whenmode: s3)charts/mlrun-ce/templates/_helpers.tpl:mlrun-ce.s3.accessKey/secretKey/bucket— now branches onstorage.mode(localvss3)mlrun-ce.seaweedfs.s3.*helpers — always resolve fromstorage.local.*mlrun-ce.pipelines.s3.*helpers — always delegate tomlrun-ce.seaweedfs.s3.*mlrun-ce.artifactPath,mlrun-ce.featureStore.dataPrefix,mlrun-ce.model-endpoint.monitoring.*— replaced hardcodedglobal.infrastructure.aws.bucketName | default "mlrun"withmlrun-ce.s3.bucketcharts/mlrun-ce/templates/config/storage-secret.yaml—AWS_ENDPOINT_URL_S3now only injected whenstorage.mode: local;storage.s3no longer sets a custom endpointcharts/mlrun-ce/templates/config/storage-validation.yaml— added fail guard forstorage.mode: localwith missingstorage.local.bucketcharts/mlrun-ce/templates/config/mlrun-env-configmap.yaml— updated comment describing per-mode env varscharts/mlrun-ce/templates/pipelines/**— all pipeline templates now usemlrun-ce.pipelines.s3.*helperscharts/mlrun-ce/templates/seaweedfs/**— bucket-init job and IAM config now usemlrun-ce.seaweedfs.s3.*helperscharts/mlrun-ce/templates/NOTES.txt— S3 credentials display updated to referencestorage.local.*charts/mlrun-ce/Chart.yaml— version bumped0.11.0-rc.36→0.11.0-rc.37charts/mlrun-ce/README.md— version matrix updated to0.11.0-rc.37✅ Checklist
charts/mlrun-ce/Chart.yaml.🧪 Testing
helm lint charts/mlrun-ce— run locally to catch syntax errors in refactored helpershelm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml— render all templates to verify helper resolutionstorage.mode: s3path: render with--set storage.mode=s3,storage.s3.accessKey=foo,storage.s3.secretKey=bar,storage.s3.bucket=mybucketand confirmstorage-secretdoes not containAWS_ENDPOINT_URL_S3storage.mode: localpath: render with defaults and confirmstorage-secretcontainsAWS_ENDPOINT_URL_S3pointing at the SeaweedFS servicemlpipeline-seaweedfs-artifactalways usesstorage.local.*regardless ofstorage.mode🔗 References
🚨 Breaking Changes?
Consumers upgrading from a previous release must:
Rename
storage.s3.accessKey/secretKey/bucket→storage.local.accessKey/secretKey/bucketif they were using the default SeaweedFS-backed installation (i.e., the old defaultmode: s3pointed at SeaweedFS withseaweed/seaweed123/mlrun).Set
storage.mode: s3explicitly if they were previously relying on the defaultmode: s3to pass external AWS credentials — the new default islocal.Users who supply an external AWS S3 configuration no longer need to clear
AWS_ENDPOINT_URL_S3manually; the secret now omits it whenmode: s3.🔍️ Additional Notes
admin_installation_values.yaml,non_admin_installation_values.yaml,non_admin_cluster_ip_installation_values.yaml) contain nostorage.*overrides, so they correctly inherit the new defaults fromvalues.yamlwithout modification.storage.local.*) in all modes — this is by design and is documented in the updated helper comments.Warnings
Breaking change — existing
storage.s3.*users: Anyone who previously used the default install (which wasmode: s3pointing at SeaweedFS withseaweed/seaweed123) must migrate their overrides tostorage.local.*.Their upgrade path: