Skip to content

WKBCH-25: Complete CRR support#77

Merged
dvasilas merged 5 commits into
mainfrom
improvement/complete-crr
May 15, 2026
Merged

WKBCH-25: Complete CRR support#77
dvasilas merged 5 commits into
mainfrom
improvement/complete-crr

Conversation

@dvasilas
Copy link
Copy Markdown
Contributor

Builds on #49 to make replication actually fire end-to-end.

  • Single cloudserver instance. Source and destination buckets owned by the same account.
  • The replication role is created in testaccount (the bucket-owning account) via vault's accountSeeds.

Adds a helper script (scripts/enable-crr.sh) that creates the source and destination buckets, enables versioning, and applies the replication policy.
Idempotent for CI use.

PR #49 added the CRR plumbing but didn't create the IAM role backbeat
assumes for replication. Add it in the data account via vault
accountSeeds so role-issued credentials authorize source and destination
S3 calls.

Ship scripts/enable-crr.sh to create the buckets and apply the
replication policy idempotently.
Comment thread cmd/config.go
@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

  • cmd/config.go:65 — HostUID/HostGID bypassed by early return on empty path in LoadEnvironmentConfig, would default to 0 (root). Move initialization into DefaultEnvironmentConfig() so both code paths are covered.

    Rest of the PR looks solid: replication-role seed, admin-backbeat credentials file, kafka topic name fix (backbeat-metrics-group-crrbackbeat-metrics to match backbeat config), set +e guard for zookeeper idempotency, and user: HOST_UID:HOST_GID on setup containers are all correct.

    Review by Claude Code

Both containers write into host-mounted directories.
Becuase the run as root, they left root-owned files.
@dvasilas dvasilas force-pushed the improvement/complete-crr branch from 15800df to 705326d Compare April 29, 2026 13:07
@claude
Copy link
Copy Markdown

claude Bot commented Apr 29, 2026

LGTM — changes are internally consistent and well-structured. Replication role, admin credentials file, Kafka topic rename, host UID/GID mapping, and the enable-crr.sh helper all line up correctly across the touched files. The set +e guard for zookeeper-shell.sh NodeExists is appropriate, and the new setup-kafka dependency on backbeat closes a real race condition.

Review by Claude Code

"port": 8500,
"adminPort": 8600,
"adminCredentialsFile": "/home/scality/backbeat/node_modules/vaultclient/tests/utils/admincredentials.json"
"adminCredentialsFile": "/conf/admin-backbeat.json"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the diff between the two? or was there some permissions issue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same content, I just thought it's a bit better for future readers to have the file here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty

Comment thread cmd/config.go

func LoadEnvironmentConfig(path string) (EnvironmentConfig, error) {
cfg := DefaultEnvironmentConfig()
cfg.HostUID = os.Getuid()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember messing around with this but it was for the logs/volumes with the host, anything related to CRR in this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to CRR, just a side-fix for an issue I noticed (705326d).

It is about file permissions in volumes on the host.
Because containers run as root, files in volumes are owned by root and need sudo to rm.

dvasilas added a commit to scality/log-courier that referenced this pull request Apr 30, 2026
Pin workbench to the head of WKBCH-25 (scality/workbench#77), which
provisions the queue-populator's raft-id-dispatcher to cover all data
raft sessions. The previous SHA only provisioned raftIds 0,1,2 while
metadata-s3 with raftSessions=3 creates 4 sessions (0=admin, 1..3=data),
leaving raftId 3 uncovered: any bucket whose hash landed there sat at
ReplicationStatus=PENDING forever and the CRR e2e test timed out.

Temporary pin; revert to a tagged workbench release once WKBCH-25 is
merged and tagged.
WKBCH-25 added redis to backbeat config, which activates backlogControl
(previously a no-op since redis was unreachable). It then throttles the
conductor whenever the bucket-tasks topic has any consumer lag, which
causes test flakiness under bursty parallel load: lifecycle stalls long
enough to blow expiration timeouts.
dvasilas added a commit to scality/log-courier that referenced this pull request Apr 30, 2026
Pin workbench to the head of WKBCH-25 (scality/workbench#77) which
disables the lifecycle conductor's backlogControl. Without this, the
conductor throttles whenever the bucket-tasks topic has any consumer
lag, which the parallel e2e suite triggers reliably and stalls
lifecycle expiration past the test timeout.
Comment thread templates/global/docker-compose.yaml
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

  • templates/global/docker-compose.yaml:187 — backbeat service is missing depends_on: redis: condition: service_healthy now that its config.json includes a redis block

    Otherwise clean: credentials/ARNs are consistent across files, embed directive covers the new template, the set +e scope in kafka/setup.sh is correctly bounded, and the HostUID/HostGID injection is safe through the mergo merge.

    Review by Claude Code

@dvasilas dvasilas force-pushed the improvement/complete-crr branch from 499253b to f9a33d1 Compare April 30, 2026 20:24
dvasilas added a commit to scality/log-courier that referenced this pull request Apr 30, 2026
Pin workbench to the head of WKBCH-25 (scality/workbench#77) which
provisions the queue-populator for all data raft sessions. Without
this, raftId 3 was uncovered: any source bucket whose hash landed
there sat at ReplicationStatus=PENDING forever and the CRR e2e test
timed out.
Comment thread templates/global/docker-compose.yaml
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

  • templates/global/docker-compose.yaml:187 — backbeat now requires redis (added in backbeat config.json) but has no depends_on: redis: condition: service_healthy, risking a startup race condition

    Review by Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

LGTM — changes are consistent and well-structured. Replication-role seed, admin credentials file, Kafka topic rename, ZooKeeper provision path fix, host UID/GID for setup containers, and the enable-crr.sh helper all look correct. No issues found.

Review by Claude Code

@claude
Copy link
Copy Markdown

claude Bot commented May 4, 2026

LGTM — clean, well-structured changes. Template modifications produce valid configs, the new helper script is idempotent, and the zookeeper provision path fix (0-indexed → 1-indexed) correctly aligns with raft session IDs. No issues found.

Review by Claude Code

@dvasilas dvasilas merged commit 6b1cecb into main May 15, 2026
3 checks passed
@dvasilas dvasilas deleted the improvement/complete-crr branch May 15, 2026 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants