docs(plan): record state-repo seed; go-live is the remaining step #215
Merged
…ing step

The state repo DataHackIL/tfht_enforce_idx_state has been seeded from local data/news_items: the rich candidate state (27,568 candidates + queues + attempts + verdicts + budget + yield + backfill_batches + runs + metrics), recovered from orphaned .jsonl.gz (left by the reverted gzip experiment) back to plain JSONL.

Excluded from the seed: prefilter ML models + decision-log telemetry, the regenerable engine_query_cache, and candidate_provenance.jsonl (119 MB, over GitHub's 100 MB per-file limit; rebuilds going forward).

Re-enabling the scheduled workflows (go-live) is deferred by operator choice until a manual dispatch verifies the seeded state end to end. Updates .agent-plan.md to mainline truth.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Pull request overview
Updates .agent-plan.md to reflect current mainline truth now that the external state repository has been seeded, and clarifies that the remaining go-live work is to re-enable non-scraping scheduled workflows after manual verification.
Changes:
- Records that `DataHackIL/tfht_enforce_idx_state` has been seeded from local `data/news_items`, including what was included/excluded and notable file-size constraints.
- Refines `UNIFY-PR-06` to explicitly frame “go-live” as the remaining step (re-enabling only non-scraping scheduled workflows) and notes it is intentionally deferred pending a manual dispatch verification.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

| Coverage Diff | main | #215 | +/- |
|---|---|---|---|
| Coverage | 92.82% | 92.82% | |
| Files | 83 | 83 | |
| Lines | 12302 | 12302 | |
| Hits | 11419 | 11419 | |
| Misses | 883 | 883 | |
pr-agent-context report: This run includes a failing check on PR #215.
Diagnose and fix the failing checks below, then push all of these changes in a single commit.
# Failing Checks
## FAIL-1
Type: Commit status
Context: pre-commit.ci - pr
Status: error
URL: https://results.pre-commit.ci/run/github/1159993403/1781428560.jSuy65FjQrGsHR0pDh1Lig
Summary:
error during mergeable check
Run metadata: |
This was referenced Jun 14, 2026
shaypal5 added a commit that referenced this pull request on Jun 14, 2026:
…edes #215) (#219)

#215 recorded the state-repo seed as a clean success. It wasn't: that seed leaked a live Google CSE API key (captured into a discovery run's errors[] from a CSE-403 URL and pushed to the PUBLIC state repo). This corrects .agent-plan.md to mainline truth:

- the incident and its remediation (key rotated; public-repo history purged with a clean root; redaction root-cause fix #217 merged; re-seeded state is key-scrubbed);
- go-live (UNIFY-PR-06) is now gated on the state-push secret-scan guard (issue #218) in addition to the manual-dispatch verification;
- last-merged status points at the #217 redaction fix.

Doc-only; opened as a PR for review rather than merged directly.

Co-authored-by: Claude Opus 4.8 <[email protected]>
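For illustration only, here is a minimal sketch of what a state-push secret-scan guard like the one tracked in #218 could check. It is not the #217 or #218 implementation. It assumes the state files are plain `*.jsonl` and that the secret to catch is a Google API key, which by common convention is `AIza` followed by 35 URL-safe characters. The function names `redact_secrets`, `find_leaks`, and `guard_push` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: Google API keys follow the widely used "AIza" + 35-char pattern.
GOOGLE_API_KEY_RE = re.compile(r"AIza[0-9A-Za-z_\-]{35}")


def redact_secrets(text: str) -> str:
    """Hypothetical helper: mask anything that looks like a Google API key."""
    return GOOGLE_API_KEY_RE.sub("[REDACTED]", text)


def find_leaks(state_dir: Path) -> list[tuple[Path, int]]:
    """Hypothetical helper: return (file, line number) for every JSONL line
    under state_dir that still contains something key-shaped."""
    hits: list[tuple[Path, int]] = []
    for path in sorted(state_dir.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if GOOGLE_API_KEY_RE.search(line):
                    hits.append((path, lineno))
    return hits


def guard_push(state_dir: Path) -> None:
    """Hypothetical pre-push gate: abort instead of pushing leaked keys."""
    leaks = find_leaks(state_dir)
    if leaks:
        details = ", ".join(f"{p}:{n}" for p, n in leaks)
        raise SystemExit(f"refusing to push state; possible API key in {details}")
```

Redacting at capture time (as in the error strings that ended up in `errors[]`) and scanning again right before the push are complementary: the scan catches anything a redaction path missed.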
Docs-only `.agent-plan.md` update to mainline truth.

The state repo `DataHackIL/tfht_enforce_idx_state` has been seeded from local `data/news_items` as the discovery/ingest source of truth:

- Rich candidate state recovered from orphaned `.jsonl.gz` (left by the reverted gzip experiment, #208 "perf(state): gzip the big rewrite-per-run discovery state files") back to plain JSONL: 27,568 candidates + retry/backfill queues + scrape attempts + domain verdicts + search budget + query yield + `backfill_batches` + runs + metrics.
- Excluded from the seed: prefilter ML models + decision-log telemetry, the regenerable `engine_query_cache`, and `candidate_provenance.jsonl` (119 MB, over GitHub's 100 MB per-file limit; rebuilds going forward).
- `latest_candidates.jsonl` (62 MB) and `backfill_queue.jsonl` (52 MB) are over GitHub's 50 MB soft-warning size.

Go-live (re-enabling scheduled workflows) is deferred by operator choice until a manual dispatch verifies the seeded state end to end. The parked scrape→classify decouple stays tracked as #213.
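The two seed-preparation steps described above (turning orphaned `.jsonl.gz` back into plain JSONL, and sorting files against GitHub's size limits) can be sketched as follows. This is a hypothetical illustration, not the script that produced the seed. It assumes the gzipped copy is the authoritative one (so it overwrites any plain sibling and then deletes the `.gz`) and uses GitHub's documented thresholds of 100 MiB (push rejected) and 50 MiB (warning). `recover_gzipped` and `classify_for_github` are made-up names.

```python
import gzip
import shutil
from collections.abc import Iterable
from pathlib import Path

MIB = 1024 * 1024
GITHUB_HARD_LIMIT = 100 * MIB  # larger files are rejected on push
GITHUB_WARN_LIMIT = 50 * MIB   # larger files push, but GitHub warns


def recover_gzipped(state_dir: Path) -> list[Path]:
    """Hypothetical helper: decompress each *.jsonl.gz into its *.jsonl sibling.

    Assumption: the .gz copy holds the richer state, so it overwrites any
    existing plain file and is removed once decompression succeeds.
    """
    recovered: list[Path] = []
    for gz_path in sorted(state_dir.rglob("*.jsonl.gz")):
        plain = gz_path.with_suffix("")  # foo.jsonl.gz -> foo.jsonl
        with gzip.open(gz_path, "rb") as src, plain.open("wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()
        recovered.append(plain)
    return recovered


def classify_for_github(
    paths: Iterable[Path],
    hard_limit: int = GITHUB_HARD_LIMIT,
    warn_limit: int = GITHUB_WARN_LIMIT,
) -> dict[str, list[Path]]:
    """Hypothetical helper: bucket files as 'ok', 'warn', or 'exclude' by size."""
    buckets: dict[str, list[Path]] = {"ok": [], "warn": [], "exclude": []}
    for path in paths:
        size = path.stat().st_size
        if size > hard_limit:
            buckets["exclude"].append(path)  # e.g. must be left out of the seed
        elif size > warn_limit:
            buckets["warn"].append(path)     # pushes, but is worth flagging
        else:
            buckets["ok"].append(path)
    return buckets
```

The limits are parameters so the bucketing can be exercised with tiny files. Under this scheme, a 119 MB file lands in `exclude`, matching the decision to leave `candidate_provenance.jsonl` out.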
🤖 Generated with Claude Code