docs(plan): record state-repo seed; go-live is the remaining step #215

Merged: shaypal5 merged 1 commit into main from codex/plan-seed-done on Jun 14, 2026

@shaypal5

Member

Docs-only .agent-plan.md update to mainline truth.

The state repo DataHackIL/tfht_enforce_idx_state has been seeded from local data/news_items as the discovery/ingest source of truth:

  • Recovered the rich candidate state from orphaned .jsonl.gz files (left by the reverted gzip experiment, #208 "perf(state): gzip the big rewrite-per-run discovery state files") back to plain JSONL: 27,568 candidates + retry/backfill queues + scrape attempts + domain verdicts + search budget + query yield + backfill_batches + runs + metrics.
  • Excluded (not state SoT): prefilter ML models + decision-log telemetry (~183 MB), the regenerable engine_query_cache, and candidate_provenance.jsonl (119 MB — over GitHub's 100 MB per-file limit; rebuilds going forward).
  • Packed size is ~30 MB; latest_candidates.jsonl (62 MB) and backfill_queue.jsonl (52 MB) are over GitHub's 50 MB soft-warning size.
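
For illustration only, the recovery and size triage described above could be sketched roughly as follows. This is not the script that was actually run: the helper names `gunzip_jsonl` and `classify_size` are hypothetical, and treating "MB" as 2**20 bytes is an assumption. The 100 MB and 50 MB thresholds are the ones quoted in the bullets.

```python
import gzip
import shutil
from pathlib import Path

MIB = 1024 * 1024          # assumption: "MB" in the notes above means 2**20 bytes
HARD_LIMIT = 100 * MIB     # per-file limit quoted above
WARN_LIMIT = 50 * MIB      # soft-warning size quoted above


def gunzip_jsonl(src: Path) -> Path:
    """Decompress e.g. foo.jsonl.gz to foo.jsonl in the same directory."""
    dst = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large files are not held in memory
    return dst


def classify_size(size_bytes: int) -> str:
    """Return 'exclude' above the hard limit, 'warn' above the soft one, else 'ok'."""
    if size_bytes > HARD_LIMIT:
        return "exclude"
    if size_bytes > WARN_LIMIT:
        return "warn"
    return "ok"
```

Under these assumptions a 119 MB file would be classified as "exclude" and the 62 MB and 52 MB files as "warn", which matches the outcome recorded above.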

Go-live (re-enabling scheduled workflows) is deferred by operator choice until a manual dispatch verifies the seeded state end to end. The parked scrape→classify decouple stays tracked as #213.

🤖 Generated with Claude Code

…ing step

The state repo DataHackIL/tfht_enforce_idx_state has been seeded from local
data/news_items: the rich candidate state (27,568 candidates + queues +
attempts + verdicts + budget + yield + backfill_batches + runs + metrics),
recovered from orphaned .jsonl.gz (left by the reverted gzip experiment) back
to plain JSONL. Excluded from the seed: prefilter ML models + decision-log
telemetry, the regenerable engine_query_cache, and candidate_provenance.jsonl
(119 MB, over GitHub's 100 MB per-file limit; rebuilds going forward).

Re-enabling the scheduled workflows (go-live) is deferred by operator choice
until a manual dispatch verifies the seeded state end to end. Updates
.agent-plan.md to mainline truth.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Copilot AI review requested due to automatic review settings June 14, 2026 09:15
@shaypal5 shaypal5 added this to the Local↔CI Unification milestone Jun 14, 2026
@shaypal5 shaypal5 added the discovery Discovery-layer and candidate-retention work label Jun 14, 2026
@shaypal5 shaypal5 merged commit 94026bd into main Jun 14, 2026
1 check was pending
@shaypal5 shaypal5 deleted the codex/plan-seed-done branch June 14, 2026 09:16

Copilot AI left a comment

Contributor

Pull request overview

Updates .agent-plan.md to reflect current mainline truth now that the external state repository has been seeded, and clarifies that the remaining go-live work is to re-enable non-scraping scheduled workflows after manual verification.

Changes:

  • Records that DataHackIL/tfht_enforce_idx_state has been seeded from local data/news_items, including what was included/excluded and notable file-size constraints.
  • Refines UNIFY-PR-06 to explicitly frame “go-live” as the remaining step (re-enabling only non-scraping scheduled workflows) and notes it is intentionally deferred pending a manual dispatch verification.

@codecov

codecov Bot commented Jun 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.82%. Comparing base (0f41a85) to head (bd1950d).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #215   +/-   ##
=======================================
  Coverage   92.82%   92.82%           
=======================================
  Files          83       83           
  Lines       12302    12302           
=======================================
  Hits        11419    11419           
  Misses        883      883           

@github-actions

pr-agent-context report:

This run includes a failing check on PR #215.

Diagnose and fix the failing checks below, then push all of these changes in a single commit.

# Failing Checks

## FAIL-1
Type: Commit status
Context: pre-commit.ci - pr
Status: error
URL: https://results.pre-commit.ci/run/github/1159993403/1781428560.jSuy65FjQrGsHR0pDh1Lig

Summary:
    error during mergeable check

Run metadata:

Tool ref: v4.0.19
Tool version: 4.0.19
Trigger: pull request opened
Workflow run: 27494347838 attempt 1
Comment timestamp: 2026-06-14T09:19:10.826062+00:00
PR head commit: bd1950d7bd6ee2d2f2c2dbb3bff04d476d629e7c

shaypal5 added a commit that referenced this pull request Jun 14, 2026
…edes #215) (#219)

#215 recorded the state-repo seed as a clean success. It wasn't: that seed leaked
a live Google CSE API key (captured into a discovery run's errors[] from a CSE-403
URL and pushed to the PUBLIC state repo). This corrects .agent-plan.md to mainline
truth:

- the incident and its remediation (key rotated; public-repo history purged with a
  clean root; redaction root-cause fix #217 merged; re-seeded state is key-scrubbed);
- go-live (UNIFY-PR-06) is now gated on the state-push secret-scan guard (issue #218)
  in addition to the manual-dispatch verification;
- last-merged status points at the #217 redaction fix.

Doc-only; opened as a PR for review rather than merged directly.

Co-authored-by: Claude Opus 4.8 <[email protected]>
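
As a rough illustration of the two mitigations named in that commit message (redacting keys before they reach errors[], and scanning state before a push), a minimal sketch might look like the following. It is not the code from #217 or #218. The names `redact_url`, `find_secrets`, and `SENSITIVE_PARAMS` are hypothetical, and the key pattern assumes the commonly observed Google API key shape of "AIza" followed by 35 URL-safe characters.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: Google API keys look like "AIza" plus 35 URL-safe characters.
GOOGLE_KEY_RE = re.compile(r"AIza[0-9A-Za-z_\-]{35}")

# Hypothetical list of query parameters whose values should never be persisted.
SENSITIVE_PARAMS = {"key", "api_key", "token"}


def redact_url(url: str) -> str:
    """Replace the values of sensitive query parameters with 'REDACTED'."""
    parts = urlsplit(url)
    query = [
        (name, "REDACTED" if name.lower() in SENSITIVE_PARAMS else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def find_secrets(text: str) -> list[str]:
    """Return every substring of text that matches the assumed key pattern."""
    return GOOGLE_KEY_RE.findall(text)
```

A pre-push guard along these lines would run `find_secrets` over each state file and refuse to push if the result is non-empty, while `redact_url` would be applied to any request URL before it is written into a run's errors[].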