diff --git a/docs/docs/usage/dbt-pr-review.md b/docs/docs/usage/dbt-pr-review.md index e4b63774b..dbf97d4d3 100644 --- a/docs/docs/usage/dbt-pr-review.md +++ b/docs/docs/usage/dbt-pr-review.md @@ -332,8 +332,11 @@ reviewer does **not** re-implement Jinja — it consumes dbt's own compiled outp from `target/compiled//…` (HEAD) and `target-base/compiled/…` (BASE), produced by `dbt compile`. To enable full equivalence verdicts, compile both the base and head refs in CI (the base into `target-base/`, the Recce - convention). Without compiled SQL the engine lanes fall back to raw and stay - *undecidable* (never fabricated) — the `dbt-patterns` lane still runs. + convention). For proof-grade equivalence, also produce `target/catalog.json` + with `dbt docs generate`; the catalog carries complete warehouse columns that + `manifest.json` often lacks. Without compiled SQL or complete columns the + engine lanes stay *undecidable* (never fabricated) — the `dbt-patterns` lane + still runs. ## What it checks — deterministic rule catalog diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md new file mode 100644 index 000000000..841ae5441 --- /dev/null +++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md @@ -0,0 +1,338 @@ +# dbt PR Review Self-Improvement Loop + +Date: 2026-06-08 + +## Goal + +Make `altimate review` the highest-signal dbt PR reviewer by adding only checks +that survive an end-to-end value test: + +1. The finding is backed by deterministic evidence, warehouse execution, or a + clearly bounded advisory rule. +2. It catches a real dbt PR risk in the public demo or sourced benchmark corpus. +3. It does not add false positives to corrected/safe cases. +4. It produces a review comment that a data engineer would act on. +5. It degrades loudly when required artifacts, schema, or warehouse access are + missing. + +## Recommended `/goal` Command + +```text +/goal Objective: make the dbt PR reviewer demo-parity reliable and measurably +higher-signal by closing the current public-demo gaps while preserving the +existing benchmark floor. + +Quantified acceptance target: +- Public demo matrix reaches expected verdicts for all 6 demo branches: + safe-refactor=APPROVE, join-key-breakage=REQUEST_CHANGES, + test-removal=COMMENT, new-pii-exposure=REQUEST_CHANGES, + mart-select-star=COMMENT, incremental-without-guard=COMMENT. +- Real-world corpus remains at 15/15 caught bad cases and 0/5 false positives + on corrected cases. +- At least one warehouse-backed DuckDB e2e proves a real dbt build/test pass + that the reviewer catches pre-merge. +- Every blocking finding is produced by deterministic core evidence or + warehouse-observed impact, not advisory AI or heuristic equivalence. + +Operating rules: +- Always start by pulling latest main of + /Users/anandgupta/codebase/altimate-core-internal with fast-forward only. +- Prefer implementing deterministic SQL/dbt analysis in altimate-core, not + altimate-code. Avoid regex/custom parsing in altimate-code when core + AST/parser support can own it. +- altimate-code should mostly orchestrate, map core findings to review + categories/severity, format verdicts, and run GitHub/CI integration. +- For every candidate improvement: + 1. Write a short spec: risk, deterministic evidence, what must not be flagged. + 2. Add bad and safe/corrected fixtures. + 3. Implement in altimate-core when possible. + 4. Build local core Node package and link altimate-code to it only when e2e + needs the local native package. + 5. Validate with the narrowest useful tests first, then corpus/demo/warehouse + only at acceptance checkpoints. + 6. Keep only if it improves real review value with no new false positives; + otherwise revert/reject or make advisory-only. +- Use /Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo for demo + branch e2e. +- Compile base dbt artifacts into target-base and head artifacts into target + before review when structural/equivalence lanes need base-vs-head compiled SQL. +- Use local DuckDB first; use BigQuery where warehouse behavior matters and + credentials are available. Trino/Postgres/ClickHouse/DuckDB can be spawned + locally. Snowflake later. +- Treat altimate-core equivalence as heuristic unless there is a sound proof + path. Do not block or approve solely from heuristic equivalence. Block only on + reproducible deterministic facts or warehouse-observed impact. +- Track results in + docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md. + +Iteration speed: +- Optimize for fast learning loops. Do not run full test suites or expensive + builds by default. +- Start with the narrowest relevant unit test or direct function smoke test. +- Only run broader tests after a candidate passes the focused check. +- Avoid rebuilding altimate-core unless the change affects native exports, NAPI + bindings, or e2e verification requires the local Node package. +- Prefer `cargo test -p altimate-core ` over full cargo + test. +- Prefer `bun test test/altimate/.ts` over full package + tests. +- Use direct Node smoke tests against `crates/altimate-core-node` before + relinking altimate-code. +- Run the real-world corpus and demo matrix only at acceptance checkpoints, not + after every edit. +- Run warehouse e2e only when the value claim depends on observed data + behavior. +- Keep a short validation ladder for each candidate: smoke -> focused unit -> + focused integration -> corpus/demo -> warehouse e2e if needed. + +Initial backlog: +1. Remove safe-refactor noise: base compiled artifact support and/or equivalence + degradation handling. +2. Tighten PII precision: do not flag low-confidence Name such as + customer_name, but still flag email/SSN/phone/payment identifiers. +3. Reduce low-value missing_table_alias comments for dbt CTE-heavy SQL. +4. Expand core structural diff rules for high-signal dbt semantic regressions. +5. Add warehouse data-diff e2e cases that prove real row/value impact. +``` + +## Skill vs Goal + +Create a skill once this loop stabilizes across several iterations. A skill is +the right home for reusable operating instructions: how to pull core, choose +core-vs-code ownership, build local NAPI, run the demo matrix, and apply the +validation ladder. The active `/goal` should still include the quantified +objective above because it provides the stop condition and prevents open-ended +"make it better" work. + +## Baseline From Dry Run + +Commands run from `packages/opencode` unless noted: + +```bash +bun run --conditions=browser script/review-realworld-eval.ts +``` + +Result: + +- Sourced bad-case catch rate: 15/15. +- Corrected-case false positives: 0/5. + +Demo matrix run against `demo/dbt-pr-review-demo` branches with the local dbt +virtualenv and `altimate review --mode gate --json`: + +| Branch | Expected | Actual | Finding categories | +|---|---:|---:|---| +| `demo/safe-refactor` | `APPROVE` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality` | +| `demo/join-key-breakage` | `REQUEST_CHANGES` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_correctness`, `sql_quality` | +| `demo/test-removal` | `COMMENT` | `COMMENT` | `test_coverage` | +| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `pii_exposure`, `semantic_change`, `sql_quality` | +| `demo/mart-select-star` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality`, `test_coverage` | +| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `join_risk`, `warehouse_cost`, `sql_quality`, `materialization`, `sql_correctness`, `test_coverage` | + +Observed problems: + +- `safe-refactor` is noisy. The semantic lane cannot decide equivalence because + base compiled SQL is unavailable, `Name` classification flags `customer_name` + as PII, and `missing_table_alias` flags harmless CTE aliasing. +- `join-key-breakage` is not blocked. The useful join-key mismatch appears from + the advisory AI lane, not the deterministic lane, so it cannot gate. +- Several demo branches are degraded because base-vs-head compiled artifacts are + incomplete. A safe review must not pretend heuristic equivalence is a proof. + +## Loop Contract + +For each candidate improvement: + +1. **Spec**: one paragraph stating the risk, the deterministic evidence, and + what must not be flagged. +2. **Fixtures**: at least one bad case and one corrected/safe countercase. +3. **Implementation**: smallest scoped change in the reviewer or demo workflow. +4. **Validation**: + - Unit or harness test for the detector. + - `bun run --conditions=browser script/review-realworld-eval.ts`. + - Demo branch matrix for affected branches. + - Warehouse-backed run when the value claim depends on actual data impact. +5. **Decision**: + - Keep only if it improves catch rate or demo parity with zero new false + positives. + - Reject or keep advisory-only if evidence is heuristic, schema-dependent, + or too broad. + +## First Candidate Backlog + +### 1. Deterministic Join-Key Regression Detector + +Status: pushed down to `altimate-core-internal` as structural diff rule +`SC010 join_key_regression` in +`crates/altimate-core/src/review/structural_diff.rs`. `altimate-code` only maps +the core finding into the review verdict. + +Risk: a PR changes a join predicate from matching the same business key to +joining unrelated identifiers, e.g. `orders.customer_id = customers.customer_id` +becomes `orders.order_id = customers.customer_id`. + +Evidence: + +- Base and head SQL are both available. +- A changed `JOIN ... ON` equality compares identifier columns ending in `_id`. +- The base predicate joined same-name/same-stem keys, while the head predicate + joins different stems. + +Do not flag: + +- Intentional bridge joins such as `orders.order_id = order_items.order_id`. +- Joins where the changed key stems still match. +- Cases without a base predicate to compare. + +Expected result: + +- `demo/join-key-breakage` gets a deterministic `join_risk` or + `sql_correctness` critical finding and blocks in gate mode. +- `demo/safe-refactor` remains quiet for this detector. + +Validation: + +- `git pull --ff-only origin main` completed in + `/Users/anandgupta/codebase/altimate-core-internal`; latest core main was + `a413804` before this local rule change. +- `cargo test -p altimate-core review::structural_diff --lib` passed with the + new `SC010` bad case and safe countercases. +- `bun test --timeout 30000 test/altimate/review.test.ts` passed. +- `bun run --conditions=browser script/review-realworld-eval.ts` stayed at + 15/15 sourced catches and 0/5 corrected false positives. +- Demo matrix: `demo/join-key-breakage` improved from `COMMENT` to + `REQUEST_CHANGES` with deterministic `join_risk`. +- The detector itself stayed quiet on `demo/safe-refactor` and on the + same-key bridge-join countercase. + +### 2. Base Compiled Artifact Support In Demo CI + +Risk: safe refactors degrade to "could not prove equivalent" because the review +has incomplete proof inputs: missing base compiled SQL or incomplete schema +columns. + +Evidence: + +- `compiled.ts` already supports `target-base/compiled//`. +- The demo workflow currently runs only head-side `dbt compile`. +- `demo/safe-refactor` became decidable after `dbt docs generate` produced + `target/catalog.json`, but core equivalence then falsely compared CTE alias + names inside the join filter. + +Do not flag: + +- A safe CTE rename where base and head compiled SQL are equivalent or where no + deterministic structural difference exists. + +Expected result: + +- The demo workflow compiles base into `target-base` before compiling head. +- The demo workflow produces `target/catalog.json` when a warehouse-backed dbt + project is available, so equivalence has complete columns. +- `demo/safe-refactor` emits no findings with local core and DuckDB catalog + artifacts. + +Implemented result: + +- Core `L012 missing_table_alias` now ignores CTE references, removing the + `order_records/customer_records` alias noise while preserving real table alias + lint. +- Core equivalence now resolves join predicate columns through CTE aliases to + base `table.column` provenance, so CTE alias renames are equivalent but a + changed join key remains material. +- `altimate-code` now threads the detected dialect into + `altimate_core.equivalence` and `--no-ai` correctly disables the advisory lane. +- Local DuckDB safe-refactor validation: `APPROVE`, zero findings. +- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false + positives. + +### 3. PII Classification Precision Floor + +Risk: high-noise PII comments on ordinary names such as `customer_name` reduce +trust and make safe PRs look risky. + +Evidence: + +- `classify_pii` labels `customer_name` as `Name` at 75% confidence in the demo. +- The stronger demo/security value is email/SSN/phone/payment identifiers and + source-propagated sensitive columns. + +Do not flag: + +- `Name` classification below a high confidence threshold. +- Pre-existing PII columns not introduced by the PR. + +Expected result: + +- `demo/safe-refactor` no longer reports `customer_name`. +- `demo/new-pii-exposure` still reports `email`. + +Implemented result: + +- Diff-scoped core PII classification is the authoritative PR-review PII + comment when lineage/classification are available. +- High-confidence non-low-risk PII introduced into `marts/` or `reporting/` + is critical and blockable from `altimate_core.classify_pii` evidence. +- Low-risk `Name`/`Address` classifications require at least 90% confidence to + surface, so `first_name`/`customer_name`-style weak signals do not create + noisy comments. +- Fallback `dbt-patterns`/`rule-catalog` PII twins are suppressed for files + where the diff-scoped core classifier ran; fallback remains available when + core lineage/classification is unavailable. +- `demo/new-pii-exposure` now reports one critical core PII finding for + `email` plus the core equivalence warning, and still returns + `REQUEST_CHANGES`. +- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false + positives. + +## Current Acceptance Checkpoint + +Demo matrix with local core, base artifacts in `target-base`, head artifacts in +`target`, and AI disabled: + +| Branch | Expected | Actual | Deterministic evidence | +|---|---:|---:|---| +| `demo/safe-refactor` | `APPROVE` | `APPROVE` | no findings | +| `demo/join-key-breakage` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.structural_diff:SC010`, core equivalence | +| `demo/test-removal` | `COMMENT` | `COMMENT` | `dbt-patterns:removed_tests` | +| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.classify_pii`, core equivalence | +| `demo/mart-select-star` | `COMMENT` | `COMMENT` | core equivalence warning | +| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | core dbt config lint + deterministic catalog cost notes | + +DuckDB e2e proof: + +- Branch: `demo/join-key-breakage`. +- `dbt build --profiles-dir . --target dev`: passed `PASS=14 WARN=0 ERROR=0`. +- Reviewer: `REQUEST_CHANGES` with critical `join_risk` from + `altimate_core.structural_diff` rule `join_key_regression` (`SC010`). + +Warehouse data-diff e2e proof: + +- Test: `packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts`. +- Connection: local DuckDB via `@altimateai/drivers/duckdb`. +- Base/head relations: `base_orders` and `head_orders`. +- Key columns: `order_id`; compared value column: `amount`. +- Observed warehouse delta: one row only in head and one updated value. +- Validation: + `ALTIMATE_RUN_WAREHOUSE_E2E=1 bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts`. +- Default fast-loop behavior: + `bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts` + skips unless `ALTIMATE_RUN_WAREHOUSE_E2E=1` is set. +- Implementation note: fixed the DuckDB connector to use the two-argument + constructor when no open options are required; Bun's native binding can miss + the callback when `undefined` is passed as the second argument. + +## Warehouse E2E Policy + +Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a +candidate needs production-style warehouse behavior. Trino, Postgres, +ClickHouse, and DuckDB can be spawned locally. Snowflake is deferred until a +separate account is available. + +Warehouse-backed findings must include: + +- The connection used. +- Base/head SQL or model relation names. +- Key columns. +- Row/value delta summary. +- A skip/degraded state when credentials or drivers are unavailable. diff --git a/docs/internal/dbt-pr-review-demo-scenario-corpus.md b/docs/internal/dbt-pr-review-demo-scenario-corpus.md new file mode 100644 index 000000000..8abc509a6 --- /dev/null +++ b/docs/internal/dbt-pr-review-demo-scenario-corpus.md @@ -0,0 +1,217 @@ +# dbt PR Review Demo Scenario Corpus + +Status: active corpus build, started 2026-06-08. + +Goal: build 50 customer-demo-ready dbt PR scenarios in +`/Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo`. Each +scenario should be a small PR branch that either demonstrates a deterministic +reviewer catch or proves the reviewer correctly stays quiet for a safe change. + +## Acceptance Contract + +- 50 runnable demo PR branches. +- Every scenario has a unique id, title, risk category, expected verdict, + deterministic evidence source, validation command, and customer demo script. +- No materially duplicate scenarios. +- DuckDB is the default warehouse. +- Final 50-branch matrix reaches the expected verdict for every implemented + branch. +- Real-world corpus remains 15/15 caught bad cases and 0/5 false positives. +- Blocking findings come from `altimate-core` deterministic evidence or + warehouse-observed impact, not advisory AI. +- Scenarios that need missing deterministic support are deferred rather than + forced into noisy demos. + +## Metadata Schema + +The catalog below is intentionally table-shaped so it can be copied into YAML or +JSON once the pilot branch runner is added. + +Required fields: + +| Field | Meaning | +|---|---| +| `id` | Stable scenario id, `sNNN`. | +| `branch` | Demo branch name, `demo/-` for new branches. Existing branches keep their current names. | +| `status` | `existing`, `pilot`, `planned`, or `deferred-core-needed`. | +| `title` | Customer-facing title. | +| `category` | Reviewer category or taxonomy bucket. | +| `expected` | Expected verdict: `APPROVE`, `COMMENT`, or `REQUEST_CHANGES`. | +| `evidence` | Deterministic evidence source expected in the finding. | +| `artifact_needs` | `manifest`, `catalog`, `target-base`, `warehouse`, or combinations. | +| `validation` | Focused command or matrix selector. | +| `demo_script` | Short customer-facing explanation. | + +Default focused validation command: + +```bash +ALTIMATE_LOCAL_CORE=/Users/anandgupta/codebase/altimate-core-internal/crates/altimate-core-node \ + bun --conditions=browser /Users/anandgupta/codebase/altimate-code/packages/opencode/src/index.ts review \ + --cwd /Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo \ + --base=main \ + --head=HEAD \ + --mode=gate \ + --manifest=/Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo/target/manifest.json \ + --json \ + --no-ai +``` + +Base artifacts should be generated from `main` into `target-base`; head +artifacts should be generated on the scenario branch into `target`. + +## Implemented 50-Scenario Corpus + +Run every branch from `/Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo` +with DuckDB, `dbt build`, `dbt compile`, `dbt docs generate`, and the default +focused review command above. AI must remain disabled for demo acceptance. + +| id | branch | status | title | category | expected | deterministic evidence | artifact_needs | demo_script | +|---|---|---|---|---|---|---|---|---| +| s001 | `demo/safe-refactor` | implemented | Safe CTE refactor is approved | safe_refactor | APPROVE | no findings | manifest, catalog, target-base | Proves the reviewer stays quiet for harmless SQL cleanup. | +| s002 | `demo/join-key-breakage` | implemented | Wrong join key changes attribution | join_risk | REQUEST_CHANGES | `altimate_core.structural_diff:join_key_regression` | manifest, catalog, target-base | Catches a silent business-logic bug that still compiles. | +| s003 | `demo/test-removal` | implemented | Primary-key tests removed | test_coverage | COMMENT | `dbt-patterns:removed_tests` | manifest | Shows guardrail removal before duplicate rows ship. | +| s004 | `demo/new-pii-exposure` | implemented | Customer email exposed in a mart | pii_exposure | REQUEST_CHANGES | `altimate_core.classify_pii` | manifest, catalog, target-base | Blocks sensitive data propagation before merge. | +| s005 | `demo/mart-select-star` | implemented | SELECT star in a published mart | semantic_change | COMMENT | `altimate_core.equivalence` | manifest, catalog, target-base | Warns that a projection change alters the mart contract. | +| s006 | `demo/incremental-without-guard` | implemented | Unsafe incremental conversion | materialization | COMMENT | `altimate_core.dbt_config`, deterministic cost rules | manifest | Flags a common incremental-model footgun. | +| s007 | `demo/s007-type-narrowing-amount` | implemented | Amount type narrowed | contract_violation | COMMENT | `altimate_core.structural_diff:type_narrowing` | manifest, target-base | Shows truncation/overflow risk before downstream breakage. | +| s008 | `demo/s008-join-on-cast` | implemented | Join key wrapped in CAST | join_risk | COMMENT | `altimate_core.check:cast_in_join_key` | manifest | Shows optimizer and key-matching risk from casting join keys. | +| s009 | `demo/s009-join-or-condition` | implemented | OR added to join condition | join_risk | COMMENT | `altimate_core.check:or_in_join` | manifest | Shows explosive join risk from a broad OR predicate. | +| s010 | `demo/s010-test-disabled` | implemented | Uniqueness test disabled | test_coverage | COMMENT | `dbt-patterns:removed_tests`, `rule-catalog:test-disabled` | manifest | Shows a uniqueness guard being weakened in YAML. | +| s011 | `demo/s011-left-join-to-inner` | implemented | LEFT JOIN changed to INNER JOIN | semantic_change | COMMENT | `altimate_core.structural_diff:join_type_change` | manifest, target-base | Highlights row-loss risk from changing join optionality. | +| s012 | `demo/s012-join-using-ambiguity` | implemented | JOIN USING hides merged key behavior | join_risk | COMMENT | `rule-catalog:using-join` | manifest | Shows ambiguity from merged join columns. | +| s013 | `demo/s013-right-join-readability` | implemented | RIGHT JOIN introduced | sql_quality | COMMENT | `rule-catalog:right-join` | manifest | Shows a low-severity maintainability risk. | +| s014 | `demo/s014-cross-join-filtered` | implemented | Filtered CROSS JOIN introduced | join_risk | REQUEST_CHANGES | `dbt-patterns:cross_join` | manifest | Blocks a high-confidence cartesian-product risk. | +| s015 | `demo/s015-left-join-filter` | implemented | LEFT JOIN filtered in WHERE | join_risk | COMMENT | `dbt-patterns:outer_join_filter_in_where` | manifest, target-base | Shows a WHERE clause that silently turns optional rows into required rows. | +| s016 | `demo/s016-distinct-added` | implemented | DISTINCT added to hide duplicates | semantic_change | COMMENT | `altimate_core.structural_diff:distinct_added`, `altimate_core.check:select_distinct_smell` | manifest, target-base | Shows dedup masking rather than fixing upstream grain. | +| s017 | `demo/s017-limit-added` | implemented | LIMIT added to a mart | sql_correctness | REQUEST_CHANGES | `dbt-patterns:limit-in-model`, `altimate_core.structural_diff:limit_added` | manifest, target-base | Blocks accidental sampling in production logic. | +| s018 | `demo/s018-clock-column` | implemented | Runtime clock added to mart output | idempotency | COMMENT | `rule-catalog:timezone-naive-now` | manifest | Shows non-reproducible output from run-time clock functions. | +| s019 | `demo/s019-lateral-on-true` | implemented | LATERAL join with ON true | join_risk | COMMENT | `rule-catalog:lateral-join`, `altimate_core.check:join_without_condition` | manifest, target-base | Shows per-row lateral execution and an unbounded join condition. | +| s020 | `demo/s020-not-in-subquery` | implemented | NOT IN subquery added | sql_correctness | COMMENT | `altimate_core.check:not_in_nullable`, `rule-catalog:not-exists-suggested` | manifest | Shows the NULL-sensitive `NOT IN` failure mode. | +| s021 | `demo/s021-union-dedup-cost` | implemented | UNION dedup introduced | warehouse_cost | COMMENT | `rule-catalog:union-not-all`, `altimate_core.check:union_without_all` | manifest, target-base | Shows hidden sort/dedup cost in mart logic. | +| s022 | `demo/s022-in-subquery-large` | implemented | IN subquery introduced | warehouse_cost | COMMENT | `rule-catalog:in-subquery-large` | manifest | Shows an optimizer-unfriendly semi-join pattern. | +| s023 | `demo/s023-full-outer-join-filtered` | implemented | Filtered FULL OUTER JOIN introduced | join_risk | COMMENT | `altimate_core.check:full_outer_join`, `altimate_core.structural_diff:join_type_change` | manifest, target-base | Shows null-side handling risk from full outer joins. | +| s024 | `demo/s024-weak-pii-hash` | implemented | Weak MD5 hash of email exposed | pii_exposure | COMMENT | `rule-catalog:weak-pii-hash` | manifest, target-base | Shows why simple hashes are not anonymization. | +| s025 | `demo/s025-hardcoded-secret` | implemented | Hardcoded API key-like secret in SQL | pii_exposure | REQUEST_CHANGES | `rule-catalog:hardcoded-credential` | manifest, target-base | Blocks secret leakage into SQL and compiled artifacts. | +| s026 | `demo/s026-new-phone-pii` | implemented | Phone number exposed in mart | pii_exposure | REQUEST_CHANGES | `altimate_core.classify_pii`, lineage impact | manifest, catalog, target-base | Shows sensitive contact-data propagation from source to mart. | +| s027 | `demo/s027-new-ssn-pii` | implemented | SSN exposed in mart | pii_exposure | REQUEST_CHANGES | `altimate_core.classify_pii`, lineage impact | manifest, catalog, target-base | Blocks high-risk identifier exposure. | +| s028 | `demo/s028-test-severity-warn` | implemented | Test downgraded to severity warn | test_coverage | COMMENT | `rule-catalog:test-severity-warn`, `dbt-patterns:removed_tests` | manifest | Shows CI guardrails being weakened. | +| s029 | `demo/s029-description-removed` | implemented | Model description removed | governance | COMMENT | `rule-catalog:yml-description-removed` | manifest | Shows documentation drift for data consumers. | +| s030 | `demo/s030-yml-column-removed` | implemented | Column removed from schema.yml metadata | contract_violation | COMMENT | `rule-catalog:yml-column-removed` | manifest | Shows schema metadata drift while SQL still builds. | +| s031 | `demo/s031-count-distinct-cost` | implemented | Exact COUNT DISTINCT added | warehouse_cost | COMMENT | `rule-catalog:count-distinct-large`, `rule-catalog:bq-approx-good` | manifest | Shows expensive exact distinct counting. | +| s032 | `demo/s032-order-by-in-cte` | implemented | ORDER BY added without limiting rows | warehouse_cost | COMMENT | `dbt-patterns:order-by-no-limit` | manifest | Shows a sort that adds cost without a stable consumer contract. | +| s033 | `demo/s033-function-filter-column` | implemented | Function applied to filter column | warehouse_cost | COMMENT | `rule-catalog:function-filter-column`, `altimate_core.check:non_sargable_predicate` | manifest | Shows partition/index pruning defeated by a wrapper function. | +| s034 | `demo/s034-implicit-comma-join` | implemented | Comma join introduced | join_risk | COMMENT | `rule-catalog:implicit-cross-join-comma`, `altimate_core.check:implicit_cross_join` | manifest | Shows an accidental cross join from old-style SQL. | +| s035 | `demo/s035-natural-join` | implemented | NATURAL JOIN introduced | join_risk | REQUEST_CHANGES | `dbt-patterns:natural-join` | manifest, target-base | Blocks fragile joins that change when either side adds a same-named column. | +| s036 | `demo/s036-case-without-else` | implemented | CASE expression lacks ELSE | sql_correctness | COMMENT | `dbt-patterns:case-no-else` | manifest | Shows silent NULLs from unmatched branches. | +| s037 | `demo/s037-division-no-guard` | implemented | Division by column has no guard | sql_correctness | COMMENT | `altimate_core.check:division_by_column_no_guard` | manifest | Shows divide-by-zero/null risk before runtime failures. | +| s038 | `demo/s038-equals-null-filter` | implemented | Filter compares value to NULL with equals | sql_correctness | COMMENT | `altimate_core.check:not_null_comparison` | manifest | Shows SQL that looks valid but never matches NULLs. | +| s039 | `demo/s039-leading-wildcard-like` | implemented | Leading wildcard LIKE added | warehouse_cost | COMMENT | `altimate_core.check:like_leading_wildcard`, `altimate_core.check:case_sensitive_like` | manifest | Shows non-sargable pattern matching on a joined column. | +| s040 | `demo/s040-between-date-boundary` | implemented | BETWEEN timestamp boundary added | sql_correctness | COMMENT | `dbt-patterns:between-timestamp`, `dbt-patterns:hardcoded-date` | manifest | Shows inclusive timestamp boundary and stale hardcoded-date risk. | +| s041 | `demo/s041-order-by-no-limit` | implemented | Top-level ORDER BY without LIMIT | warehouse_cost | COMMENT | `dbt-patterns:order-by-no-limit` | manifest | Shows a production sort that does not guarantee useful output order. | +| s042 | `demo/s042-offset-no-order` | implemented | OFFSET without ORDER BY | sql_correctness | COMMENT | `dbt-patterns:offset-no-order` | manifest, target-base | Shows nondeterministic pagination in a model. | +| s043 | `demo/s043-random-column` | implemented | Random column added to mart | idempotency | COMMENT | `dbt-patterns:random-nondeterminism` | manifest, target-base | Shows non-reproducible data across reruns and backfills. | +| s044 | `demo/s044-full-refresh-config` | implemented | full_refresh=true added | materialization | COMMENT | `dbt-patterns:full-refresh-true` | manifest | Shows a config that forces repeated full rebuilds. | +| s045 | `demo/s045-hardcoded-date-filter` | implemented | Hardcoded date filter drops history | freshness | COMMENT | `dbt-patterns:hardcoded-date` | manifest, target-base | Shows a time filter that will not roll forward. | +| s046 | `demo/s046-regexp-filter` | implemented | Regex filter added | warehouse_cost | COMMENT | `rule-catalog:regexp-heavy` | manifest, target-base | Shows expensive/portable-risk regex filtering. | +| s047 | `demo/s047-cast-as-text` | implemented | Cast to non-portable TEXT type | sql_quality | COMMENT | `rule-catalog:cast-as-text` | manifest, target-base | Shows a cross-warehouse portability issue that still runs on DuckDB. | +| s048 | `demo/s048-interval-literal` | implemented | String interval literal added | sql_quality | COMMENT | `rule-catalog:interval-string-literal` | manifest, target-base | Shows date arithmetic syntax that differs by warehouse. | +| s049 | `demo/s049-date-plus-integer` | implemented | Date plus bare integer added | sql_correctness | COMMENT | `rule-catalog:date-plus-integer` | manifest, target-base | Shows ambiguous, non-portable date arithmetic. | +| s050 | `demo/s050-order-by-random` | implemented | ORDER BY RANDOM added | idempotency | COMMENT | `rule-catalog:order-by-random`, `dbt-patterns:random-nondeterminism` | manifest | Shows nondeterministic and costly random ordering. | + +## Deferred Capability List + +These are still useful demo candidates, but should wait for deterministic support: + +- Window partition/order structural diffs in `altimate-core`. +- Owner/meta YAML removal via structured dbt metadata parsing. +- Warehouse-observed row/value data-diff demos that compare base/head relations + after the review command has a stable scripted harness. +- Scalar-subquery SELECT detection in core AST analysis. The line-oriented + catalog rule is too formatting-sensitive for a flagship scenario. +- Fanout-after-aggregate structural analysis in core. The current text detector + does not cover enough real dbt formatting shapes. + +## Validation Matrix Status + +| checkpoint | branches | matrix status | notes | +|---|---:|---|---| +| pilot | 10 | passed | `s001`-`s010` reached expected verdicts with fresh DuckDB state per branch. | +| tranche 2 | 10 | passed after replacements | `s015` replaced with `left-join-filter`; `s019` replaced with `lateral-on-true`. | +| tranche 3 | 10 | passed | `s021`-`s030` reached expected verdicts after schema YAML catalog wiring. | +| tranche 4 | 10 | passed | `s031`-`s040` reached expected verdicts; `s035` strengthened to `REQUEST_CHANGES`. | +| tranche 5 | 10 | passed | `s041`-`s050` reached expected verdicts. | +| full corpus | 50 | passed after classifier fix | Full run built all branches. `s001` was revalidated to APPROVE after non-dbt YAML stopped being classified as schema.yml. | + +## Final Matrix Result + +Run date: 2026-06-08. + +Validation setup: + +- Each branch ran with a fresh `demo.duckdb` file. +- Each branch ran `dbt build`, `dbt compile`, and `dbt docs generate`. +- Reviewer ran through local `altimate-code`; AI was disabled with `--no-ai`. +- A false positive found during the first full run caused `.github/workflows/dbt-pr-review.yml` + to be reviewed as dbt schema YAML. `classifyDbtFile` now limits schema YAML + classification to dbt resource paths and conventional property filenames. + +| id | branch | expected | actual | findings | deterministic evidence | +|---|---|---:|---:|---:|---| +| s001 | `demo/safe-refactor` | APPROVE | APPROVE | 0 | no findings | +| s002 | `demo/join-key-breakage` | REQUEST_CHANGES | REQUEST_CHANGES | 3 | `altimate_core.structural_diff:join_key_regression`, core equivalence | +| s003 | `demo/test-removal` | COMMENT | COMMENT | 1 | `dbt-patterns:removed_tests` | +| s004 | `demo/new-pii-exposure` | REQUEST_CHANGES | REQUEST_CHANGES | 2 | `altimate_core.classify_pii`, core equivalence | +| s005 | `demo/mart-select-star` | COMMENT | COMMENT | 1 | core equivalence | +| s006 | `demo/incremental-without-guard` | COMMENT | COMMENT | 4 | `altimate_core.dbt_config`, deterministic cost rules | +| s007 | `demo/s007-type-narrowing-amount` | COMMENT | COMMENT | 3 | `altimate_core.structural_diff:type_narrowing`, core equivalence | +| s008 | `demo/s008-join-on-cast` | COMMENT | COMMENT | 4 | `altimate_core.check:cast_in_join_key`, core equivalence | +| s009 | `demo/s009-join-or-condition` | COMMENT | COMMENT | 3 | `altimate_core.check:or_in_join`, core equivalence | +| s010 | `demo/s010-test-disabled` | COMMENT | COMMENT | 2 | `dbt-patterns:removed_tests`, `rule-catalog:test-disabled` | +| s011 | `demo/s011-left-join-to-inner` | COMMENT | COMMENT | 2 | core equivalence, `altimate_core.structural_diff:join_type_change` | +| s012 | `demo/s012-join-using-ambiguity` | COMMENT | COMMENT | 3 | `rule-catalog:using-join`, core equivalence | +| s013 | `demo/s013-right-join-readability` | COMMENT | COMMENT | 3 | `rule-catalog:right-join`, `altimate_core.structural_diff:join_type_change` | +| s014 | `demo/s014-cross-join-filtered` | REQUEST_CHANGES | REQUEST_CHANGES | 2 | `dbt-patterns:cross_join`, core equivalence | +| s015 | `demo/s015-left-join-filter` | COMMENT | COMMENT | 2 | `dbt-patterns:outer_join_filter_in_where`, core equivalence | +| s016 | `demo/s016-distinct-added` | COMMENT | COMMENT | 3 | `altimate_core.structural_diff:distinct_added`, `altimate_core.check:select_distinct_smell` | +| s017 | `demo/s017-limit-added` | REQUEST_CHANGES | REQUEST_CHANGES | 4 | `dbt-patterns:limit-in-model`, `altimate_core.structural_diff:limit_added`, `rule-catalog:limit-no-order` | +| s018 | `demo/s018-clock-column` | COMMENT | COMMENT | 2 | `rule-catalog:timezone-naive-now`, core equivalence | +| s019 | `demo/s019-lateral-on-true` | COMMENT | COMMENT | 4 | `rule-catalog:lateral-join`, `altimate_core.check:join_without_condition`, core equivalence | +| s020 | `demo/s020-not-in-subquery` | COMMENT | COMMENT | 3 | `altimate_core.check:not_in_nullable`, `rule-catalog:not-exists-suggested`, core equivalence | +| s021 | `demo/s021-union-dedup-cost` | COMMENT | COMMENT | 4 | `rule-catalog:union-not-all`, `altimate_core.check:union_without_all`, core equivalence | +| s022 | `demo/s022-in-subquery-large` | COMMENT | COMMENT | 2 | `rule-catalog:in-subquery-large`, core equivalence | +| s023 | `demo/s023-full-outer-join-filtered` | COMMENT | COMMENT | 3 | `altimate_core.check:full_outer_join`, `altimate_core.structural_diff:join_type_change` | +| s024 | `demo/s024-weak-pii-hash` | COMMENT | COMMENT | 3 | `rule-catalog:weak-pii-hash`, core structural/equivalence | +| s025 | `demo/s025-hardcoded-secret` | REQUEST_CHANGES | REQUEST_CHANGES | 2 | `rule-catalog:hardcoded-credential`, core equivalence | +| s026 | `demo/s026-new-phone-pii` | REQUEST_CHANGES | REQUEST_CHANGES | 6 | `altimate_core.classify_pii`, core equivalence, lineage impact | +| s027 | `demo/s027-new-ssn-pii` | REQUEST_CHANGES | REQUEST_CHANGES | 6 | `altimate_core.classify_pii`, core equivalence, lineage impact | +| s028 | `demo/s028-test-severity-warn` | COMMENT | COMMENT | 2 | `rule-catalog:test-severity-warn`, `dbt-patterns:removed_tests` | +| s029 | `demo/s029-description-removed` | COMMENT | COMMENT | 1 | `rule-catalog:yml-description-removed` | +| s030 | `demo/s030-yml-column-removed` | COMMENT | COMMENT | 2 | `rule-catalog:yml-column-removed`, `rule-catalog:yml-description-removed` | +| s031 | `demo/s031-count-distinct-cost` | COMMENT | COMMENT | 3 | `rule-catalog:count-distinct-large`, `rule-catalog:bq-approx-good`, core equivalence | +| s032 | `demo/s032-order-by-in-cte` | COMMENT | COMMENT | 1 | `dbt-patterns:order-by-no-limit` | +| s033 | `demo/s033-function-filter-column` | COMMENT | COMMENT | 3 | `rule-catalog:function-filter-column`, `altimate_core.check:non_sargable_predicate` | +| s034 | `demo/s034-implicit-comma-join` | COMMENT | COMMENT | 3 | `rule-catalog:implicit-cross-join-comma`, `altimate_core.check:implicit_cross_join` | +| s035 | `demo/s035-natural-join` | REQUEST_CHANGES | REQUEST_CHANGES | 3 | `dbt-patterns:natural-join`, core equivalence | +| s036 | `demo/s036-case-without-else` | COMMENT | COMMENT | 2 | `dbt-patterns:case-no-else`, core equivalence | +| s037 | `demo/s037-division-no-guard` | COMMENT | COMMENT | 2 | `altimate_core.check:division_by_column_no_guard`, core equivalence | +| s038 | `demo/s038-equals-null-filter` | COMMENT | COMMENT | 2 | `altimate_core.check:not_null_comparison`, core equivalence | +| s039 | `demo/s039-leading-wildcard-like` | COMMENT | COMMENT | 4 | `altimate_core.check:like_leading_wildcard`, `altimate_core.check:case_sensitive_like` | +| s040 | `demo/s040-between-date-boundary` | COMMENT | COMMENT | 3 | `dbt-patterns:between-timestamp`, `dbt-patterns:hardcoded-date` | +| s041 | `demo/s041-order-by-no-limit` | COMMENT | COMMENT | 1 | `dbt-patterns:order-by-no-limit` | +| s042 | `demo/s042-offset-no-order` | COMMENT | COMMENT | 2 | `dbt-patterns:offset-no-order`, core equivalence | +| s043 | `demo/s043-random-column` | COMMENT | COMMENT | 2 | `dbt-patterns:random-nondeterminism`, core equivalence | +| s044 | `demo/s044-full-refresh-config` | COMMENT | COMMENT | 2 | `dbt-patterns:full-refresh-true`, config-block change | +| s045 | `demo/s045-hardcoded-date-filter` | COMMENT | COMMENT | 2 | `dbt-patterns:hardcoded-date`, core equivalence | +| s046 | `demo/s046-regexp-filter` | COMMENT | COMMENT | 3 | `rule-catalog:regexp-heavy`, core equivalence | +| s047 | `demo/s047-cast-as-text` | COMMENT | COMMENT | 2 | `rule-catalog:cast-as-text`, core equivalence | +| s048 | `demo/s048-interval-literal` | COMMENT | COMMENT | 2 | `rule-catalog:interval-string-literal`, core equivalence | +| s049 | `demo/s049-date-plus-integer` | COMMENT | COMMENT | 2 | `rule-catalog:date-plus-integer`, core equivalence | +| s050 | `demo/s050-order-by-random` | COMMENT | COMMENT | 3 | `rule-catalog:order-by-random`, `dbt-patterns:random-nondeterminism` | + +## Reviewer Fixes From Corpus Work + +- `--no-ai` now maps to `ai=false` in the review CLI. +- Schema YAML catalog rules now run during dbt review, covering metadata/test + weakening rules. +- YAML classification now avoids non-dbt YAML such as GitHub workflows, fixing a + false positive found by `demo/safe-refactor`. diff --git a/packages/drivers/src/duckdb.ts b/packages/drivers/src/duckdb.ts index 3ccca467a..d177adc93 100644 --- a/packages/drivers/src/duckdb.ts +++ b/packages/drivers/src/duckdb.ts @@ -56,26 +56,29 @@ export async function connect(config: ConnectionConfig): Promise { new Promise((resolve, reject) => { let resolved = false let timeout: ReturnType | undefined + let instance: any const opts = accessMode ? { access_mode: accessMode } : undefined - const instance = new duckdb.Database( - dbPath, - opts, - (err: Error | null) => { - if (resolved) { if (instance && typeof instance.close === "function") instance.close(); return } - resolved = true - if (timeout) clearTimeout(timeout) - if (err) { - const msg = err.message || String(err) - if (msg.toLowerCase().includes("locked") || msg.includes("SQLITE_BUSY") || msg.includes("DUCKDB_LOCKED")) { - reject(new Error("DUCKDB_LOCKED")) - } else { - reject(err) - } + const onOpen = (err: Error | null) => { + if (resolved) { + if (instance && typeof instance.close === "function") instance.close() + return + } + resolved = true + if (timeout) clearTimeout(timeout) + if (err) { + const msg = err.message || String(err) + if (msg.toLowerCase().includes("locked") || msg.includes("SQLITE_BUSY") || msg.includes("DUCKDB_LOCKED")) { + reject(new Error("DUCKDB_LOCKED")) } else { - resolve(instance) + reject(err) } - }, - ) + } else { + resolve(instance) + } + } + instance = opts + ? new duckdb.Database(dbPath, opts, onOpen) + : new duckdb.Database(dbPath, onOpen) // Bun: native callback may not fire; fall back after 2s timeout = setTimeout(() => { if (!resolved) { diff --git a/packages/opencode/src/altimate/native/altimate-core.ts b/packages/opencode/src/altimate/native/altimate-core.ts index a41296ee7..2d58bad38 100644 --- a/packages/opencode/src/altimate/native/altimate-core.ts +++ b/packages/opencode/src/altimate/native/altimate-core.ts @@ -320,7 +320,7 @@ export function registerAll(): void { register("altimate_core.equivalence", async (params) => { try { const schema = schemaOrEmpty(params.schema_path, params.schema_context) - const raw = await core.checkEquivalence(params.sql1, params.sql2, schema) + const raw = await core.checkEquivalence(params.sql1, params.sql2, schema, params.dialect) const data = toData(raw) return ok(true, data) } catch (e) { diff --git a/packages/opencode/src/altimate/native/types.ts b/packages/opencode/src/altimate/native/types.ts index 7b47f3068..f3e72e562 100644 --- a/packages/opencode/src/altimate/native/types.ts +++ b/packages/opencode/src/altimate/native/types.ts @@ -879,6 +879,7 @@ export interface AltimateCoreTestgenParams { export interface AltimateCoreEquivalenceParams { sql1: string sql2: string + dialect?: string schema_path?: string schema_context?: Record } diff --git a/packages/opencode/src/altimate/review/dbt-patterns.ts b/packages/opencode/src/altimate/review/dbt-patterns.ts index 85b475aba..03df081f9 100644 --- a/packages/opencode/src/altimate/review/dbt-patterns.ts +++ b/packages/opencode/src/altimate/review/dbt-patterns.ts @@ -794,29 +794,33 @@ export function detectSchemaYmlPatterns(file: ChangedFile, rubric: Rubric): Find const kind = classifyDbtFile(file.path) if (kind !== "schema_yml" || file.status === "deleted") return [] const { added, removed } = splitDiff(file.diff) + const out: Finding[] = [] const removedTests = removed.filter((l) => /^\s*-\s*(unique|not_null|relationships)\b/i.test(l)) // A test still present (just moved) shouldn't fire. const addedTests = new Set(added.map((l) => l.trim())) const genuinelyRemoved = removedTests.filter((l) => !addedTests.has(l.trim())) - if (!genuinelyRemoved.length) return [] - const removedUnique = genuinelyRemoved.some((l) => /\bunique\b/i.test(l)) - const f = makeFinding({ - severity: removedUnique ? "warning" : "suggestion", - category: "test_coverage", - title: `${file.path.split("/").pop()}: removed ${genuinelyRemoved.length} data test(s)`, - body: - "This change deletes `unique`/`not_null`/`relationships` test(s) — the guardrails that catch fan-out, duplicate, " + - "and broken-FK regressions. Removing a `unique` test on a grain key is how silent duplicate bugs ship. " + - "Confirm the test is genuinely obsolete, not removed to make CI green.", - file: file.path, - confidence: "high", - evidence: { - tool: "dbt-patterns", - result: { rule: "removed_tests", removed: genuinelyRemoved.map((l) => l.trim()) }, - }, - ruleKey: "test_coverage:removed-tests", - }) - return [{ ...f, severity: clampSeverity(f.category, f.severity, f.confidence) }].filter( - (x) => !exclusionReason(x, rubric), - ) + if (genuinelyRemoved.length) { + const removedUnique = genuinelyRemoved.some((l) => /\bunique\b/i.test(l)) + const f = makeFinding({ + severity: removedUnique ? "warning" : "suggestion", + category: "test_coverage", + title: `${file.path.split("/").pop()}: removed ${genuinelyRemoved.length} data test(s)`, + body: + "This change deletes `unique`/`not_null`/`relationships` test(s) — the guardrails that catch fan-out, duplicate, " + + "and broken-FK regressions. Removing a `unique` test on a grain key is how silent duplicate bugs ship. " + + "Confirm the test is genuinely obsolete, not removed to make CI green.", + file: file.path, + confidence: "high", + evidence: { + tool: "dbt-patterns", + result: { rule: "removed_tests", removed: genuinelyRemoved.map((l) => l.trim()) }, + }, + ruleKey: "test_coverage:removed-tests", + }) + out.push(f) + } + out.push(...evaluateCatalog(file, "", added, removed, rubric)) + return out + .map((f) => ({ ...f, severity: clampSeverity(f.category, f.severity, f.confidence) })) + .filter((x) => !exclusionReason(x, rubric)) } diff --git a/packages/opencode/src/altimate/review/diff-filter.ts b/packages/opencode/src/altimate/review/diff-filter.ts index 7ba60b2a6..234c04c29 100644 --- a/packages/opencode/src/altimate/review/diff-filter.ts +++ b/packages/opencode/src/altimate/review/diff-filter.ts @@ -58,6 +58,7 @@ export type DbtFileKind = /** Classify a changed file by its role in a dbt project. */ export function classifyDbtFile(path: string): DbtFileKind { const p = path.replace(/\\/g, "/").toLowerCase() + const isYaml = p.endsWith(".yml") || p.endsWith(".yaml") if (/(^|\/)(dbt_project|profiles|packages|dependencies)\.ya?ml$/.test(p)) return "project_config" if (/(^|\/)macros\//.test(p)) return "macro" if (/(^|\/)snapshots\//.test(p)) return "snapshot" @@ -66,7 +67,8 @@ export function classifyDbtFile(path: string): DbtFileKind { if (/(^|\/)analyses\//.test(p)) return "analysis" if (/(^|\/)models\//.test(p) && p.endsWith(".py")) return "python_model" if (/(^|\/)models\//.test(p) && p.endsWith(".sql")) return "model_sql" - if (p.endsWith(".yml") || p.endsWith(".yaml")) return "schema_yml" + if (isYaml && /(^|\/)(models|snapshots|seeds|tests)\//.test(p)) return "schema_yml" + if (isYaml && /(^|\/)(_?schema|_?models|_?sources|sources|properties)\.ya?ml$/.test(p)) return "schema_yml" return "other" } diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts index a72778a64..8ddb5bedf 100644 --- a/packages/opencode/src/altimate/review/orchestrate.ts +++ b/packages/opencode/src/altimate/review/orchestrate.ts @@ -813,6 +813,7 @@ const STRUCTURAL_CATEGORY: Record = { coalesce_removed: "semantic_change", removed_predicate: "semantic_change", type_narrowing: "contract_violation", + join_key_regression: "join_risk", } async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Promise { @@ -828,7 +829,7 @@ async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Pr const out: Finding[] = [] for (const f of raw) { const cat: ReviewCategory = STRUCTURAL_CATEGORY[f.rule] ?? "semantic_change" - const sev = clampSeverity(cat, "warning", "high") + const sev = clampSeverity(cat, f.severity === "error" ? "critical" : "warning", "high") out.push( makeFinding({ severity: sev, @@ -932,12 +933,18 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect: if (!newCols.length) return [] const pii = await runner.classifyPii(newCols) const out: Finding[] = [] + const highExposureLayer = /(^|\/)(marts|reporting)\//.test(file.path) for (const p of pii) { - if (p.confidence < 0.7 || !p.column) continue + const classification = String(p.classification ?? "") + const lowRiskClass = /^(name|address)$/i.test(classification) + const minConfidence = lowRiskClass ? 0.9 : 0.7 + if (p.confidence < minConfidence || !p.column) continue const col = p.column.toLowerCase() + const highConfidence = p.confidence >= 0.9 + const severity: Severity = highExposureLayer && highConfidence && !lowRiskClass ? "critical" : "warning" out.push( makeFinding({ - severity: "warning", + severity: clampSeverity("pii_exposure", severity, highConfidence ? "high" : "medium"), category: "pii_exposure", title: `${model}: exposes ${p.classification} column \`${p.column}\``, body: @@ -946,7 +953,7 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect: (p.masking ? `; suggested masking: \`${p.masking}\`.` : "."), file: file.path, model, - confidence: p.confidence >= 0.9 ? "high" : "medium", + confidence: highConfidence ? "high" : "medium", evidence: { tool: "altimate_core.classify_pii", result: p }, ruleKey: `pii_exposure:${col}`, }), @@ -1070,7 +1077,8 @@ export async function runReview(input: OrchestrateInput): Promise() + if (input.runner.classifyPii && input.runner.columnLineage) { + for (const ctx of ctxByPath.values()) { + if (ctx.engineNewSql && ctx.file.status !== "deleted") diffScopedPiiFiles.add(ctx.file.path) + } + } // Per file, the concatenated rule names core's AST lint actually emitted — // so we suppress a regex twin only when core genuinely covered that concern. const coreRulesByFile = new Map() @@ -1160,13 +1174,22 @@ export async function runReview(input: OrchestrateInput): Promise { - if (!coreRanFiles.has(f.file)) return true const tool = f.evidence?.tool + const fallbackRule = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-") + if ( + (tool === "dbt-patterns" || tool === "rule-catalog") && + diffScopedPiiFiles.has(f.file) && + f.category === "pii_exposure" && + (fallbackRule === "pii-into-mart" || fallbackRule === "select-pii-columns") + ) { + return false + } + if (!coreRanFiles.has(f.file)) return true if (tool !== "dbt-patterns" && tool !== "rule-catalog") return true // Structural twins: the check code survives on evidence.result.rule (ruleKey // is stripped by the zod schema). Suppress ONLY if core actually emitted the // equivalent rule for this file (else the issue would fall through). - const code = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-") + const code = fallbackRule const coreMatcher = CORE_AST_COVERED[code] if (coreMatcher && coreMatcher.test(coreRulesByFile.get(f.file) ?? "")) return false // Portability twins: drop a regex finding ONLY IF it is itself a portability diff --git a/packages/opencode/src/altimate/review/runner.ts b/packages/opencode/src/altimate/review/runner.ts index 1ffda22e6..f26b6add6 100644 --- a/packages/opencode/src/altimate/review/runner.ts +++ b/packages/opencode/src/altimate/review/runner.ts @@ -470,12 +470,13 @@ export function createDispatcherRunner(opts: DispatcherRunnerOptions): ReviewRun } }, - async equivalence(oldSql: string, newSql: string): Promise { + async equivalence(oldSql: string, newSql: string, dialect?: string): Promise { try { const schema = await resolveSchema() const res = await Dispatcher.call("altimate_core.equivalence", { sql1: oldSql, sql2: newSql, + dialect, schema_context: schema, }) const data = (res.data ?? {}) as Record diff --git a/packages/opencode/src/altimate/verify-shadow.ts b/packages/opencode/src/altimate/verify-shadow.ts new file mode 100644 index 000000000..05fe34e4e --- /dev/null +++ b/packages/opencode/src/altimate/verify-shadow.ts @@ -0,0 +1,66 @@ +// altimate_change - new file +// +// Verify-and-escalate SHADOW capture (altimate-router verifier platform integration, phase 1). +// +// The verifier proves a cheap model's edit is behaviour-equivalent (sound oracle) and escalates +// the rest — but to measure its real false-ship φ + savings on LIVE traffic we first need the +// real before/after edits WITH full file context (the seam's advantage over mining transcript +// snippets). This records each edit to a local JSONL; an offline pass (altimate-router +// eval/phi) reconstructs runnable units and runs the verifier. SHADOW only: it never changes +// edit behaviour, is OFF unless ALTIMATE_VERIFY_SHADOW is set, and can never throw. +// +// Phase 2 (later): call the verifier live (verify-daemon / napi addon) and act on the Decision. +import { appendFileSync, mkdirSync } from "node:fs" +import { dirname } from "node:path" +import { homedir } from "node:os" +import { Log } from "@/util/log" + +const log = Log.create({ service: "verify-shadow" }) + +const EXT_LANG: Record = { + py: "python", + js: "javascript", + mjs: "javascript", + cjs: "javascript", + jsx: "javascript", + ts: "typescript", + tsx: "typescript", + sql: "sql", +} + +function logPath(): string { + return process.env["ALTIMATE_VERIFY_SHADOW_LOG"] || `${homedir()}/.altimate-router/edit-capture.jsonl` +} + +export namespace VerifyShadow { + export interface EditCapture { + file: string + before: string + after: string + oldString?: string + newString?: string + } + + /** Record an edit for offline verifier measurement. No-op unless ALTIMATE_VERIFY_SHADOW is + * set; fail-safe (never throws — observability must never break an edit). */ + export function captureEdit(input: EditCapture): void { + if (!process.env["ALTIMATE_VERIFY_SHADOW"]) return + try { + const ext = input.file.split(".").pop()?.toLowerCase() ?? "" + const record = { + ts: Date.now(), + file: input.file, + language: EXT_LANG[ext] ?? ext, + before: input.before, + after: input.after, + old_string: input.oldString, + new_string: input.newString, + } + const path = logPath() + mkdirSync(dirname(path), { recursive: true }) + appendFileSync(path, JSON.stringify(record) + "\n") + } catch (e) { + log.error("edit capture failed", { error: e instanceof Error ? e.message : String(e) }) + } + } +} diff --git a/packages/opencode/src/cli/cmd/review.ts b/packages/opencode/src/cli/cmd/review.ts index f3bd9094f..cd2bdfd34 100644 --- a/packages/opencode/src/cli/cmd/review.ts +++ b/packages/opencode/src/cli/cmd/review.ts @@ -42,10 +42,10 @@ export const ReviewCommand = cmd({ default: false, describe: "post the review to the GitHub PR (uses GITHUB_TOKEN + the Actions event)", }) - .option("no-ai", { + .option("ai", { type: "boolean", - default: false, - describe: "disable the advisory LLM reviewer lane (no model calls / cost)", + default: true, + describe: "enable the advisory LLM reviewer lane; use --no-ai to disable model calls / cost", }) .option("cwd", { type: "string", describe: "project directory (default: current dir)" }), async handler(args) { diff --git a/packages/opencode/src/tool/edit.ts b/packages/opencode/src/tool/edit.ts index 390de8381..ec121509e 100644 --- a/packages/opencode/src/tool/edit.ts +++ b/packages/opencode/src/tool/edit.ts @@ -17,6 +17,9 @@ import { Filesystem } from "../util/filesystem" import { Instance } from "../project/instance" import { Snapshot } from "@/snapshot" import { assertExternalDirectory, assertSensitiveWrite } from "./external-directory" +// altimate_change start — verify-and-escalate shadow capture (gated off by default; fail-safe) +import { VerifyShadow } from "../altimate/verify-shadow" +// altimate_change end const MAX_DIAGNOSTICS_PER_FILE = 20 @@ -120,6 +123,15 @@ export const EditTool = Tool.define("edit", { diff = trimDiff( createTwoFilesPatch(filePath, filePath, normalizeLineEndings(contentOld), normalizeLineEndings(contentNew)), ) + // altimate_change start — verify shadow capture (gated off by default; fail-safe, no behavior change) + VerifyShadow.captureEdit({ + file: filePath, + before: contentOld, + after: contentNew, + oldString: params.oldString, + newString: params.newString, + }) + // altimate_change end FileTime.read(ctx.sessionID, filePath) }) diff --git a/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts b/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts new file mode 100644 index 000000000..3da67fec9 --- /dev/null +++ b/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts @@ -0,0 +1,44 @@ +import { afterAll, beforeAll, describe, expect, test } from "bun:test" + +import { runDataDiff } from "../../src/altimate/native/connections/data-diff" +import * as Registry from "../../src/altimate/native/connections/registry" + +const RUN = process.env.ALTIMATE_RUN_WAREHOUSE_E2E === "1" +const e2eTest = RUN ? test : test.skip + +describe("data.diff DuckDB e2e", () => { + beforeAll(() => { + process.env.ALTIMATE_TELEMETRY_DISABLED = "true" + }) + + afterAll(() => { + delete process.env.ALTIMATE_TELEMETRY_DISABLED + Registry.reset() + }) + + e2eTest("detects value and row deltas through a real DuckDB warehouse", async () => { + Registry.reset() + Registry.setConfigs({ duck_e2e: { type: "duckdb", path: ":memory:" } }) + const conn = await Registry.get("duck_e2e") + + await conn.execute("CREATE TABLE base_orders (order_id INTEGER, amount INTEGER)") + await conn.execute("CREATE TABLE head_orders (order_id INTEGER, amount INTEGER)") + await conn.execute("INSERT INTO base_orders VALUES (1, 100), (2, 200)") + await conn.execute("INSERT INTO head_orders VALUES (1, 100), (2, 250), (3, 300)") + + const result = await runDataDiff({ + source: "base_orders", + target: "head_orders", + key_columns: ["order_id"], + extra_columns: ["amount"], + source_warehouse: "duck_e2e", + target_warehouse: "duck_e2e", + algorithm: "joindiff", + }) + + expect(result.success).toBe(true) + expect((result.outcome as any)?.mode).toBe("diff") + expect((result.outcome as any)?.stats?.exclusive_table2).toBe(1) + expect((result.outcome as any)?.stats?.updated).toBe(1) + }) +}) diff --git a/packages/opencode/test/altimate/review-ci.test.ts b/packages/opencode/test/altimate/review-ci.test.ts index f14705030..65c959af5 100644 --- a/packages/opencode/test/altimate/review-ci.test.ts +++ b/packages/opencode/test/altimate/review-ci.test.ts @@ -1,4 +1,5 @@ import { describe, test, expect, afterEach } from "bun:test" +import yargs from "yargs/yargs" import { resolveGitHubTarget } from "../../src/altimate/review/post-github" import { ReviewCommand } from "../../src/cli/cmd/review" import { buildReviewSchemaContext } from "../../src/altimate/review/schema-context" @@ -19,6 +20,21 @@ describe("review CLI command", () => { expect(ReviewCommand.command).toBe("review") expect(typeof ReviewCommand.handler).toBe("function") }) + + test("supports documented --no-ai flag", async () => { + let seen: any + await yargs(["review", "--no-ai", "--json"]) + .command({ ...ReviewCommand, handler: (args: any) => { seen = args } } as any) + .exitProcess(false) + .fail((msg, err) => { + throw err ?? new Error(msg) + }) + .strict() + .parseAsync() + + expect(seen.ai).toBe(false) + expect(seen.json).toBe(true) + }) }) describe("resolveGitHubTarget", () => { diff --git a/packages/opencode/test/altimate/review-dbt-patterns.test.ts b/packages/opencode/test/altimate/review-dbt-patterns.test.ts index 427333450..a5be4130c 100644 --- a/packages/opencode/test/altimate/review-dbt-patterns.test.ts +++ b/packages/opencode/test/altimate/review-dbt-patterns.test.ts @@ -146,6 +146,31 @@ describe("dbt-patterns detectors", () => { expect(has(f, "test_coverage")).toBe(true) }) + test("schema.yml: catalog rules run for metadata/test weakening", () => { + const f = detectSchemaYmlPatterns( + { + path: "models/marts/_models.yml", + status: "modified", + diff: "+ severity: warn\n- description: One row per order", + }, + DEFAULT_RUBRIC, + ) + expect(f.some((x) => x.evidence?.tool === "rule-catalog" && (x.evidence?.result as any)?.rule === "test-severity-warn")).toBe(true) + expect(f.some((x) => x.evidence?.tool === "rule-catalog" && (x.evidence?.result as any)?.rule === "yml-description-removed")).toBe(true) + }) + + test("workflow YAML is not treated as schema.yml", () => { + const f = detectSchemaYmlPatterns( + { + path: ".github/workflows/dbt-pr-review.yml", + status: "modified", + diff: "- - name: order_id\n- description: One row per order", + }, + DEFAULT_RUBRIC, + ) + expect(f.length).toBe(0) + }) + test("benign additive column produces NO dbt-pattern finding (precision)", () => { const sql = `select id, upper(status) as status_upper from {{ ref('x') }}` const f = detectModelPatterns( diff --git a/packages/opencode/test/altimate/review-runner.test.ts b/packages/opencode/test/altimate/review-runner.test.ts index f055149fb..40e38c8ad 100644 --- a/packages/opencode/test/altimate/review-runner.test.ts +++ b/packages/opencode/test/altimate/review-runner.test.ts @@ -1,9 +1,17 @@ -import { describe, expect, test } from "bun:test" +import { afterEach, describe, expect, spyOn, test } from "bun:test" import { writeFileSync } from "node:fs" import path from "node:path" import { createDispatcherRunner } from "../../src/altimate/review/runner" +import { Dispatcher } from "../../src/altimate/native" import { tmpdir } from "../fixture/fixture" +let dispatcherSpy: ReturnType | undefined + +afterEach(() => { + dispatcherSpy?.mockRestore() + dispatcherSpy = undefined +}) + describe("review manifest loading", () => { test("loads a valid manifest without initializing the native dispatcher", async () => { await using tmp = await tmpdir() @@ -36,4 +44,47 @@ describe("review manifest loading", () => { testCount: 0, }) }) + + test("threads adapter dialect into core equivalence", async () => { + await using tmp = await tmpdir() + const manifestPath = path.join(tmp.path, "manifest.json") + writeFileSync( + manifestPath, + JSON.stringify({ + metadata: { adapter_type: "duckdb" }, + nodes: { + "model.demo.orders": { + resource_type: "model", + name: "orders", + original_file_path: "models/orders.sql", + config: { materialized: "table" }, + depends_on: { nodes: [] }, + columns: { id: { name: "id", data_type: "integer" } }, + }, + }, + sources: {}, + }), + ) + + let seenParams: any + dispatcherSpy = spyOn(Dispatcher, "call").mockImplementation((async (method: string, params: any) => { + expect(method).toBe("altimate_core.equivalence") + seenParams = params + return { + success: true, + data: { + equivalent: true, + confidence: 0.95, + differences: [], + validation_errors: [], + }, + } + }) as any) + + const runner = createDispatcherRunner({ manifestPath }) + const result = await runner.equivalence("select id from orders", "select id from orders", "duckdb") + + expect(result).toEqual({ decided: true, equivalent: true, differences: [], confidence: "high" }) + expect(seenParams.dialect).toBe("duckdb") + }) }) diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts index 0faf6a191..036687969 100644 --- a/packages/opencode/test/altimate/review.test.ts +++ b/packages/opencode/test/altimate/review.test.ts @@ -569,6 +569,130 @@ describe("orchestrate", () => { ).toBe(true) }) + test("core SC010 join-key regression maps to critical join_risk and blocks", async () => { + const oldSql = ` + select * + from orders + left join customers + on orders.customer_id = customers.customer_id + ` + const newSql = ` + select * + from orders + left join customers + on orders.order_id = customers.customer_id + ` + let sawBase = "" + let sawHead = "" + const runner: ReviewRunner = { + ...fakeRunner({}), + async structuralDiff(baseSql, headSql) { + sawBase = baseSql + sawHead = headSql + return [ + { + code: "SC010", + rule: "join_key_regression", + severity: "error", + message: + "A join key changed from matching the same identifier stem to `orders.order_id = customers.customer_id`.", + }, + ] + }, + } + const env = await runReview({ + changedFiles: [ + { + path: "models/marts/fct_customer_orders.sql", + status: "modified", + diff: "+on orders.order_id = customers.customer_id\n-on orders.customer_id = customers.customer_id\n", + }, + ], + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: async (_f, side) => (side === "new" ? newSql : oldSql), + getCompiled: async (_f, side) => (side === "new" ? newSql : oldSql), + generatedAt: "2026-06-08T00:00:00Z", + }) + expect(sawBase).toBe(oldSql) + expect(sawHead).toBe(newSql) + const f = env.findings.find( + (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010", + ) + expect(f).toBeDefined() + expect(f!.severity).toBe("critical") + expect(f!.category).toBe("join_risk") + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("core structural diff quiet on CTE rename and bridge join countercases", async () => { + const runner: ReviewRunner = { + ...fakeRunner({}), + async structuralDiff() { + return [] + }, + } + const safeEnv = await runReview({ + changedFiles: [ + { + path: "models/marts/fct_customer_orders.sql", + status: "modified", + diff: + "+on order_records.customer_id = customer_records.customer_id\n" + + "-on orders.customer_id = customers.customer_id\n", + }, + ], + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: async (_f, side) => + side === "new" + ? "select * from order_records left join customer_records on order_records.customer_id = customer_records.customer_id" + : "select * from orders left join customers on orders.customer_id = customers.customer_id", + getCompiled: async (_f, side) => + side === "new" + ? "select * from order_records left join customer_records on order_records.customer_id = customer_records.customer_id" + : "select * from orders left join customers on orders.customer_id = customers.customer_id", + generatedAt: "2026-06-08T00:00:00Z", + }) + expect( + safeEnv.findings.some( + (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010", + ), + ).toBe(false) + + const bridgeEnv = await runReview({ + changedFiles: [ + { + path: "models/marts/fct_order_items.sql", + status: "modified", + diff: "+on orders.order_id = order_items.order_id\n-on orders.customer_id = customers.customer_id\n", + }, + ], + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: async (_f, side) => + side === "new" + ? "select * from orders left join order_items on orders.order_id = order_items.order_id" + : "select * from orders left join customers on orders.customer_id = customers.customer_id", + getCompiled: async (_f, side) => + side === "new" + ? "select * from orders left join order_items on orders.order_id = order_items.order_id" + : "select * from orders left join customers on orders.customer_id = customers.customer_id", + generatedAt: "2026-06-08T00:00:00Z", + }) + expect( + bridgeEnv.findings.some( + (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010", + ), + ).toBe(false) + }) + test("core L033 portability → regex portability twin suppressed, idempotency concern survives", async () => { const sql = "select getdate() as loaded_at from {{ ref('a') }}" const files: ChangedFile[] = [{ path: "models/marts/m.sql", status: "modified", diff: "+ , getdate() as loaded_at\n" }] @@ -1209,6 +1333,79 @@ describe("orchestrate", () => { expect(env.verdict).toBe("REQUEST_CHANGES") }) + test("diff-scoped core PII is the single blocker for newly introduced mart email", async () => { + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ email\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["email"] } + }, + async columnLineage(sql) { + return sql.includes("email") + ? [ + { source: "customers.customer_name", target: "customer_name" }, + { source: "customers.email", target: "email" }, + ] + : [{ source: "customers.customer_name", target: "customer_name" }] + }, + async classifyPii(columns) { + return columns.map((column) => ({ + column, + classification: "Email", + confidence: 0.95, + masking: "'***MASKED***'", + })) + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_name, email from customers", "select customer_name from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect(pii).toHaveLength(1) + expect(pii[0].severity).toBe("critical") + expect(pii[0].evidence?.tool).toBe("altimate_core.classify_pii") + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("low-confidence name PII does not surface or fall back to regex PII comments", async () => { + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ first_name\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["first_name"] } + }, + async columnLineage(sql) { + return sql.includes("first_name") + ? [ + { source: "customers.customer_id", target: "customer_id" }, + { source: "customers.first_name", target: "first_name" }, + ] + : [{ source: "customers.customer_id", target: "customer_id" }] + }, + async classifyPii(columns) { + return columns.map((column) => ({ + column, + classification: "Name", + confidence: 0.85, + })) + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, first_name from customers", "select customer_id from customers"), + }) + expect(env.findings.some((f) => f.category === "pii_exposure")).toBe(false) + }) + test("clean change with no manifest → degraded, APPROVE/COMMENT, lint-only labeled", async () => { const files: ChangedFile[] = [{ path: "models/staging/stg_x.sql", status: "modified", diff: "+select 1\n" }] const runner: ReviewRunner = {