
Fix dbt YAML review file classification #920

Merged
anandgupta42 merged 1 commit into main from codex/dbt-review-demo-corpus on Jun 10, 2026
anandgupta42 (Contributor) commented Jun 8, 2026

PINEAPPLE

Summary

  • classify only dbt resource YAML and conventional dbt property YAML as schema.yml
  • avoid treating non-dbt YAML, such as GitHub workflow files, as dbt schema metadata
  • add a regression test for the workflow YAML false positive

Validation

  • bun test --timeout 30000 test/altimate/review-dbt-patterns.test.ts
  • bun run typecheck
  • pre-push hook: bun turbo typecheck passed

Summary by CodeRabbit

  • Bug Fixes

    • Improved YAML file classification accuracy. YAML files are now categorized as schema files only when they are located in dbt model, snapshot, seed, or test directories or match schema-related naming patterns. YAML files in workflow directories, such as .github/workflows, are no longer incorrectly flagged as schema changes.
  • Tests

    • Added test coverage to ensure YAML files in workflow directories are correctly excluded from schema file detection.

coderabbitai (Bot) commented Jun 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4fc3c0a1-d8da-4292-9811-ed3c0e9c0686

📥 Commits

Reviewing files that changed from the base of the PR and between 6185bfd and 0ac9a2a.

📒 Files selected for processing (2)
  • packages/opencode/src/altimate/review/diff-filter.ts
  • packages/opencode/test/altimate/review-dbt-patterns.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/opencode/test/altimate/review-dbt-patterns.test.ts
  • packages/opencode/src/altimate/review/diff-filter.ts

📝 Walkthrough


The PR narrows dbt YAML detection: classifyDbtFile now computes isYaml and returns schema_yml only for YAML files under dbt model/snapshot/seed/test directories or matching schema-like filename/path patterns. A test ensures workflow YAML (e.g., .github/workflows/*) is excluded.
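The walkthrough describes a two-part gate: first decide whether the file is YAML at all, then accept it as schema_yml only if it sits in a dbt resource directory or has a schema-like name. The sketch below illustrates that shape under stated assumptions. It is not the actual classifyDbtFile from diff-filter.ts; the function name classifyYamlPath, the directory list, and the filename regex are all hypothetical stand-ins.

```typescript
// Hypothetical sketch, NOT the actual classifyDbtFile in diff-filter.ts.
// The directory list and filename regex are illustrative assumptions.
export type YamlKind = "schema_yml" | "other"

// Assumed dbt resource directories (matched as whole path segments).
const DBT_RESOURCE_DIRS = new Set(["models", "snapshots", "seeds", "tests"])

// Assumed schema-like names: schema.yml, stg_schema.yml, sources.yaml,
// properties.yml, or underscore-prefixed files such as _models.yml.
const SCHEMA_LIKE_NAME = /(^|[_-])(schema|sources|properties)\.ya?ml$|^_[\w-]+\.ya?ml$/i

export function classifyYamlPath(path: string): YamlKind {
  // Normalize Windows separators so segment matching works on any platform.
  const segments = path.replace(/\\/g, "/").split("/")
  const fileName = segments[segments.length - 1] ?? ""
  // Mirror the isYaml gate: non-YAML files are never schema_yml here.
  const isYaml = /\.ya?ml$/i.test(fileName)
  if (!isYaml) return "other"
  // Whole-segment match, so ".github/workflows" or "my_models" never qualify.
  const inResourceDir = segments.slice(0, -1).some((dir) => DBT_RESOURCE_DIRS.has(dir))
  return inResourceDir || SCHEMA_LIKE_NAME.test(fileName) ? "schema_yml" : "other"
}
```

With this sketch, .github/workflows/ci.yml fails both checks and stays "other", which is the false positive the added regression test targets. Matching whole path segments rather than substrings keeps unrelated directories from qualifying by accident. A real implementation would likely also need to handle model-paths configured in dbt_project.yml, which this sketch ignores.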

Changes

YAML Classification Precision

| Layer / File(s) | Summary |
|---|---|
| YAML classification refinement and validation: packages/opencode/src/altimate/review/diff-filter.ts, packages/opencode/test/altimate/review-dbt-patterns.test.ts | classifyDbtFile introduces an isYaml flag and tightens schema_yml classification to require YAML located in model/snapshot/seed/test directories or matching schema-like filename/path patterns. An added test asserts that .github/workflows/*.yml files are not classified as schema YAML. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I sniffed the paths both near and far,
YAML truth now shines — workflow files don't mar.
Schema names and dbt nests I mind,
Only real schema_yml I now find. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main change: fixing how dbt YAML files are classified in the review process, distinguishing actual dbt schema files from unrelated YAML files. |
| Description check | ✅ Passed | The description includes the required PINEAPPLE marker, explains the summary and changes, and documents validation steps. All required template sections are substantially covered. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai (Bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts`:
- Around line 10-16: The test unconditionally deletes
ALTIMATE_TELEMETRY_DISABLED in afterAll which clobbers any pre-existing
environment value; modify the setup/teardown to capture the prior value in
beforeAll (e.g., const prevTelemetry = process.env.ALTIMATE_TELEMETRY_DISABLED)
then set process.env.ALTIMATE_TELEMETRY_DISABLED = "true", and in afterAll
restore that saved value (if prevTelemetry is undefined delete the env var,
otherwise set it back to prevTelemetry) before calling Registry.reset(); update
references in the beforeAll/afterAll blocks accordingly.
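The fix the comment asks for is a capture-then-restore pattern: remember whether the variable was set, override it, and later put back exactly the prior state. A minimal sketch follows; overrideEnv is a hypothetical helper name, not an existing project API.

```typescript
// Minimal sketch of the save/restore pattern the review comment asks for.
// `overrideEnv` is a hypothetical helper, not an existing project API.
export function overrideEnv(name: string, value: string): () => void {
  // Capture the prior value; undefined means the variable was unset.
  const previous = process.env[name]
  process.env[name] = value
  // The returned function restores exactly the captured state.
  return () => {
    if (previous === undefined) delete process.env[name]
    else process.env[name] = previous
  }
}
```

In the e2e test, beforeAll would assign the returned function to a variable declared at describe scope, and afterAll would call it before Registry.reset(). A value that a developer's shell or CI had already set for ALTIMATE_TELEMETRY_DISABLED then survives the test run instead of being deleted.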

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e73537bc-66fa-47be-ae98-c2425126aaba

📥 Commits

Reviewing files that changed from the base of the PR and between c2019ba and b0c0fce.

📒 Files selected for processing (16)
  • docs/docs/usage/dbt-pr-review.md
  • docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
  • docs/internal/dbt-pr-review-demo-scenario-corpus.md
  • packages/drivers/src/duckdb.ts
  • packages/opencode/src/altimate/native/altimate-core.ts
  • packages/opencode/src/altimate/native/types.ts
  • packages/opencode/src/altimate/review/dbt-patterns.ts
  • packages/opencode/src/altimate/review/diff-filter.ts
  • packages/opencode/src/altimate/review/orchestrate.ts
  • packages/opencode/src/altimate/review/runner.ts
  • packages/opencode/src/cli/cmd/review.ts
  • packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts
  • packages/opencode/test/altimate/review-ci.test.ts
  • packages/opencode/test/altimate/review-dbt-patterns.test.ts
  • packages/opencode/test/altimate/review-runner.test.ts
  • packages/opencode/test/altimate/review.test.ts

Comment thread on packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts (outdated)

cubic-dev-ai (Bot) left a comment

4 issues found across 16 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/review/dbt-patterns.ts">

<violation number="1" location="packages/opencode/src/altimate/review/dbt-patterns.ts:822">
P2: `detectSchemaYmlPatterns` now double-reports removed `relationships` tests by combining an existing hard-coded detector with the catalog rule for the same condition.</violation>
</file>

<file name="docs/internal/dbt-pr-review-demo-scenario-corpus.md">

<violation number="1" location="docs/internal/dbt-pr-review-demo-scenario-corpus.md:6">
P3: Machine-specific absolute paths in the runbook make the documented validation command non-reproducible for other developers.</violation>

<violation number="2" location="docs/internal/dbt-pr-review-demo-scenario-corpus.md:68">
P3: Implemented 50-Scenario Corpus table is inconsistent with the defined metadata schema: the evidence column uses the header `deterministic evidence` (taken from the meaning text) instead of the schema field name `evidence`, and the required `validation` column is missing entirely.</violation>
</file>

<file name="packages/opencode/src/altimate/review/orchestrate.ts">

<violation number="1" location="packages/opencode/src/altimate/review/orchestrate.ts:1152">
P1: Fallback PII findings are suppressed based on capability, even when diff-scoped core PII classification produced no result, which can hide real PII exposure findings on classifier failure.</violation>
</file>


Comment thread on packages/opencode/src/altimate/review/orchestrate.ts, line 1152

P1: Fallback PII findings are suppressed based on capability, even when diff-scoped core PII classification produced no result, which can hide real PII exposure findings on classifier failure.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/review/orchestrate.ts, line 1152:

<comment>Fallback PII findings are suppressed based on capability, even when diff-scoped core PII classification produced no result, which can hide real PII exposure findings on classifier failure.</comment>

<file context>
@@ -1138,6 +1146,12 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
+  const diffScopedPiiFiles = new Set<string>()
+  if (input.runner.classifyPii && input.runner.columnLineage) {
+    for (const ctx of ctxByPath.values()) {
+      if (ctx.engineNewSql && ctx.file.status !== "deleted") diffScopedPiiFiles.add(ctx.file.path)
+    }
+  }
</file context>
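The P1 concern is that the suppression set is built from capability (the runner has classifyPii and columnLineage, and the file has engineNewSql) rather than from whether core classification actually returned something. The sketch below illustrates outcome-based gating instead. CoreClassifyOutcome and filesSafeToSuppress are hypothetical names that do not mirror orchestrate.ts, and treating an empty piiColumns list as a successful run is an assumption.

```typescript
// Hypothetical sketch: the types and function name are illustrative and do not
// mirror orchestrate.ts. It gates suppression on outcome, not on capability.
export interface CoreClassifyOutcome {
  path: string
  // undefined when the core PII classifier failed or returned nothing
  result?: { piiColumns: string[] }
}

// Returns the files whose fallback PII findings may be suppressed: only those
// that were eligible for diff-scoped classification AND actually got a result.
export function filesSafeToSuppress(
  eligible: ReadonlySet<string>,
  outcomes: readonly CoreClassifyOutcome[],
): Set<string> {
  const suppress = new Set<string>()
  for (const outcome of outcomes) {
    if (eligible.has(outcome.path) && outcome.result !== undefined) suppress.add(outcome.path)
  }
  return suppress
}
```

Under this design, a classifier failure leaves the file out of the suppression set, so the fallback PII findings still surface rather than silently disappearing.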

Contributor Author


Addressed by rebuilding PR #920 to the scoped production change only. This file/comment is from the previous b0c0fce version and is no longer in the PR diff. Current PR changes only packages/opencode/src/altimate/review/diff-filter.ts and packages/opencode/test/altimate/review-dbt-patterns.test.ts.


Thanks for the heads-up — I’ll skip that obsolete comment.

Comment thread on packages/opencode/src/altimate/review/dbt-patterns.ts, line 822

P2: detectSchemaYmlPatterns now double-reports removed relationships tests by combining an existing hard-coded detector with the catalog rule for the same condition.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/review/dbt-patterns.ts, line 822:

<comment>`detectSchemaYmlPatterns` now double-reports removed `relationships` tests by combining an existing hard-coded detector with the catalog rule for the same condition.</comment>

<file context>
@@ -794,29 +794,33 @@ export function detectSchemaYmlPatterns(file: ChangedFile, rubric: Rubric): Find
+    })
+    out.push(f)
+  }
+  out.push(...evaluateCatalog(file, "", added, removed, rubric))
+  return out
+    .map((f) => ({ ...f, severity: clampSeverity(f.category, f.severity, f.confidence) }))
</file context>
Suggested change:

```diff
-  out.push(...evaluateCatalog(file, "", added, removed, rubric))
+  out.push(
+    ...evaluateCatalog(file, "", added, removed, rubric).filter(
+      (f) =>
+        !(
+          genuinelyRemoved.some((l) => /\brelationships\b/i.test(l)) &&
+          f.evidence?.tool === "rule-catalog" &&
+          (f.evidence?.result as any)?.rule === "relationship-test-removed-yml"
+        ),
+    ),
+  )
```

Contributor Author


Addressed by rebuilding PR #920 to the scoped production change only. This file/comment is from the previous b0c0fce version and is no longer in the PR diff. Current PR changes only packages/opencode/src/altimate/review/diff-filter.ts and packages/opencode/test/altimate/review-dbt-patterns.test.ts.


Thanks for the update — I’ll ignore that obsolete comment.

Comment thread on docs/internal/dbt-pr-review-demo-scenario-corpus.md, line 68

P3: Implemented 50-Scenario Corpus table is inconsistent with the defined metadata schema: the evidence column uses the header deterministic evidence (taken from the meaning text) instead of the schema field name evidence, and the required validation column is missing entirely.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/internal/dbt-pr-review-demo-scenario-corpus.md, line 68:

<comment>Implemented 50-Scenario Corpus table is inconsistent with the defined metadata schema: the evidence column uses the header `deterministic evidence` (taken from the meaning text) instead of the schema field name `evidence`, and the required `validation` column is missing entirely.</comment>

<file context>
@@ -0,0 +1,217 @@
+with DuckDB, `dbt build`, `dbt compile`, `dbt docs generate`, and the default
+focused review command above. AI must remain disabled for demo acceptance.
+
+| id | branch | status | title | category | expected | deterministic evidence | artifact_needs | demo_script |
+|---|---|---|---|---|---|---|---|---|
+| s001 | `demo/safe-refactor` | implemented | Safe CTE refactor is approved | safe_refactor | APPROVE | no findings | manifest, catalog, target-base | Proves the reviewer stays quiet for harmless SQL cleanup. |
</file context>

Contributor Author


Addressed by rebuilding PR #920 to the scoped production change only. This file/comment is from the previous b0c0fce version and is no longer in the PR diff. Current PR changes only packages/opencode/src/altimate/review/diff-filter.ts and packages/opencode/test/altimate/review-dbt-patterns.test.ts.


Got it, thanks for the scope update.

Comment thread on docs/internal/dbt-pr-review-demo-scenario-corpus.md, line 6

P3: Machine-specific absolute paths in the runbook make the documented validation command non-reproducible for other developers.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/internal/dbt-pr-review-demo-scenario-corpus.md, line 6:

<comment>Machine-specific absolute paths in the runbook make the documented validation command non-reproducible for other developers.</comment>

<file context>
@@ -0,0 +1,217 @@
+Status: active corpus build, started 2026-06-08.
+
+Goal: build 50 customer-demo-ready dbt PR scenarios in
+`/Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo`. Each
+scenario should be a small PR branch that either demonstrates a deterministic
+reviewer catch or proves the reviewer correctly stays quiet for a safe change.
</file context>

Contributor Author


Addressed by rebuilding PR #920 to the scoped production change only. This file/comment is from the previous b0c0fce version and is no longer in the PR diff. Current PR changes only packages/opencode/src/altimate/review/diff-filter.ts and packages/opencode/test/altimate/review-dbt-patterns.test.ts.


Got it, thanks for the update.

dev-punia-altimate (Contributor)

❌ Tests — Failures Detected

TypeScript — 15 failure(s)

  • connection_refused [1.00ms]
  • timeout
  • permission_denied
  • parse_error
  • network_error
  • auth_failure
  • rate_limit
  • internal_error
  • empty_error
  • connection_refused
  • timeout
  • permission_denied [1.00ms]
  • parse_error
  • network_error
  • auth_failure

Next Step

Please address the failing cases above and re-run verification.

cc @anandgupta42

anandgupta42 force-pushed the codex/dbt-review-demo-corpus branch from b0c0fce to 6185bfd on June 10, 2026 00:01
anandgupta42 changed the title from "Finalize dbt PR review demo corpus" to "Fix dbt YAML review file classification" on Jun 10, 2026
anandgupta42 (Contributor Author) commented:

Addressed review feedback for the current scope:

  • PR #920 (Fix dbt YAML review file classification) was rebuilt from the earlier 16-file version, which included internal docs, to the scoped production fix only.
  • Stale inline comments on files no longer in the PR have been replied to individually.
  • The PR body now starts with the required PINEAPPLE marker.
  • Current changed files are only:
    • packages/opencode/src/altimate/review/diff-filter.ts
    • packages/opencode/test/altimate/review-dbt-patterns.test.ts
  • Local validation passed: bun test --timeout 30000 test/altimate/review-dbt-patterns.test.ts and bun run typecheck.

The older centralized test failure comment was for the prior PR contents; the current branch has been force-updated and checks are re-running on commit 6185bfd.

anandgupta42 force-pushed the codex/dbt-review-demo-corpus branch from 6185bfd to 0ac9a2a on June 10, 2026 00:13
anandgupta42 merged commit 954cc32 into main on Jun 10, 2026
15 checks passed