Support incremental_strategy parameter and new insert_overwrite strategy #2195

Open
SuchodolskiEdvin wants to merge 1 commit into main from insert_overwrite_incremental_strategy

Conversation

@SuchodolskiEdvin

Collaborator
  • updated proto with new parameters
  • added new tests
  • added validation for chosen incremental_strategies
  • added new insert_overwrite strategy logic

@SuchodolskiEdvin SuchodolskiEdvin requested a review from a team as a code owner June 2, 2026 14:39
@SuchodolskiEdvin SuchodolskiEdvin requested review from zaptot and removed request for a team June 2, 2026 14:39
@SuchodolskiEdvin SuchodolskiEdvin force-pushed the insert_overwrite_incremental_strategy branch from 9f72a8c to 9724071 Compare June 2, 2026 15:13
@SuchodolskiEdvin SuchodolskiEdvin requested review from kolina and removed request for zaptot June 3, 2026 09:56
@SuchodolskiEdvin SuchodolskiEdvin force-pushed the insert_overwrite_incremental_strategy branch from 9724071 to 41d152d Compare June 3, 2026 13:46
const backtickedColumns = columns.map(column => `\`${column}\``);
const resolveTargetTable = this.resolveTarget(target);

return `CREATE OR REPLACE TEMP TABLE \`${stagingTableUnqualified}\` AS (

Contributor

Can you split the separate SQL statements into granular Task.statement calls?

Collaborator Author

I believe we can't split this SQL because we are using a TEMP table. We could if we used a persistent table.

Member

I think what Nick means here is to split the SQL creation into statements, not the SQL itself.

@SuchodolskiEdvin SuchodolskiEdvin Jun 22, 2026

Collaborator Author

@kolina can you clarify what exactly you mean?

MERGE ${resolveTargetTable} T
USING \`${stagingTableUnqualified}\` S
ON FALSE
WHEN NOT MATCHED BY SOURCE AND ${partitionBy} IN UNNEST(partitions_for_replacement) ${updatePartitionFilter ? `and T.${updatePartitionFilter}` : ""} THEN

Contributor

Won't code like T.${updatePartitionFilter} only work when updatePartitionFilter contains exactly one expression?

Collaborator Author

Yes, you are correct. It is a known limitation that T.${updatePartitionFilter} only works for simple expressions (and fails on multi-expression SQL). The current implementation is designed to match the existing behavior of the standard MERGE strategy, to keep the two strategies consistent for now. A fix for using updatePartitionFilter with several expressions will be introduced in a separate PR.
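
To make the limitation above concrete, here is a minimal sketch of how the `T.` prefix behaves. The helper `buildPartitionFilterClause` and the sample filter strings are hypothetical; the helper only mirrors the template fragment shown in the diff and is not the actual Dataform code.

```typescript
// Hypothetical helper mirroring the diff fragment:
// ${updatePartitionFilter ? `and T.${updatePartitionFilter}` : ""}
function buildPartitionFilterClause(updatePartitionFilter?: string): string {
  return updatePartitionFilter ? `and T.${updatePartitionFilter}` : "";
}

// A single simple expression: the column is qualified with the target alias.
const single = buildPartitionFilterClause("ts >= '2026-01-01'");
console.log(single); // and T.ts >= '2026-01-01'

// Several expressions: only the first column receives the `T.` prefix,
// so `region` stays unqualified.
const multiple = buildPartitionFilterClause("ts >= '2026-01-01' AND region = 'EU'");
console.log(multiple); // and T.ts >= '2026-01-01' AND region = 'EU'

// A parenthesized expression: the prefix lands in front of the parenthesis,
// which is not valid SQL.
const parenthesized = buildPartitionFilterClause("(ts >= '2026-01-01')");
console.log(parenthesized); // and T.(ts >= '2026-01-01')
```

The prefix is plain string concatenation, so it qualifies whatever token happens to come first in the filter.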

Comment thread cli/api/dbadapters/execution_sql.ts Outdated
Comment on lines +511 to +514
return this.mergeInto(
table.target,
columns,
incrementalQuery,

Contributor

Can you deduplicate this with a call below?

Collaborator Author

Fixed.

Comment thread core/actions/incremental_table.ts Outdated
Comment on lines +222 to +243
switch (this.proto.incrementalStrategy) {
  case dataform.IncrementalStrategy.INSERT_OVERWRITE:
    if (!this.proto.bigquery || !this.proto.bigquery.partitionBy) {
      this.session.compileError(
        new Error("incrementalStrategy 'insert_overwrite' requires 'partitionBy' to be set."),
        config.filename,
        this.proto.target
      );
    }
    break;
  case dataform.IncrementalStrategy.MERGE:
    if (!this.proto.uniqueKey || this.proto.uniqueKey.length === 0) {
      this.session.compileError(
        new Error("incrementalStrategy 'merge' requires 'uniqueKey' to be set."),
        config.filename,
        this.proto.target
      );
    }
    break;
  default:
    break;
}

Contributor

Let's move it into a sub-function for readability.

Collaborator Author

Done.
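
For readers following the thread, the extracted validation could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: the enum and `TableConfigSketch` interface are local stand-ins for the generated proto types, the `UNSPECIFIED` and `APPEND` members are assumed, and the function returns messages instead of calling `this.session.compileError(...)` so that it can run on its own. The name `checkIncrementalStrategyRequirements` comes from the thread; its real signature is not shown there.

```typescript
// Local stand-in for dataform.IncrementalStrategy (member list is an assumption).
enum IncrementalStrategy {
  UNSPECIFIED,
  APPEND,
  MERGE,
  INSERT_OVERWRITE
}

// Hypothetical minimal shape of the table proto fields used by the check.
interface TableConfigSketch {
  incrementalStrategy: IncrementalStrategy;
  uniqueKey?: string[];
  bigquery?: { partitionBy?: string };
}

// Returns the validation messages for the chosen strategy (empty when valid).
function checkIncrementalStrategyRequirements(table: TableConfigSketch): string[] {
  const errors: string[] = [];
  switch (table.incrementalStrategy) {
    case IncrementalStrategy.INSERT_OVERWRITE:
      // Covers a missing bigquery block, a missing partitionBy, and an empty string.
      if (!table.bigquery || !table.bigquery.partitionBy) {
        errors.push("incrementalStrategy 'insert_overwrite' requires 'partitionBy' to be set.");
      }
      break;
    case IncrementalStrategy.MERGE:
      if (!table.uniqueKey || table.uniqueKey.length === 0) {
        errors.push("incrementalStrategy 'merge' requires 'uniqueKey' to be set.");
      }
      break;
    default:
      break;
  }
  return errors;
}

// Usage: one error for a missing partitionBy, none for a valid config.
const missingPartition = checkIncrementalStrategyRequirements({
  incrementalStrategy: IncrementalStrategy.INSERT_OVERWRITE
});
const validConfig = checkIncrementalStrategyRequirements({
  incrementalStrategy: IncrementalStrategy.INSERT_OVERWRITE,
  bigquery: { partitionBy: "DATE(ts)" }
});
console.log(missingPartition.length, validConfig.length);
```

Returning a list keeps the sketch free of session state; the real function reports errors through the compile session instead.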

Comment thread core/actions/incremental_table_test.ts Outdated
Comment on lines +1017 to +1019
expect(result.compile.compiledGraph.tables[0].incrementalStrategy).equals(
dataform.IncrementalStrategy.INSERT_OVERWRITE
);

Contributor

Let's check the whole object here (same below).

Collaborator Author

Fixed.

Comment thread protos/configs.proto Outdated
string reservation = 27;

// Optional. The incremental strategy to use when updating the table.
// Defaults to MERGE if uniqueKey is configured, or APPEND otherwise.

Contributor

I wouldn't include it in the config protos, because it's a run-time strategy and not related to compiled DAG generation.

Collaborator Author

Fixed.

@SuchodolskiEdvin SuchodolskiEdvin force-pushed the insert_overwrite_incremental_strategy branch 4 times, most recently from f11f8f3 to 1942a72 Compare June 8, 2026 16:29

const result = runMainInVm(coreExecutionRequestFromPath(projectDir));

expect(result.compile.compiledGraph.graphErrors.compilationErrors.length).greaterThan(0);

Member

I think this should be checked to be exactly equal to 1, so that we catch any other unexpected errors the code change might cause.

Collaborator Author

Fixed.


const result = runMainInVm(coreExecutionRequestFromPath(projectDir));

expect(result.compile.compiledGraph.graphErrors.compilationErrors.length).greaterThan(0);

Member

Same here.

Collaborator Author

Fixed.

Comment thread core/actions/incremental_table.ts Outdated
case dataform.IncrementalStrategy.INSERT_OVERWRITE:
if (!this.proto.bigquery || !this.proto.bigquery.partitionBy) {
this.session.compileError(
new Error("incrementalStrategy 'insert_overwrite' requires 'partitionBy' to be set."),

Member

Nit: inconsistent style. In some places the message starts with a capital letter and in others it doesn't. We also end with "." in some places and omit it in others, for example on line 759.

Collaborator Author

Fixed.

target: dataform.ITarget,
columns: string[],
query: string,
partitionBy: string,

Member

What happens if table.bigquery or table.bigquery.partitionBy is not provided? Will it be an empty string or 'null'?

Collaborator Author

If table.bigquery or table.bigquery.partitionBy is not provided, or is an empty string, insertOverwrite() won't be reached. The partitionBy parameter is validated beforehand.

Member

The validation link doesn't lead anywhere, so I cannot check that.

@SuchodolskiEdvin SuchodolskiEdvin Jun 22, 2026

Collaborator Author

That's strange, because it works for me. It is a link to a code change within the scope of this PR. Please check core/actions/incremental_table.ts:checkIncrementalStrategyRequirements() (line 778).

ARRAY(
SELECT DISTINCT ${partitionBy}
FROM \`${stagingTableUnqualified}\`
WHERE ${partitionBy} IS NOT NULL

Member

What will this statement look like if partitionBy is empty? Also, will it somehow have backticks around the column name? Do we have an example of the final SQL that will be generated?

@SuchodolskiEdvin SuchodolskiEdvin Jun 22, 2026

Collaborator Author

If partitionBy contains a column wrapped in a function (like Date(ts)), adding backticks would cause BigQuery to treat the entire expression as a literal column name.

Yes, we have golden files covering the generated SQL for insertOverwrite in cli/api/goldens/insert_overwrite*.sql.

Member

This is the question I have: if partitionBy is not a function like Date and is just a column name, let's say id, no backticks will be added to it. Will the query run in that case?

Collaborator Author

Yes, the query will run successfully, assuming the column isn't a reserved BigQuery keyword (like order). If it is a keyword, the user can still provide the column name wrapped in backticks directly in their config. Just FYI: this matches the existing behavior of the MERGE strategy, which also injects uniqueKey columns without automatically adding backticks.
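
As an illustration of the three cases discussed in this thread, the sketch below builds the partition-discovery query with `partitionBy` inserted verbatim. The helper `selectDistinctPartitions` and the table name `staging` are hypothetical; only the three SQL lines are taken from the diff.

```typescript
// Hypothetical helper: inserts partitionBy as-is, without adding backticks,
// and backticks only the staging table name (as in the diff).
function selectDistinctPartitions(partitionBy: string, stagingTable: string): string {
  return [
    `SELECT DISTINCT ${partitionBy}`,
    `FROM \`${stagingTable}\``,
    `WHERE ${partitionBy} IS NOT NULL`
  ].join("\n");
}

// Function expression: must stay unquoted so it is parsed as an expression.
const byExpression = selectDistinctPartitions("DATE(ts)", "staging");

// Plain column name: valid without backticks.
const byColumn = selectDistinctPartitions("id", "staging");

// Reserved keyword: the user supplies the backticks in their own config.
const byReservedWord = selectDistinctPartitions("`order`", "staging");

console.log([byExpression, byColumn, byReservedWord].join("\n\n"));
```

Leaving the quoting decision to the user keeps expressions such as `DATE(ts)` working, at the cost of requiring explicit backticks for reserved words.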

INSERT (`id`,`field1`) VALUES (`id`,`field1`);
END;

DROP TABLE IF EXISTS `staging_table_temp_test_uuid`

Member

If I read the execution_sql.ts file correctly, there should be a ";" at the end, no? Same in insert_overwrite_extend.sql.

Collaborator Author

Good catch. Fixed.

Collaborator Author

Reverted this change. The concatenateQueries() helper function (in cli/api/dbadapters/tasks.ts) explicitly trims trailing whitespace and strips the final semicolon from every individual statement before joining them together. Because of this stripping logic, the final SQL that Dataform generates doesn't have a trailing semicolon at the very end.
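
The effect described above can be sketched as follows. This is not the real concatenateQueries(); `joinStatementsSketch` is a hypothetical stand-in, and the separator is passed in explicitly because the thread does not state which one the real helper uses (the `";\n"` in the usage is only an assumption for the example).

```typescript
// Hypothetical stand-in for the behaviour described in the comment:
// trim trailing whitespace, strip one final semicolon, then join.
function joinStatementsSketch(statements: string[], separator: string): string {
  return statements
    .map(statement => statement.trimEnd().replace(/;$/, ""))
    .join(separator);
}

const script = joinStatementsSketch(
  ["CREATE TEMP TABLE t AS (SELECT 1 AS id);  ", "DROP TABLE IF EXISTS t;"],
  ";\n" // assumed separator, for illustration only
);

console.log(script);
// CREATE TEMP TABLE t AS (SELECT 1 AS id);
// DROP TABLE IF EXISTS t
```

Because every statement loses its own semicolon and separators are only placed between statements, the last statement ends without one, which matches the golden files.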

...baseTable,
incrementalStrategy: dataform.IncrementalStrategy.INSERT_OVERWRITE,
bigquery: {
partitionBy: "DATE(ts)",

Member

Do we have an example with a column that is not wrapped in a function like Date? My concern is that we are biased toward handling the function case, so if it's just a normal column name we might not handle it correctly.

Collaborator Author

I added an additional test for a normal column.

@SuchodolskiEdvin SuchodolskiEdvin force-pushed the insert_overwrite_incremental_strategy branch 2 times, most recently from 7352b3d to a90b434 Compare June 22, 2026 15:37
@SuchodolskiEdvin SuchodolskiEdvin force-pushed the insert_overwrite_incremental_strategy branch from a90b434 to 8f1bcb7 Compare June 22, 2026 16:03
@SuchodolskiEdvin SuchodolskiEdvin requested a review from Tuseeq1 June 22, 2026 16:30
3 participants