Support incremental_strategy parameter and new insert_overwrite strategy#2195
Support incremental_strategy parameter and new insert_overwrite strategy#2195SuchodolskiEdvin wants to merge 1 commit into
Conversation
SuchodolskiEdvin
commented
Jun 2, 2026
- updated proto with new parameters
- added new tests
- added validation for chosen incremental_strategies
- added new insert_overwrite strategy logic
9f72a8c to
9724071
Compare
9724071 to
41d152d
Compare
| const backtickedColumns = columns.map(column => `\`${column}\``); | ||
| const resolveTargetTable = this.resolveTarget(target); | ||
|
|
||
| return `CREATE OR REPLACE TEMP TABLE \`${stagingTableUnqualified}\` AS ( |
There was a problem hiding this comment.
can you split separate SQL statements into granular Task.statement calls?
There was a problem hiding this comment.
I believe we can't split this SQL because we are using TEMP table. We could in case we would use persistent table.
There was a problem hiding this comment.
I think what Nick means here is to split the sql creation into statements not the sql itself.
There was a problem hiding this comment.
@kolina can you clarify what exactly do you mean?
| MERGE ${resolveTargetTable} T | ||
| USING \`${stagingTableUnqualified}\` S | ||
| ON FALSE | ||
| WHEN NOT MATCHED BY SOURCE AND ${partitionBy} IN UNNEST(partitions_for_replacement) ${updatePartitionFilter ? `and T.${updatePartitionFilter}` : ""} THEN |
There was a problem hiding this comment.
Such code T.${updatePartitionFilter} will only work when updatePartitionFilter has exactly one expression?
There was a problem hiding this comment.
Yes, you are correct. It is a known limitation that T.${updatePartitionFilter} only works for simple expressions (and fails on multi-expression SQL). Current implementation is designed to match the existing behavior of the standard MERGE strategy to maintain consistency between the two strategies for now. The fix of using updatePartitionFilter with several expression will be introduced in a separate PR.
| return this.mergeInto( | ||
| table.target, | ||
| columns, | ||
| incrementalQuery, |
There was a problem hiding this comment.
Can you deduplicate this with a call below?
| switch (this.proto.incrementalStrategy) { | ||
| case dataform.IncrementalStrategy.INSERT_OVERWRITE: | ||
| if (!this.proto.bigquery || !this.proto.bigquery.partitionBy) { | ||
| this.session.compileError( | ||
| new Error("incrementalStrategy 'insert_overwrite' requires 'partitionBy' to be set."), | ||
| config.filename, | ||
| this.proto.target | ||
| ); | ||
| } | ||
| break; | ||
| case dataform.IncrementalStrategy.MERGE: | ||
| if (!this.proto.uniqueKey || this.proto.uniqueKey.length === 0) { | ||
| this.session.compileError( | ||
| new Error("incrementalStrategy 'merge' requires 'uniqueKey' to be set."), | ||
| config.filename, | ||
| this.proto.target | ||
| ); | ||
| } | ||
| break; | ||
| default: | ||
| break; | ||
| } |
There was a problem hiding this comment.
let's move it into sub-function for readability
| expect(result.compile.compiledGraph.tables[0].incrementalStrategy).equals( | ||
| dataform.IncrementalStrategy.INSERT_OVERWRITE | ||
| ); |
There was a problem hiding this comment.
let's check the whole object here (same below)
| string reservation = 27; | ||
|
|
||
| // Optional. The incremental strategy to use when updating the table. | ||
| // Defaults to MERGE if uniqueKey is configured, or APPEND otherwise. |
There was a problem hiding this comment.
I wouldn't include it into config protos (because it's run-time strategy and not related to compiled DAG generation)
f11f8f3 to
1942a72
Compare
|
|
||
| const result = runMainInVm(coreExecutionRequestFromPath(projectDir)); | ||
|
|
||
| expect(result.compile.compiledGraph.graphErrors.compilationErrors.length).greaterThan(0); |
There was a problem hiding this comment.
I think this should be checked to be exactly equal to 1 to avoid any other uncaught errors the code change might cause.
|
|
||
| const result = runMainInVm(coreExecutionRequestFromPath(projectDir)); | ||
|
|
||
| expect(result.compile.compiledGraph.graphErrors.compilationErrors.length).greaterThan(0); |
| case dataform.IncrementalStrategy.INSERT_OVERWRITE: | ||
| if (!this.proto.bigquery || !this.proto.bigquery.partitionBy) { | ||
| this.session.compileError( | ||
| new Error("incrementalStrategy 'insert_overwrite' requires 'partitionBy' to be set."), |
There was a problem hiding this comment.
Nit: Inconsistent style. in some places we have capital first letter and in others we don't. We also use "." in some places and miss it in others. for example on line 759.
| target: dataform.ITarget, | ||
| columns: string[], | ||
| query: string, | ||
| partitionBy: string, |
There was a problem hiding this comment.
What happens if the table.bigquery or table.bigquery.partitionBy is not provided? will it be an empty string or 'null'?
There was a problem hiding this comment.
If table.bigquery or table.bigquery.partitionBy is not provided or it is empty string - insertOverwrite() won't be reached. There is validation of partitionBy parameter.
There was a problem hiding this comment.
the validation link doesnt lead anywhere so I cannot check that.
There was a problem hiding this comment.
It's strange, because it's working for me. This is link to the code change in scope of this PR. Please check core/actions/incremental_table.ts:checkIncrementalStrategyRequirements() (LINE 778)
| ARRAY( | ||
| SELECT DISTINCT ${partitionBy} | ||
| FROM \`${stagingTableUnqualified}\` | ||
| WHERE ${partitionBy} IS NOT NULL |
There was a problem hiding this comment.
What will this statement look like if we have empty partitionBy? also will it have `` around the column name somehow? Do we have an example for final sql that will be generated.
There was a problem hiding this comment.
If partitionBy would contain column wrapped in function (like Date(ts)) adding backticks cause BigQuery to treat the entire expression as a literal column name.
Yes, we have golden files covering the generated SQL for insertOverwrite in cli/api/goldens/insert_overwrite*.sql
There was a problem hiding this comment.
this is question I have. So if PartitionBy is not a function like Date and is just a column name lets say id. This will not add backticks to it. Will the query run in that case.
There was a problem hiding this comment.
Yes, the query will run successfully, assuming the column isn't a reserved BigQuery keyword (like order). If it is a keyword, the user can still provide the column name wrapped in backticks directly in their config. Just FYI: this matches the existing behavior of the MERGE strategy, which also injects uniqueKey columns without automatically adding backticks.
| const backtickedColumns = columns.map(column => `\`${column}\``); | ||
| const resolveTargetTable = this.resolveTarget(target); | ||
|
|
||
| return `CREATE OR REPLACE TEMP TABLE \`${stagingTableUnqualified}\` AS ( |
There was a problem hiding this comment.
I think what Nick means here is to split the sql creation into statements not the sql itself.
| INSERT (`id`,`field1`) VALUES (`id`,`field1`); | ||
| END; | ||
|
|
||
| DROP TABLE IF EXISTS `staging_table_temp_test_uuid` |
There was a problem hiding this comment.
If I read the execution_sql.ts file correctly, there should be a ";" at the end no? same in insert_overwrite_extend.sql.
There was a problem hiding this comment.
Good catch. Fixed.
There was a problem hiding this comment.
Reverted this change. concatenateQueries() helper function (in cli/api/dbadapters/tasks.ts) explicitly trims trailing whitespace and strips the final semicolon from every individual statement before joining them together with. Because of this stripping logic, the final sql that Dataform generates don't have a trailing semicolon at the very end.
| ...baseTable, | ||
| incrementalStrategy: dataform.IncrementalStrategy.INSERT_OVERWRITE, | ||
| bigquery: { | ||
| partitionBy: "DATE(ts)", |
There was a problem hiding this comment.
Do we have an example column that is not a function like Date. My concern is that we a re biased toward handling this method. so it its just a normal column name we might not handle it correctly.
There was a problem hiding this comment.
I added additional test for normal column.
7352b3d to
a90b434
Compare
- updated proto with new parameters - added new tests - added validation for chosen incremental_strategies - added new insert_overwrite strategy logic
a90b434 to
8f1bcb7
Compare