⚡ Bolt: [performance improvement] Optimize SQL Query generation by removing string allocations#197
Conversation
* Replaced multiple `Vec<String>` allocations (`columns`, `placeholders`, `update_clauses`, `where_clauses`) with pre-allocated `String::with_capacity` buffers in `build_upsert_stmt` and `build_delete_stmt` in `crates/flow/src/targets/d1.rs`.
* Used the `std::fmt::Write` macro (`write!`) and `.push_str()` instead of `.join(", ")` strings, completely removing multiple intermediate string and vector memory allocations per row operation.
* Replaced `sql.push_str("?")` with the char equivalent `sql.push('?')`.
* Adhered to repository specific explicit lifetime clippy lints within `crates/rule-engine`.
Co-authored-by: bashandbone <[email protected]>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideOptimizes SQL generation in D1ExportContext by replacing intermediate Vec-based query assembly with preallocated String buffers and incremental writes, while also fixing a clippy lifetime warning and minor formatting issues in other crates. Class diagram for optimized SQL builders in D1ExportContextclassDiagram
class D1ExportContext {
+table_name: String
+key_fields_schema: Vec<KeyFieldSchema>
+value_fields_schema: Vec<ValueFieldSchema>
+build_upsert_stmt(key: KeyValue, values: FieldValues) ResultSqlParams
+build_delete_stmt(key: KeyValue) ResultSqlParams
}
class KeyFieldSchema {
+name: String
}
class ValueFieldSchema {
+name: String
}
class KeyValue {
+0: BoxKeyPartSlice
}
class FieldValues {
+fields: VecField
}
class ResultSqlParams {
+sql: String
+params: VecSerdeJsonValue
+error: RecocoError
}
class RecocoError
class SerdeJsonValue
D1ExportContext "1" --> "*" KeyFieldSchema : uses
D1ExportContext "1" --> "*" ValueFieldSchema : uses
D1ExportContext --> KeyValue : param
D1ExportContext --> FieldValues : param
ResultSqlParams --> RecocoError
ResultSqlParams --> SerdeJsonValue
Flow diagram for optimized build_upsert_stmt SQL generationflowchart TD
Start["Start build_upsert_stmt"] --> InitSql["Initialize sql with String_with_capacity and params as empty Vec"]
InitSql --> WriteInsert["write INSERT INTO table_name ( into sql"]
WriteInsert --> InitFirstCol["Set first_col to true"]
InitFirstCol --> LoopKeys["For each index in key_fields_schema"]
LoopKeys --> CheckKeyPart{KeyPart exists at index?}
CheckKeyPart -- "no" --> NextKey["Next key field"]
CheckKeyPart -- "yes" --> AppendKeyName["If not first_col add ', ' then append key field name to sql"]
AppendKeyName --> PushKeyParam["Convert key_part_to_json and push into params"]
PushKeyParam --> SetFirstColFalse["Set first_col to false"]
SetFirstColFalse --> NextKey
NextKey --> LoopKeys
LoopKeys --> LoopValues["For each index and value in values.fields"]
LoopValues --> CheckValueField{value_field exists at index?}
CheckValueField -- "no" --> NextValue["Next value field"]
CheckValueField -- "yes" --> AppendValueName["If not first_col add ', ' then append value_field name to sql"]
AppendValueName --> PushValueParam["Convert value_to_json and push into params"]
PushValueParam --> SetFirstColFalse2["Set first_col to false"]
SetFirstColFalse2 --> NextValue
NextValue --> LoopValues
LoopValues --> CloseColumns["Append ) VALUES ( to sql"]
CloseColumns --> LoopParams["For i from 0 to params.len"]
LoopParams --> ParamFirst{Is i == 0?}
ParamFirst -- "yes" --> FirstPlaceholder["Push '?' to sql"]
ParamFirst -- "no" --> LaterPlaceholder["Append ', ?' to sql"]
FirstPlaceholder --> NextParam["Next i"]
LaterPlaceholder --> NextParam
NextParam --> LoopParams
LoopParams --> AppendConflict["Append ) ON CONFLICT DO UPDATE SET to sql"]
AppendConflict --> InitFirstUpdate["Set first_update to true"]
InitFirstUpdate --> LoopUpdate["For each index in values.fields"]
LoopUpdate --> CheckUpdateField{value_field exists at index?}
CheckUpdateField -- "no" --> NextUpdate["Next update field"]
CheckUpdateField -- "yes" --> AppendUpdate["If not first_update add ', ' then write name = excluded.name into sql"]
AppendUpdate --> SetFirstUpdateFalse["Set first_update to false"]
SetFirstUpdateFalse --> NextUpdate
NextUpdate --> LoopUpdate
LoopUpdate --> ReturnResult["Return (sql, params)"]
ReturnResult --> End["End build_upsert_stmt"]
Flow diagram for optimized build_delete_stmt SQL generationflowchart TD
Start["Start build_delete_stmt"] --> InitSql["Initialize sql with String_with_capacity and params as empty Vec"]
InitSql --> WriteDelete["write DELETE FROM table_name WHERE into sql"]
WriteDelete --> InitFirst["Set first to true"]
InitFirst --> LoopKeys["For each index in key_fields_schema"]
LoopKeys --> CheckKeyPart{KeyPart exists at index?}
CheckKeyPart -- "no" --> NextKey["Next key field"]
CheckKeyPart -- "yes" --> AppendWhere["If not first append ' AND ' then append field name and ' = ?' to sql"]
AppendWhere --> PushParam["Convert key_part_to_json and push into params"]
PushParam --> SetFirstFalse["Set first to false"]
SetFirstFalse --> NextKey
NextKey --> LoopKeys
LoopKeys --> ReturnResult["Return (sql, params)"]
ReturnResult --> End["End build_delete_stmt"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
String::with_capacitycalls in the D1 SQL builders use hard-coded constants (e.g.,128,20,40); consider deriving these from more explicit per-field size assumptions or centralizing them as named constants to make the intent and future adjustments clearer. - The manual SQL construction in
build_upsert_stmtandbuild_delete_stmtis more complex than the previousformat!/joinapproach; adding small helper functions (e.g., to append column lists or placeholders) or brief comments around the loop logic would improve readability and reduce the chance of subtle formatting mistakes in future edits.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `String::with_capacity` calls in the D1 SQL builders use hard-coded constants (e.g., `128`, `20`, `40`); consider deriving these from more explicit per-field size assumptions or centralizing them as named constants to make the intent and future adjustments clearer.
- The manual SQL construction in `build_upsert_stmt` and `build_delete_stmt` is more complex than the previous `format!/join` approach; adding small helper functions (e.g., to append column lists or placeholders) or brief comments around the loop logic would improve readability and reduce the chance of subtle formatting mistakes in future edits.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
This PR optimizes SQL statement generation for the Cloudflare D1 export target by replacing intermediate Vec<String> + join() patterns with direct writes into pre-sized String buffers, reducing per-statement heap allocations in hot mutation paths. It also includes small formatting/lint cleanups in the rule-engine and ast-engine crates.
Changes:
- Reworked
D1ExportContext::{build_upsert_stmt, build_delete_stmt}to build SQL viaString::with_capacity,push_str, andstd::fmt::Write. - Removed an unnecessary explicit lifetime usage in
rule-engineand applied minor formatting cleanups. - Minor formatting refactors in tree-sitter editing logic and related tests.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/flow/src/targets/d1.rs | Replaces vector-based SQL assembly with single-buffer SQL construction to reduce allocations. |
| crates/rule-engine/src/rule/referent_rule.rs | Minor refactor/formatting of RwLock read/clone chain (lint cleanup). |
| crates/rule-engine/src/rule/mod.rs | Reformats defined_vars collection pipeline for readability. |
| crates/rule-engine/src/check_var.rs | Removes explicit lifetimes where elision suffices (clippy cleanup). |
| crates/ast-engine/src/tree_sitter/mod.rs | Small formatting-only refactors in UTF-8 handling and a test assertion. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sql.push_str(") ON CONFLICT DO UPDATE SET "); | ||
|
|
||
| let mut first_update = true; | ||
| for (idx, _value) in values.fields.iter().enumerate() { | ||
| if let Some(value_field) = self.value_fields_schema.get(idx) { |
| write!(&mut sql, "DELETE FROM {} WHERE ", self.table_name) | ||
| .map_err(|e| RecocoError::internal_msg(format!("Failed to format SQL: {}", e)))?; | ||
|
|
||
| let mut first = true; | ||
| for (idx, _key_field) in self.key_fields_schema.iter().enumerate() { |
💡 What:
Replaced multiple intermediate
Vec<String>allocations (columns,placeholders,update_clauses,where_clauses) with pre-allocatedString::with_capacitybuffers when building the SQL queries inbuild_upsert_stmtandbuild_delete_stmtin the D1 export context. Usesstd::fmt::Writeand direct.push_str()logic rather than string arrays aggregated by.join(", "). Fixed an unrelated clippy explicit lifetime warning inrule-engine.🎯 Why:
The original approach resulted in allocating, pushing to, joining, and deallocating multiple vectors and intermediary strings inside critical database mutation preparation functions. Avoiding these redundant memory allocations improves overall efficiency inside tight mutation loops during data batching workloads, alleviating heap contention.
📊 Impact:
Significantly reduces heap allocations and intermediate string copying, cutting out O(n) memory allocations completely per record mutation batch generated in the thread-flow D1 logic, saving multiple nanoseconds to microseconds of garbage accumulation per SQL generation operation.
🔬 Measurement:
Verified via
cargo check -p thread-flowand ensuring all relevant test suites (cargo test -p thread-flow --test d1_target_testsandd1_minimal_tests) pass successfully, preserving identical SQL payload formats while avoiding large memory spikes.PR created automatically by Jules for task 17653822471632735717 started by @bashandbone
Summary by Sourcery
Optimize SQL generation for D1 exports by eliminating intermediate string allocations and clean up minor clippy and formatting issues across AST and rule-engine modules.
Enhancements: