Commit aadbc49
Version 3.0.0 Proposal draft with sqlglot as the parsing library (#617)
* wip - working version with sqlglot to refactor
* wip - extract bracketed names into new dialect
* wip - Rewrite REPLACE INTO → INSERT INTO in _ast.py._parse() before sqlglot parses it, so sqlglot produces a proper exp.Insert AST instead of exp.Command and parses it correctly without falling back to regex
* add docstrings, refactor to simplify most complex methods, add a few tests from open issues to verify if it's handling the issues better than the old version, remove internal tokens and produce only list of strings if needed, remove compatibility layer to v1
* add tests from open issues that now pass and some small fixes to accommodate 3 additional tests
* accept capitalization and explicit AS from sqlglot as opinionated defaults to simplify the bodies extraction
* simplify logic, refactor into classes with related functionalities
* additional simplification and cleanup
* remove unnecessary wrappers
* further simplification - add also architecture overview with charts and main notes
* next portion of cleanup, renaming files, update also agents.md file, switch to ruff for formatting and linting
* refactor other functionalities from ast parser into separate classes
* change to ruff also in CI. add mypy and fix typing errors, add mypy to CI
* fix remaining mypy errors in untyped code
* further fixes and duplication cleanup
* fix unused code, bump coverage - add todo to revisit corner cases later; for now mark them nocover as unreachable from the parser, which is the only entrypoint we want for the majority of the tests
* add features to handle unnamed queries, extracting properly hive tables from queries with subscripts, some additional issues that were already fixed were documented by tests, some cleanup and refactor to decrease unreachable paths
* fix mypy - add additional test for next already solved issue
* add additional test for next already solved issue
* remove unreachable stars without table node handling - it's either raw star or star with table when prefixed with table name/alias - unreachable code
* raise more meaningful error on invalid queries, raise on cte without name instead of silently skipping, extract mypy and ruff into separate workflows
* reorder methods, refactor complicated conditions into helper methods, add more descriptive docstrings and add sample queries in the majority of the code flow branches to make the code easier to navigate
* handle redshift append clause with custom dialect, clean up table extractor and add more descriptive docstrings
* fix typing to use 3.10-style annotations instead of the deprecated typing ones. add support for nested ctes, cleanup nested resolver and simplify code there. remove unnecessary guards
* additional cleanup in dialect_parser.py and query_type_extractor.py
* cleanup in parser and docstring refactor, cleanup of the remaining todos
* fixes after the code review
* additional changes and optimizations after initial review
* break internal cyclical imports, change test to use parser instead of internal method
* update docs and readme, additional cleanup when possible -> use native sqlglot features whenever possible
* minor cleanup
---------
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
1 parent d3a03ac commit aadbc49
48 files changed
Lines changed: 6382 additions & 2644 deletions
File tree
- .github/workflows
- sql_metadata
- test
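The commit message describes rewriting REPLACE INTO to INSERT INTO before sqlglot parses the query, so that the result is an exp.Insert rather than an exp.Command. The body of _ast.py._parse() is not visible on this page, so the following is only a minimal sketch of that kind of pre-parse rewrite. The helper name `rewrite_replace_into`, the regex, and the returned flag are assumptions made for illustration and are not taken from the commit.

```python
import re

# Hypothetical pattern (not from the commit): a statement that starts with
# REPLACE INTO, in any casing, with optional leading whitespace.
_REPLACE_INTO = re.compile(r"^\s*REPLACE\s+INTO\b", re.IGNORECASE)


def rewrite_replace_into(sql: str) -> tuple[str, bool]:
    """Swap a leading REPLACE INTO for INSERT INTO.

    Returns the (possibly rewritten) SQL and a flag telling the caller
    whether a rewrite happened, so the original statement type can still
    be reported as REPLACE after parsing.
    """
    rewritten, count = _REPLACE_INTO.subn("INSERT INTO", sql, count=1)
    return rewritten, count == 1


if __name__ == "__main__":
    sql, was_replace = rewrite_replace_into("REPLACE INTO users (id) VALUES (1)")
    print(sql, was_replace)
```

The rewritten string would then be handed to the parser (for example via `sqlglot.parse_one`). Because the pattern is anchored at the start of the statement, calls to the `REPLACE()` string function inside a SELECT are left alone. A statement preceded by a comment is not handled by this sketch.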