Query graph for Join reordering #22050

Open
JanKaul wants to merge 7 commits into apache:main from Embucket:query_graph




JanKaul commented on May 6, 2026

Which issue does this PR close?

Rationale for this change

Join-ordering algorithms (DPhyp, DPccp, …) operate on a graph view of the join region rather than on a LogicalPlan tree. DataFusion has no such structure today, so any future reordering rule would have to derive one itself. This PR adds the data structure and the LogicalPlan ⇄ JoinGraph conversion boundary so the follow-up enumeration work in epic #18249 has something concrete to build on.
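As a rough illustration of why a graph view helps, the sketch below (not code from this PR) numbers the relations of a join region, records each equi-join predicate as an undirected edge, and represents sets of relations as `u64` bitmasks. Enumerators in the DPccp family only pair disjoint relation sets that some edge connects, which is how they avoid cross products. `ToyGraph`, `neighbourhood`, and `can_join` are hypothetical names chosen for this example.

```rust
// Hypothetical toy model; not part of DataFusion or this PR.

/// Relations are numbered 0..n (n <= 64); each equi-join predicate is an
/// undirected edge between two relations. Relation sets are u64 bitmasks.
struct ToyGraph {
    edges: Vec<(usize, usize)>,
}

impl ToyGraph {
    /// Relations adjacent to some member of `set`, excluding `set` itself.
    fn neighbourhood(&self, set: u64) -> u64 {
        let mut n = 0u64;
        for &(a, b) in &self.edges {
            if set & (1u64 << a) != 0 {
                n |= 1u64 << b;
            }
            if set & (1u64 << b) != 0 {
                n |= 1u64 << a;
            }
        }
        n & !set
    }

    /// True if `s1` and `s2` are disjoint and at least one predicate
    /// connects them, i.e. joining them is not a cross product.
    fn can_join(&self, s1: u64, s2: u64) -> bool {
        s1 & s2 == 0 && self.neighbourhood(s1) & s2 != 0
    }
}

fn main() {
    // Chain query R0 -- R1 -- R2 (e.g. a.x = b.x AND b.y = c.y).
    let g = ToyGraph { edges: vec![(0, 1), (1, 2)] };
    assert!(g.can_join(0b001, 0b010)); // {R0} with {R1}
    assert!(!g.can_join(0b001, 0b100)); // {R0} with {R2} would be a cross product
    assert!(g.can_join(0b011, 0b100)); // {R0, R1} with {R2}
}
```

Answering "is there an edge between these two sets?" is a single bitmask test on this representation, whereas on a LogicalPlan tree it would require walking join conditions each time.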

What changes are included in this PR?

New datafusion/optimizer/src/reorder_join/join_graph.rs:

  • JoinGraph, Node, Edge with NodeId / EdgeId handles backed by an internal VecMap (stable indices, no reuse on removal).
  • JoinGraph::try_from_logical_plan(plan) -> Result<(JoinGraph, Vec<LogicalPlan>)>:
    • strips wrapper operators above the topmost join and returns them so the caller can reapply them after reordering;
    • decomposes inner joins into nodes (leaf relations) and edges (equi-join predicates);
    • hoists non-equi predicates — both Join.filter and Filter nodes sitting between inner joins — into a side-channel filters list;
    • treats non-inner joins and other operators nested between joins (Aggregate, Projection, …) as opaque leaves.
  • reconstruct_plan(join_plan, wrappers) re-applies the stripped wrappers after reordering.
  • Mutation and traversal API for the future enumerator: add_node, add_node_with_edge, remove_node, and remove_edge,
    plus Node::neighbours and Node::connections.
  • Module exported from datafusion/optimizer/src/lib.rs.
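Storage backed by a VecMap with stable, never-reused handles could look roughly like the sketch below. It is a hypothetical reconstruction, not the PR's code: the field layouts, the string-typed predicate, and the `add_edge` helper are assumptions, and `remove_node`, `remove_edge`, and `neighbours` only borrow the PR's method names (here `neighbours` lives on the graph rather than on `Node`), not their real signatures.

```rust
// Hypothetical sketch of a VecMap-backed join graph; not the PR's actual API.
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct NodeId(usize);
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct EdgeId(usize);

/// Vec-backed map with stable keys: removal leaves a `None` hole and
/// insertion always appends, so an index is never handed out twice.
struct VecMap<T>(Vec<Option<T>>);

impl<T> VecMap<T> {
    fn new() -> Self { VecMap(Vec::new()) }
    fn insert(&mut self, value: T) -> usize { self.0.push(Some(value)); self.0.len() - 1 }
    fn get(&self, i: usize) -> Option<&T> { self.0.get(i).and_then(|s| s.as_ref()) }
    fn get_mut(&mut self, i: usize) -> Option<&mut T> { self.0.get_mut(i).and_then(|s| s.as_mut()) }
    fn remove(&mut self, i: usize) -> Option<T> { self.0.get_mut(i).and_then(|s| s.take()) }
}

#[allow(dead_code)] // `name` is not read in this sketch
struct Node { name: String, connections: Vec<EdgeId> } // incident edges
#[allow(dead_code)] // `predicate` is not read in this sketch
struct Edge { nodes: (NodeId, NodeId), predicate: String } // equi-join predicate

struct JoinGraph { nodes: VecMap<Node>, edges: VecMap<Edge> }

impl JoinGraph {
    fn new() -> Self { JoinGraph { nodes: VecMap::new(), edges: VecMap::new() } }

    fn add_node(&mut self, name: &str) -> NodeId {
        NodeId(self.nodes.insert(Node { name: name.to_string(), connections: vec![] }))
    }

    /// Hypothetical helper: connect two existing nodes with a predicate.
    fn add_edge(&mut self, a: NodeId, b: NodeId, predicate: &str) -> EdgeId {
        let id = EdgeId(self.edges.insert(Edge { nodes: (a, b), predicate: predicate.to_string() }));
        for n in [a, b] {
            self.nodes.get_mut(n.0).expect("unknown node").connections.push(id);
        }
        id
    }

    fn remove_edge(&mut self, id: EdgeId) {
        if let Some(edge) = self.edges.remove(id.0) {
            for n in [edge.nodes.0, edge.nodes.1] {
                if let Some(node) = self.nodes.get_mut(n.0) {
                    node.connections.retain(|e| *e != id);
                }
            }
        }
    }

    fn remove_node(&mut self, id: NodeId) {
        // Drop incident edges first so no dangling EdgeIds remain.
        let incident = self.nodes.get(id.0).map(|n| n.connections.clone()).unwrap_or_default();
        for e in incident {
            self.remove_edge(e);
        }
        self.nodes.remove(id.0);
    }

    fn neighbours(&self, id: NodeId) -> HashSet<NodeId> {
        let mut out = HashSet::new();
        if let Some(node) = self.nodes.get(id.0) {
            for e in &node.connections {
                let edge = self.edges.get(e.0).expect("dangling edge");
                out.insert(if edge.nodes.0 == id { edge.nodes.1 } else { edge.nodes.0 });
            }
        }
        out
    }
}

fn main() {
    let mut g = JoinGraph::new();
    let (a, b, c) = (g.add_node("a"), g.add_node("b"), g.add_node("c"));
    g.add_edge(a, b, "a.x = b.x");
    g.add_edge(b, c, "b.y = c.y");
    assert_eq!(g.neighbours(b).len(), 2);
    g.remove_node(a);
    assert_eq!(g.neighbours(b), HashSet::from([c]));
    assert_eq!(g.add_node("d"), NodeId(3)); // index 0 is not reused
}
```

Leaving a hole on removal instead of compacting the vector keeps every outstanding NodeId/EdgeId valid while an enumerator mutates the graph; the trade-off is that storage grows with the number of insertions rather than with the number of live entries.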

No optimizer rule is registered; nothing consumes JoinGraph outside tests.

Are these changes tested?

Yes — unit tests in join_graph.rs:

  • three-way inner join with a non-equi Join.filter (predicate lands in side-channel);
  • Filter between two inner joins (hoisted; both joins still decompose);
  • Aggregate between two inner joins (opaque leaf);
  • LEFT join nested inside an inner chain (opaque leaf);
  • top-level non-inner join (single opaque leaf).
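The scenarios covered by these tests follow from a small set of decomposition rules. The sketch below models them on a toy `Plan` enum rather than DataFusion's real `LogicalPlan`. `Plan`, `Decomposed`, `decompose`, and `scan` are hypothetical names, predicates are plain strings, and it assumes wrapper stripping has already happened, so decomposition starts at the topmost join. It also assumes a `Filter` is hoisted only when its input is an inner join; any other `Filter` ends up inside an opaque leaf.

```rust
// Hypothetical toy model of the decomposition rules; not DataFusion's LogicalPlan.

/// Minimal plan tree; predicates and join keys are plain strings.
enum Plan {
    Scan(String),
    InnerJoin {
        left: Box<Plan>,
        right: Box<Plan>,
        on: Vec<(String, String)>, // equi-join key pairs
        filter: Option<String>,    // non-equi join condition
    },
    Filter { predicate: String, input: Box<Plan> },
    /// Anything not decomposed further: non-inner joins, Aggregate, ...
    Opaque(String),
}

#[derive(Default)]
struct Decomposed {
    leaves: Vec<String>,               // become graph nodes
    equi_edges: Vec<(String, String)>, // become graph edges
    filters: Vec<String>,              // hoisted into the side channel
}

/// Walks a join region rooted at the topmost join (wrappers already stripped).
fn decompose(plan: &Plan, out: &mut Decomposed) {
    match plan {
        Plan::InnerJoin { left, right, on, filter } => {
            decompose(left, out);
            decompose(right, out);
            out.equi_edges.extend(on.iter().cloned());
            out.filters.extend(filter.iter().cloned());
        }
        // A Filter whose input is an inner join is hoisted; decomposition continues below it.
        Plan::Filter { predicate, input } if matches!(**input, Plan::InnerJoin { .. }) => {
            out.filters.push(predicate.clone());
            decompose(input, out);
        }
        Plan::Scan(name) | Plan::Opaque(name) => out.leaves.push(name.clone()),
        // Any other Filter (e.g. directly over a scan) stays inside an opaque leaf.
        Plan::Filter { predicate, .. } => out.leaves.push(format!("filter({predicate})")),
    }
}

fn scan(name: &str) -> Box<Plan> {
    Box::new(Plan::Scan(name.to_string()))
}

fn main() {
    // Filter(a.y > b.y) over (a JOIN b ON a.x = b.x), then JOIN c ON b.z = c.z with a non-equi filter.
    let plan = Plan::InnerJoin {
        left: Box::new(Plan::Filter {
            predicate: "a.y > b.y".into(),
            input: Box::new(Plan::InnerJoin {
                left: scan("a"),
                right: scan("b"),
                on: vec![("a.x".into(), "b.x".into())],
                filter: None,
            }),
        }),
        right: scan("c"),
        on: vec![("b.z".into(), "c.z".into())],
        filter: Some("a.w < c.w".into()),
    };
    let mut out = Decomposed::default();
    decompose(&plan, &mut out);
    assert_eq!(out.leaves, ["a", "b", "c"]);
    assert_eq!(out.equi_edges.len(), 2);
    assert_eq!(out.filters, ["a.y > b.y", "a.w < c.w"]);

    // A top-level non-inner join is a single opaque leaf.
    let mut single = Decomposed::default();
    decompose(&Plan::Opaque("left_join(a, b)".into()), &mut single);
    assert_eq!(single.leaves, ["left_join(a, b)"]);
}
```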

No sqllogictest changes — no planner-visible behavior yet.

Are there any user-facing changes?

No. JoinGraph is a new internal data structure in datafusion-optimizer; no existing API changes and no rule consumes it yet.

github-actions added the optimizer (Optimizer rules) label on May 6, 2026.