feat(query): add rule_grouping_sets_to_union #18413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged: sundy-li merged 92 commits into databendlabs:main from sundy-li:rule_grouping_sets_to_union on Jul 28, 2025
e880453 add MaterializedCTE plan (SkyFan2002)
466e163 build pipeline (SkyFan2002)
db16044 build pipeline (SkyFan2002)
26bb0ab add operator (SkyFan2002)
7e986a9 remove m cte temp table (SkyFan2002)
e4ee842 bind (SkyFan2002)
307809c Merge remote-tracking branch 'upstream/main' into cte_plan (SkyFan2002)
e5d7472 fix (SkyFan2002)
9a53ba2 remove unused field (SkyFan2002)
ff73950 fix bind (SkyFan2002)
5197134 fix schema (SkyFan2002)
ba5be42 fix (SkyFan2002)
7690dbf make lint (SkyFan2002)
1f22235 Merge branch 'main' into cte_plan (SkyFan2002)
f8f4d7a fix (SkyFan2002)
4c75ded fix join (SkyFan2002)
5bb786c Merge branch 'main' into cte_plan (SkyFan2002)
5a4c0ca Merge branch 'main' into cte_plan (SkyFan2002)
cc89312 fix (SkyFan2002)
291204a refine explain (SkyFan2002)
4835fb8 fix (SkyFan2002)
046bd04 fix (SkyFan2002)
4979e0b fix (SkyFan2002)
af0eeb7 fix (SkyFan2002)
67c2bc3 fix (SkyFan2002)
9a8eb3b fix (SkyFan2002)
30aa9f3 fix (SkyFan2002)
e39af8e fix (SkyFan2002)
7b5b406 Merge branch 'main' into cte_plan (SkyFan2002)
a39a569 CleanupUnusedCTE (SkyFan2002)
3686686 fix (SkyFan2002)
fb7bfbd fix (SkyFan2002)
e600f31 fix (SkyFan2002)
5e0752a fix (SkyFan2002)
4606b12 refine (SkyFan2002)
70df77d refine (SkyFan2002)
e38c70f make lint (SkyFan2002)
6829408 fix (SkyFan2002)
bc4efde add log (SkyFan2002)
b5ccca9 fix (SkyFan2002)
2f897f4 fix (SkyFan2002)
cf7e7f9 make lint (SkyFan2002)
2a5f371 fix (SkyFan2002)
8b0efe5 fix (SkyFan2002)
38067f5 fix (SkyFan2002)
3758bbd fix (SkyFan2002)
488a08d fix (SkyFan2002)
f24bb7f disable distributed optimization (SkyFan2002)
6d70dd6 Merge remote-tracking branch 'upstream/main' into cte_plan (SkyFan2002)
00b299a fix merge (SkyFan2002)
4d24f11 Merge branch 'main' into cte_plan (SkyFan2002)
ad405bb fix explain join (SkyFan2002)
a5853a0 fix logic test (SkyFan2002)
0d8ee4e fix logic test (SkyFan2002)
321a70a add ref count (SkyFan2002)
a5949c1 refactor: streaming CTE consumption (SkyFan2002)
9c146f8 refactor plan (SkyFan2002)
993659a fix (SkyFan2002)
db1c089 fix (SkyFan2002)
3e90fbd Merge branch 'main' into cte_plan (SkyFan2002)
082eccf enable distributed (SkyFan2002)
2ad7a25 fix logic test (SkyFan2002)
b2d42af fix serial cte (SkyFan2002)
5e20f1c fix test (SkyFan2002)
c612c56 fix fragment type (SkyFan2002)
5fc45ba fix replace range join (SkyFan2002)
7b38411 fix explain join order (SkyFan2002)
e385732 fix logic test (SkyFan2002)
4fcfba5 feat(query): add rule_grouping_sets_to_union (sundy-li)
c7b9d95 feat(query): add rule_grouping_sets_to_union (sundy-li)
dedc246 Merge remote-tracking branch 'fb/cte_plan' into rule_grouping_sets_to… (sundy-li)
1f8df6c simplify (SkyFan2002)
a1178bf ref_count calculation is not required when constructing MaterializedCTE (SkyFan2002)
934a2d4 Merge fb (sundy-li)
9a91ebb simplify MaterializedCTE (SkyFan2002)
b79eb41 simplify CTEConsumer (SkyFan2002)
1740ff0 Merge remote-tracking branch 'fb/cte_plan' into rule_grouping_sets_to… (sundy-li)
e38ca13 Merge (sundy-li)
576f98a Update src/query/service/src/pipelines/builders/builder_sequence.rs (SkyFan2002)
df268f7 Update src/query/service/src/pipelines/builders/builder_materialized_… (SkyFan2002)
a053834 make lint (SkyFan2002)
0a01f28 rename CTEConsumer to MaterializeCTERef (SkyFan2002)
4513b15 Update src/query/sql/src/planner/optimizer/optimizers/operator/cte/cl… (SkyFan2002)
5f951f0 Merge (sundy-li)
2c6399a Merge (sundy-li)
8ff3944 add channel size config (SkyFan2002)
acc747d Merge remote-tracking branch 'fb/cte_plan' into rule_grouping_sets_to… (sundy-li)
c2eacfa Merge (sundy-li)
a71c994 Merge (sundy-li)
e9986a1 Merge branch 'main' into rule_grouping_sets_to_union (sundy-li)
da58283 update (sundy-li)
16bb077 update (sundy-li)
258 changes: 258 additions & 0 deletions
src/query/sql/src/planner/optimizer/optimizers/rule/agg_rules/rule_grouping_sets_to_union.rs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,258 @@ | ||
// Copyright 2021 Datafuse Labs | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
use std::hash::DefaultHasher; | ||
use std::hash::Hash; | ||
use std::hash::Hasher; | ||
use std::sync::Arc; | ||
|
||
use databend_common_exception::Result; | ||
use databend_common_expression::types::NumberScalar; | ||
use databend_common_expression::Scalar; | ||
|
||
use crate::optimizer::ir::Matcher; | ||
use crate::optimizer::ir::RelExpr; | ||
use crate::optimizer::ir::SExpr; | ||
use crate::optimizer::optimizers::rule::Rule; | ||
use crate::optimizer::optimizers::rule::RuleID; | ||
use crate::optimizer::optimizers::rule::TransformResult; | ||
use crate::plans::walk_expr_mut; | ||
use crate::plans::Aggregate; | ||
use crate::plans::AggregateMode; | ||
use crate::plans::CastExpr; | ||
use crate::plans::ConstantExpr; | ||
use crate::plans::EvalScalar; | ||
use crate::plans::MaterializeCTERef; | ||
use crate::plans::MaterializedCTE; | ||
use crate::plans::RelOp; | ||
use crate::plans::Sequence; | ||
use crate::plans::UnionAll; | ||
use crate::plans::VisitorMut; | ||
use crate::IndexType; | ||
use crate::ScalarExpr; | ||
|
||
// TODO | ||
const ID: RuleID = RuleID::GroupingSetsToUnion; | ||
// Rewrites `GROUPING SETS` into a `UNION ALL` of plain `GROUP BY` aggregates, | ||
// one branch per grouping set. The aggregate input is materialized once as a CTE | ||
// and every branch reads from it. | ||
// E.g.: | ||
// select number % 10 AS a, number % 3 AS b, number % 4 AS c | ||
// from numbers(100000000) | ||
// group by grouping sets((a,b),(a,c)); | ||
|
||
// is rewritten into (keys absent from a branch's grouping set become NULL): | ||
|
||
// select number % 10 AS a, number % 3 AS b, NULL AS c | ||
// from numbers(100000000) | ||
// group by a,b | ||
// union all | ||
// select number % 10 AS a, NULL AS b, number % 4 AS c | ||
// from numbers(100000000) | ||
// group by a,c | ||
// | ||
pub struct RuleGroupingSetsToUnion { | ||
id: RuleID, | ||
matchers: Vec<Matcher>, | ||
} | ||
|
||
impl RuleGroupingSetsToUnion { | ||
pub fn new() -> Self { | ||
Self { | ||
id: ID, | ||
// EvalScalar | ||
// \ | ||
// Aggregate | ||
// \ | ||
// * | ||
matchers: vec![Matcher::MatchOp { | ||
op_type: RelOp::EvalScalar, | ||
children: vec![Matcher::MatchOp { | ||
op_type: RelOp::Aggregate, | ||
children: vec![Matcher::Leaf], | ||
}], | ||
}], | ||
} | ||
} | ||
} | ||
|
||
// Must go before `RuleSplitAggregate` | ||
impl Rule for RuleGroupingSetsToUnion { | ||
fn id(&self) -> RuleID { | ||
self.id | ||
} | ||
|
||
fn apply(&self, s_expr: &SExpr, state: &mut TransformResult) -> Result<()> { | ||
let eval_scalar: EvalScalar = s_expr.plan().clone().try_into()?; | ||
let agg: Aggregate = s_expr.child(0)?.plan().clone().try_into()?; | ||
if agg.mode != AggregateMode::Initial { | ||
return Ok(()); | ||
} | ||
|
||
let agg_input = s_expr.child(0)?.child(0)?; | ||
let agg_input_columns: Vec<IndexType> = RelExpr::with_s_expr(agg_input) | ||
.derive_relational_prop()? | ||
.output_columns | ||
.iter() | ||
.cloned() | ||
.collect(); | ||
|
||
if let Some(grouping_sets) = &agg.grouping_sets { | ||
if !grouping_sets.sets.is_empty() { | ||
let mut children = Vec::with_capacity(grouping_sets.sets.len()); | ||
|
||
let mut hasher = DefaultHasher::new(); | ||
agg.grouping_sets.hash(&mut hasher); | ||
let hash = hasher.finish(); | ||
let temp_cte_name = format!("cte_groupingsets_{hash}"); | ||
|
||
let cte_materialized_sexpr = SExpr::create_unary( | ||
MaterializedCTE::new(temp_cte_name.clone(), None, Some(1)), | ||
agg_input.clone(), | ||
); | ||
|
||
let cte_consumer = SExpr::create_leaf(MaterializeCTERef { | ||
cte_name: temp_cte_name, | ||
output_columns: agg_input_columns.clone(), | ||
def: agg_input.clone(), | ||
}); | ||
|
||
let mask = (1 << grouping_sets.dup_group_items.len()) - 1; | ||
let group_bys = agg | ||
.group_items | ||
.iter() | ||
.map(|i| { | ||
agg_input_columns | ||
.iter() | ||
.position(|t| *t == i.index) | ||
.unwrap() | ||
}) | ||
.collect::<Vec<_>>(); | ||
|
||
for set in &grouping_sets.sets { | ||
let mut id = 0; | ||
|
||
// Each key in `group_bys` owns one bit of the grouping id (bit i for `group_bys[i]`): | ||
// 0 if the key is in the current grouping set, 1 if it is not (it will be NULL in that branch). | ||
// Example: GROUP BY GROUPING SETS ((a, b), (a), (b), ()) | ||
// group_bys: [a, b] | ||
// grouping_sets: [[0, 1], [0], [1], []] | ||
// grouping_ids (bit 0 written first): 00, 01, 10, 11 | ||
|
||
for g in set { | ||
let i = group_bys.iter().position(|t| *t == *g).unwrap(); | ||
id |= 1 << i; | ||
} | ||
let grouping_id = !id & mask; | ||
|
||
let mut eval_scalar = eval_scalar.clone(); | ||
let mut agg = agg.clone(); | ||
agg.grouping_sets = None; | ||
|
||
let null_group_ids: Vec<IndexType> = agg | ||
.group_items | ||
.iter() | ||
.map(|i| i.index) | ||
.filter(|index| !set.contains(index)) | ||
.collect(); | ||
|
||
agg.group_items.retain(|x| set.contains(&x.index)); | ||
let group_ids: Vec<IndexType> = | ||
agg.group_items.iter().map(|i| i.index).collect(); | ||
|
||
let mut visitor = ReplaceColumnForGroupingSetsVisitor { | ||
group_indexes: group_ids, | ||
exclude_group_indexes: null_group_ids, | ||
grouping_id_index: grouping_sets.grouping_id_index, | ||
grouping_id_value: grouping_id, | ||
}; | ||
|
||
for scalar in eval_scalar.items.iter_mut() { | ||
visitor.visit(&mut scalar.scalar)?; | ||
} | ||
|
||
let agg_plan = SExpr::create_unary(agg, cte_consumer.clone()); | ||
let eval_plan = SExpr::create_unary(eval_scalar, agg_plan); | ||
children.push(eval_plan); | ||
} | ||
|
||
// Fold the per-set plans into a left-deep `UnionAll` tree. | ||
let mut result = children.first().unwrap().clone(); | ||
for other in children.into_iter().skip(1) { | ||
let left_outputs: Vec<(IndexType, Option<ScalarExpr>)> = | ||
eval_scalar.items.iter().map(|x| (x.index, None)).collect(); | ||
let right_outputs = left_outputs.clone(); | ||
|
||
let union_plan = UnionAll { | ||
left_outputs, | ||
right_outputs, | ||
cte_scan_names: vec![], | ||
output_indexes: eval_scalar.items.iter().map(|x| x.index).collect(), | ||
}; | ||
result = SExpr::create_binary(Arc::new(union_plan.into()), result, other); | ||
} | ||
result = SExpr::create_binary(Sequence, cte_materialized_sexpr, result); | ||
state.add_result(result); | ||
return Ok(()); | ||
} | ||
} | ||
Ok(()) | ||
} | ||
|
||
fn matchers(&self) -> &[Matcher] { | ||
&self.matchers | ||
} | ||
} | ||
|
||
impl Default for RuleGroupingSetsToUnion { | ||
fn default() -> Self { | ||
Self::new() | ||
} | ||
} | ||
|
||
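// Rewrites column references in the `EvalScalar` above one grouping-set branch: | ||
// kept group keys are try-cast to nullable, dropped group keys become typed NULLs, | ||
// and the grouping-id column becomes this branch's constant id. | ||
struct ReplaceColumnForGroupingSetsVisitor { | ||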
group_indexes: Vec<IndexType>, | ||
exclude_group_indexes: Vec<IndexType>, | ||
grouping_id_index: IndexType, | ||
grouping_id_value: u32, | ||
} | ||
|
||
impl VisitorMut<'_> for ReplaceColumnForGroupingSetsVisitor { | ||
fn visit(&mut self, expr: &mut ScalarExpr) -> Result<()> { | ||
let old = expr.clone(); | ||
|
||
if let ScalarExpr::BoundColumnRef(col) = expr { | ||
if self.group_indexes.contains(&col.column.index) { | ||
*expr = ScalarExpr::CastExpr(CastExpr { | ||
argument: Box::new(old), | ||
is_try: true, | ||
target_type: Box::new(col.column.data_type.wrap_nullable()), | ||
span: col.span, | ||
}); | ||
} else if self.exclude_group_indexes.contains(&col.column.index) { | ||
*expr = ScalarExpr::TypedConstantExpr( | ||
ConstantExpr { | ||
value: Scalar::Null, | ||
span: col.span, | ||
}, | ||
col.column.data_type.wrap_nullable(), | ||
); | ||
} else if self.grouping_id_index == col.column.index { | ||
*expr = ScalarExpr::ConstantExpr(ConstantExpr { | ||
value: Scalar::Number(NumberScalar::UInt32(self.grouping_id_value)), | ||
span: col.span, | ||
}); | ||
} | ||
return Ok(()); | ||
} | ||
walk_expr_mut(self, expr) | ||
} | ||
} |
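For reviewers who want to check the rewrite's semantics independently of the optimizer, here is a minimal standalone sketch of what the rule above relies on: a `GROUPING SETS` aggregate gives the same rows as the `UNION ALL` of one plain `GROUP BY` per set, with keys outside the set emitted as NULL. Everything in it (`group_by_set`, `grouping_sets_as_union`, the three-column rows, `count(*)` as the only aggregate) is a hypothetical stand-in and not part of Databend.

```rust
use std::collections::BTreeMap;

// A group key with one slot per column; None models SQL NULL.
type Key = Vec<Option<u64>>;

/// One plain GROUP BY over the columns listed in `set`, computing count(*).
/// Columns outside `set` are emitted as NULL (None), as in the rewritten plan.
fn group_by_set(rows: &[[u64; 3]], set: &[usize]) -> BTreeMap<Key, u64> {
    let mut counts = BTreeMap::new();
    for row in rows {
        let key: Key = (0..3)
            .map(|col| if set.contains(&col) { Some(row[col]) } else { None })
            .collect();
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}

/// GROUPING SETS evaluated as the UNION ALL of one GROUP BY per set.
fn grouping_sets_as_union(rows: &[[u64; 3]], sets: &[Vec<usize>]) -> Vec<(Key, u64)> {
    sets.iter()
        .flat_map(|set| group_by_set(rows, set).into_iter())
        .collect()
}

fn main() {
    // a = n % 2, b = n % 3, c = n % 4: a small stand-in for numbers(...).
    let rows: Vec<[u64; 3]> = (0..12u64).map(|n| [n % 2, n % 3, n % 4]).collect();
    // grouping sets ((a, b), (a, c))
    let out = grouping_sets_as_union(&rows, &[vec![0, 1], vec![0, 2]]);
    for (key, count) in &out {
        println!("{:?} -> {}", key, count);
    }
}
```

The sketch scans `rows` once per set. The rule avoids recomputing the aggregate input by wrapping it in a `MaterializedCTE` that every branch reads through a `MaterializeCTERef`.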
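The grouping-id arithmetic inside `apply` is easy to misread, so here is a small standalone sketch of the same bitmask logic. The function name `grouping_id` and the use of plain `usize` keys are assumptions for illustration; the rule itself works on `IndexType` values and takes the mask width from `dup_group_items`.

```rust
/// Compute the grouping id for one grouping set.
/// `group_bys` lists every GROUP BY key; `set` lists the keys of the current set.
/// Bit i of the result is 1 when `group_bys[i]` is absent from `set`
/// (that key is NULL in this branch) and 0 when it is present.
/// Assumes fewer than 32 keys, since the id is a u32.
fn grouping_id(group_bys: &[usize], set: &[usize]) -> u32 {
    let mask: u32 = (1u32 << group_bys.len()) - 1;
    let mut present: u32 = 0;
    for g in set {
        let i = group_bys
            .iter()
            .position(|t| t == g)
            .expect("every key in a grouping set must be a GROUP BY key");
        present |= 1 << i;
    }
    // Invert: present keys become 0, absent keys become 1, limited to the mask width.
    !present & mask
}

fn main() {
    // GROUP BY GROUPING SETS ((a, b), (a), (b), ()) with a = 0, b = 1.
    let group_bys = [0usize, 1];
    let sets: Vec<Vec<usize>> = vec![vec![0, 1], vec![0], vec![1], vec![]];
    for set in &sets {
        println!("{:?} -> {}", set, grouping_id(&group_bys, set));
    }
}
```

As integers, the four sets give 0, 2, 1 and 3. These are the `00, 01, 10, 11` patterns from the code comment when bit 0 is written first.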
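The temporary CTE name in `apply` is derived by hashing the grouping sets with `DefaultHasher`. A minimal sketch of that naming scheme follows, assuming the sets are plain `Vec<usize>` lists; the helper `temp_cte_name` is hypothetical, and the rule hashes the whole `agg.grouping_sets` value instead.

```rust
use std::hash::{DefaultHasher, Hash, Hasher};

/// Build a CTE name from a hash of the grouping sets.
/// Within one build of the program, equal inputs give equal names.
fn temp_cte_name(sets: &[Vec<usize>]) -> String {
    let mut hasher = DefaultHasher::new();
    sets.hash(&mut hasher);
    format!("cte_groupingsets_{}", hasher.finish())
}

fn main() {
    let sets = vec![vec![0usize, 1], vec![0, 2]];
    println!("{}", temp_cte_name(&sets));
}
```

One property worth keeping in mind when reviewing: a name built this way depends only on the grouping sets, so in this sketch two aggregates with identical sets over different inputs would get the same name. The standard library also does not promise that `DefaultHasher` output stays the same across Rust releases.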