HHH-8370 SQL Server IN-list Performance Improvement #10335

Closed

robwgreenjr

Adds support for more performant IN-list queries on SQL Server 2008 and above.

Current behavior generates queries like:

WHERE ((T.FIELD1=? AND T.FIELD2=?) OR (T.FIELD1=? AND T.FIELD2=?) OR ... (T.FIELD1=? AND T.FIELD2=?))

and a more performant query would be:

WHERE EXISTS (
  SELECT 1 FROM (VALUES (?,?), (?,?), (?,?)) AS V(FIELD1, FIELD2)
  WHERE T.FIELD1 = V.FIELD1 AND T.FIELD2 = V.FIELD2
)
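To make the two shapes easier to compare, here is a minimal Java sketch that builds both WHERE clauses for a given number of tuples. The class and method names (`InListShapes`, `orEmulation`, `existsValues`) are hypothetical and not part of Hibernate; only the `T` alias and the column names are taken from the example above.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not Hibernate code: builds the two WHERE-clause shapes discussed above.
final class InListShapes {

	// ((T.FIELD1=? AND T.FIELD2=?) OR ...) with one parenthesized group per tuple
	static String orEmulation(List<String> columns, int tupleCount) {
		String oneTuple = columns.stream()
				.map( c -> "T." + c + "=?" )
				.collect( Collectors.joining( " AND ", "(", ")" ) );
		return "WHERE (" + String.join( " OR ", Collections.nCopies( tupleCount, oneTuple ) ) + ")";
	}

	// EXISTS over a table value constructor with one row per tuple
	static String existsValues(List<String> columns, int tupleCount) {
		String oneRow = "(" + String.join( ",", Collections.nCopies( columns.size(), "?" ) ) + ")";
		String rows = String.join( ", ", Collections.nCopies( tupleCount, oneRow ) );
		String match = columns.stream()
				.map( c -> "T." + c + " = V." + c )
				.collect( Collectors.joining( " AND " ) );
		return "WHERE EXISTS (SELECT 1 FROM (VALUES " + rows + ") AS V("
				+ String.join( ", ", columns ) + ") WHERE " + match + ")";
	}

	public static void main(String[] args) {
		List<String> columns = List.of( "FIELD1", "FIELD2" );
		System.out.println( orEmulation( columns, 3 ) );
		System.out.println( existsValues( columns, 3 ) );
	}
}
```

Both forms bind the same number of parameters (columns times tuples); only the predicate structure differs.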

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license
and can be relicensed under the terms of the LGPL v2.1 license in the future at the maintainers' discretion.
For more information on licensing, please check here.


https://hibernate.atlassian.net/browse/HHH-8370

Member

@mbellade mbellade left a comment

Thanks @robwgreenjr, left a first round of comments.

Member

@mbellade mbellade left a comment

Thank you, left a couple more comments.

I'm wondering about two things, though. The Jira issue actually refers to a different syntax being used for Oracle; have you checked what that is? Also, have you tested whether performance is actually better than with the existing emulation?

Comment on lines +7651 to +7659
for ( int i = 0; i < listExpressions.size(); i++ ) {
	if ( i > 0 ) {
		appendSql( ", " );
	}
	appendSql( OPEN_PARENTHESIS );
	renderCommaSeparatedSelectExpression( List.of( listExpressions.get( i ) ) );
	appendSql( CLOSE_PARENTHESIS );
}
appendSql( ") as v(" );
Member

Maybe I'm missing something, but why can't we just call renderCommaSeparatedSelectExpression( listExpressions )?

Author

This keeps each composite value in its own row, with one entry per column, so that we generate:
SELECT * FROM (VALUES (?,?), (?,?)) AS V(FIELD1, FIELD2)
instead of
SELECT * FROM (VALUES (?,?,?,?), (?,?,?,?)) AS V(FIELD1, FIELD2)
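The difference between the two renderings can be illustrated with a small Java sketch. It models each tuple as a list of `?` placeholders and is only a stand-in for the translator loop in the diff: `ValuesRowRendering`, `rowPerTuple`, and `flattened` are hypothetical names, and what a single `renderCommaSeparatedSelectExpression( listExpressions )` call would emit in Hibernate is not verified here. The point is only that row boundaries are lost when all tuples are rendered in one pass.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the translator loop; tuples are modeled as lists of placeholders.
final class ValuesRowRendering {

	// Mirrors the loop in the diff: each tuple is wrapped in its own parentheses.
	static String rowPerTuple(List<List<String>> tuples) {
		StringBuilder sql = new StringBuilder();
		for ( int i = 0; i < tuples.size(); i++ ) {
			if ( i > 0 ) {
				sql.append( ", " );
			}
			sql.append( '(' ).append( String.join( ",", tuples.get( i ) ) ).append( ')' );
		}
		return sql.toString();
	}

	// Rendering every placeholder in a single pass merges all tuples into one row.
	static String flattened(List<List<String>> tuples) {
		return tuples.stream()
				.flatMap( List::stream )
				.collect( Collectors.joining( ",", "(", ")" ) );
	}
}
```

With two two-column tuples, `rowPerTuple` yields rows that line up with `V(FIELD1, FIELD2)`, while `flattened` yields a single four-column row that does not match the column list.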

@@ -1263,4 +1263,8 @@ public boolean supportsRowValueConstructorSyntaxInInList() {
		return false;
	}

	@Override
	public boolean supportsValuesListForInListExistsEmulation() {
		return getVersion().isSameOrAfter( 10 );
Member

The minimum supported version is already 11.x (SQL Server 2012), so I don't think this version check is needed.

@@ -7642,6 +7642,41 @@ public void visitInListPredicate(InListPredicate inListPredicate) {
getSqlTuple( listExpression )
.getExpressions().get( 0 );
}
else if ( dialect.supportsValuesListForInListExistsEmulation() ) {
Member

Since this is only useful for SQL Server specifically, I would actually put this in SQLServerSqlAstTranslator by overriding the visitInListPredicate method, without needing to add a new flag to Dialect.

Author

Sounds good, I'll move it. I do know this syntax also works with Postgres, but I don't know whether it helps performance there.
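The design suggested in this thread, overriding the visit method in the database-specific translator instead of adding a capability flag to `Dialect`, could look roughly like the sketch below. `BaseTranslatorSketch` and `SqlServerTranslatorSketch` are hypothetical, heavily simplified stand-ins for `AbstractSqlAstTranslator` and `SQLServerSqlAstTranslator`; the real `visitInListPredicate` takes an `InListPredicate`, not a column list and a tuple count.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the shared translator base class.
class BaseTranslatorSketch {
	protected final StringBuilder sql = new StringBuilder();

	// Default emulation: (c1=? and c2=?) or (c1=? and c2=?) ...
	void visitInListPredicate(List<String> columns, int tupleCount) {
		String tuple = columns.stream()
				.map( c -> c + "=?" )
				.collect( Collectors.joining( " and ", "(", ")" ) );
		sql.append( String.join( " or ", Collections.nCopies( tupleCount, tuple ) ) );
	}

	String getSql() {
		return sql.toString();
	}
}

// Hypothetical SQL Server translator: the special case lives here, so no Dialect flag is needed.
class SqlServerTranslatorSketch extends BaseTranslatorSketch {
	@Override
	void visitInListPredicate(List<String> columns, int tupleCount) {
		String row = "(" + String.join( ",", Collections.nCopies( columns.size(), "?" ) ) + ")";
		String match = columns.stream()
				.map( c -> c + "=v." + c )
				.collect( Collectors.joining( " and " ) );
		sql.append( "exists (select 1 from (values " )
				.append( String.join( ", ", Collections.nCopies( tupleCount, row ) ) )
				.append( ") as v(" ).append( String.join( ",", columns ) )
				.append( ") where " ).append( match ).append( ")" );
	}
}
```

The trade-off is the one raised in the reply: a flag on `Dialect` would let other databases opt in later, while an override keeps the change local to the one database it was written for.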

Author

robwgreenjr commented Jun 19, 2025

> Thank you, left a couple more comments.
>
> I'm wondering about two things, though. The Jira issue actually refers to a different syntax being used for Oracle; have you checked what that is? Also, have you tested whether performance is actually better than with the existing emulation?

No problem, thanks for the comments!

Yeah I did look into the Oracle implementation first, and I believe the ticket is talking about this part of AbstractSqlAstTranslator:

// Some DBs like Oracle support tuples only for the IN subquery predicate
if ( dialect.supportsRowValueConstructorSyntaxInInSubQuery() && dialect.supportsUnionAll() ) {
	...
	appendSql( " in (" );
	String separator = NO_SEPARATOR;
	for ( Expression expression : listExpressions ) {
		appendSql( separator );
		renderExpressionsAsSubquery(
				getSqlTuple( expression ).getExpressions()
		);
		separator = " union all ";
	}
	appendSql( CLOSE_PARENTHESIS );
}

and this syntax doesn't work in SQL Server.
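For reference, the shape of SQL that the quoted loop appears to produce can be sketched in plain Java as follows. This is an approximation under stated assumptions: `UnionAllInSubquery` is a hypothetical class, and the `select ... from dual` form of each per-tuple subquery is a guess at what `renderExpressionsAsSubquery` emits on Oracle, not something confirmed in this thread.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the "tuple IN (subquery union all subquery ...)" shape.
final class UnionAllInSubquery {

	static String render(List<String> columns, int tupleCount) {
		// Assumption: each tuple becomes "select ?,? from dual"
		String select = "select "
				+ String.join( ",", Collections.nCopies( columns.size(), "?" ) )
				+ " from dual";
		StringBuilder sql = new StringBuilder();
		sql.append( '(' ).append( String.join( ",", columns ) ).append( ") in (" );
		// Same separator trick as the quoted translator code
		String separator = "";
		for ( int i = 0; i < tupleCount; i++ ) {
			sql.append( separator ).append( select );
			separator = " union all ";
		}
		sql.append( ')' );
		return sql.toString();
	}
}
```

The left-hand side is a row value compared against a subquery, which is the construct reported above as not working in SQL Server.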

There was some performance testing done in HSEARCH-1367 but it's a bit old. I was considering adding some performance test results anyway, so I'll add some updated performance numbers along with trying different SQL Server versions.

@robwgreenjr
Author

Closing this, as it doesn't appear to be needed with the supported SQL Server versions.

Running a performance test with the problematic query against ~900k rows does not reproduce the poor performance.

@robwgreenjr robwgreenjr closed this Jul 4, 2025
Member

beikov commented Jul 7, 2025

Thanks for confirming. I'm also going to close the Jira issue.


3 participants