Home United States USA — software Cool SQL Optimizations That Do Not Depend on the Cost Model (Part...

Cool SQL Optimizations That Do Not Depend on the Cost Model (Part 2)

October 1, 2017

265

Learn about predicate merging, provably empty sets, CHECK constraints, unneeded self JOINs, and predicate pushdown as ways to optimize SQL.
Be sure to check out Part 1 here first!
This one is interesting and has bitten me in the past when I erroneously assumed that a given database, I could do it.
Consider the following query:
Obviously, the two predicates overlap and can be merged. I would expect the database to transform the above into:
Looks obvious, right? It is a more sophisticated case of transitive closure. Which database can do it?
Yes!
Again, unfortunately, MySQL doesn’t display the predicate information very nicely. We get the same plan for both queries:
2x the same cardinalities, 2x Using where with no indication what exactly is being done inside of where, but given the cardinality, we can assume that the transformation happened correctly. We can look at it differently; let’s try this query:
Which should be transformed into this one:
And indeed, it happens:
Observe how TYPE=range changed to TYPE=const.
So, we can conclude that yes, MySQL implements this optimization.
Yes!
The predicate that is being applied only includes the values 2 and 3, so the transformation has worked out correctly.
Regrettably, no — this is not optimized!
Both predicates are still present in the execution plan, and the cardinality estimate is wrong; it should be 2, not 1. If I manually transform the query, I’m getting this plan instead:
In particular, we can see the wrong plan if the two predicates do not overlap, in case of which, an impossible predicate is formed:
Still, this yields a “wrong” plan:
Bummer!
Yes, this works:
This one is really cool. We’ve seen Impossible predicates and unneeded table accesses before. What if we do this again, but this time with a JOIN? Can JOIN elimination kick in, too?
We’re trying the following queries.
The predicate in the WHERE clause cannot be TRUE because we have a NOT NULL constraint on the FILM_ID column.
The derived table FA cannot return any rows because of that NOT NULL constraint on the FA. FILM_ID column, so it is provably empty. Because an INNER JOIN with an empty table cannot produce any rows, either, this should save us from accessing the ACTOR table, so the above query should be rewritten to something like this:
The predicate is never evaluated and the JOIN is eliminated.
In principle, this is the same as the previous example, but using a bit more sophisticated syntax:
Because of the NOT NULL constraints on both FA. ACTOR_ID and FA. FILM_ID, an INTERSECT operation with a (NULL, NULL) tuple should not yield any results, and thus the derived table is provably empty, and thus the INNER JOIN can be eliminated.
Funky, but why not?
Finally, let’s repeat the above type of query, but this time with a SEMI JOIN instead of an INNER JOIN. First with an impossible predicate…
…then again with an intersection.
Let’s go. Which database can do which optimization?
Joining a provably empty set ( IS NULL predicate):
Joining a provably empty set ( INTERSECT):
Semi joining a provably empty set ( IS NULL predicate):
Semi-joining a provably empty set ( INTERSECT):
Wow, cool! Looks like a winner!
Joining a provably empty set ( IS NULL predicate):
Cool! I didn’t expect this!
MySQL doesn’t support INTERSECT, regrettably.
Semi-joining a provably empty set ( IS NULL predicate):
That’s still a great result for MySQL!
Joining a provably empty set ( IS NULL predicate):
Again, a very confusing execution plan in Oracle, but the NULL IS NOT NULL filter is there, and it happens before all the other operations, which are not executed.
Joining a provably empty set ( INTERSECT):
Interesting. This plan will indeed access the entire FILM_ACTOR primary key. It can save accesses to the ACTOR table and primary key index because it does the derived table first (which yields no rows), but still those Ids 5 and 6 should not be there. Bummer!
Semi joining a provably empty set ( IS NULL predicate):
This works again:
…with the same confusing plan that keeps around the unexecuted subtree.
Semi-joining a provably empty set ( INTERSECT):
Again, no optimization here:
Not so good!
Disappointingly, PostgreSQL doesn’t fare well in this experiment!
Joining a provably empty set ( IS NULL predicate):
Nope:
Joining a provably empty set ( INTERSECT):
Even worse:
Semi joining a provably empty set ( IS NULL predicate) — same as inner-join:
Semi-joining a provably empty set ( INTERSECT) — unsurprisingly:
SQL Server shines, like DB2.
Joining a provably empty set ( IS NULL predicate):
Joining a provably empty set ( INTERSECT):
Semi-joining a provably empty set ( IS NULL predicate):
Semi-joining a provably empty set ( INTERSECT):
On a side note, this could be done in thousands of other ways. Feel free to comment with your own ideas on how to create “provably empty sets” to see if this is optimised by any of the databases.
Oh, this is cool! Our Sakila database has a CHECK constraint on the FILM. RATING table:
But there’s also an interesting optimization aspect here! Check out these queries.
We’ve seen impossible predicates before, even with NOT NULL constraints (which are special types of CHECK constraints, in fact), but this is even more powerful:
There can be no such film because the CHECK constraint prevents its insertion (or update). This should again be transformed into a NOOP. Now, what about this?
With the above index, we should probably simply run a quick index scan to count all the films of rating = NC-17, because that’s the only remaining rating. So the query should be rewritten to this:
It should be, regardless of the index, because comparing the column with a single value is faster than comparing it with four values.
So, which database can do these things?
Impossible predicate (rating = N/A) — cool!
Inverse predicate (rating = NC-17) — nope…
While the index is used on ID=3 and while the cardinalities are correct, it is scanned entirely, as we do not have a range predicate but a SARG predicate. For more details, see Markus Winand’s overview here.
We can also show this by manually inverting the predicate to get:
Now, we’re getting the desired range predicate.
MySQL supports the CHECK constraint syntax but doesn’t enforce it for whatever reason. Try this:
You’ll get:
Zero points for MySQL (really, why not just support CHECK constraints?)
Impossible predicate (rating = N/A):
Again, the super confusing NULL IS NOT NULL filter that cuts off the FULL TABLE SCAN, which might as well be removed entirely from the plan. But at least it works!
Inverse predicate (rating = ‘NC-17’)… oops:
The predicate could not be inversed; we get a much off cardinality estimate; we get an INDEX FAST FULL SCAN instead of an INDEX RANGE SCAN and a filter predicate rather than an access predicate. Here’s what we should have gotten, i.e. when manually inverting the predicate:
Bummer!
Note that the Sakila database in its PostgreSQL version uses an ENUM type instead of a CHECK constraint on the RATING column. I’ve duplicated the table to use a CHECK constraint instead.
Impossible predicate (rating = N/A) doesn’t work:
Inverse predicate (rating = ‘NC-17’)… also nope:
Too bad!
Impossible predicate (rating = N/A) — yes!
Inverse predicate (rating = NC-17) — also yes!
When your queries get more complex, it might well happen that you’re going to self JOIN a table based on its primary key. Trust me, this is common practice when you build complex views and JOIN them to each other, so a database noticing this is a crucial step in optimising complex SQL. I won’t show a complex example, but a simple one, for example:
This could be considered a special case of JOIN elimination as we don’t really need the JOIN of A2, we can do everything with A1 only. Now, INNER JOIN elimination normally works in the presence of a FOREIGN KEY only, which we don’t have here. But because of the PRIMARY KEY on ACTOR_ID, we can prove that in fact A1 = A2. In a way, this is transitive closure all over again.
We can take this one step further and use columns from both A1 and A2:
In the classic JOIN elimination case, we could no longer eliminate the JOIN because we’re projecting from both tables.