When OR Just Won't Do: Using UNION for Higher SQL Performance
Copyright © 2024 PawSQL
Problem Definition
If two query conditions are connected by OR
, the database optimizer may not be able to use indexes to execute the query. For example, for the following SQL statement:
select * from customer where c_phone like '139%' or c_name = 'Ray'
Even if both fields have indexes, they can be rewritten as a UNION
query to leverage the indexes and improve query performance:
select * from customer where c_phone like '139%'
union
select * from customer where c_name = 'Ray'
However, this transformation does not always improve performance. It requires certain conditions and cost-based estimation.
If the database supports
INDEX MERGING
, you can also enable theINDEX MERGING
optimization strategy by adjusting database parameters to improve database performance.
Applicable Conditions
- The OR-connected conditions must be indexable.
- The estimated cost of the rewritten
UNION
statement must be lower than the original SQL. - If the
OR
branch conditions are mutually exclusive, rewrite them asUNION ALL
instead ofUNION
.
Case Analysis
Case 1. Conditions cannot utilize indexes, no rewrite optimization
select * from customer
where c_phone = '1' or c_phone like '%139%'
Analysis: One condition c_phone like '%139%'
cannot use indexes. Even after rewriting, a full table scan is still required. PawSQL does not rewrite.
Case 2. Filter condition selectivity is low enough, no rewrite optimization
select * from customer
where custkey = 1 and (c_phone = '1' or c_phone like '%139%')
Analysis: custkey
is the primary key. custkey = 1
uniquely locates one record, so no need to rewrite.
Case 3. Conditions met, use rewrite optimization
select distinct * from customer
where c_phone like '139%' or c_name = 'Ray'
Analysis: Both conditions can use indexes and the selectivity is less than 10%. Can be rewritten. distinct
can be removed after rewriting by union
.
select * from customer where c_phone = '1'
union
select * from customer where c_phone like '139%'
Case 4. OR
condition branches are mutually exclusive, rewrite to UNION ALL
select * from customer where custkey = 1 or (custkey = 2 and c_phone like '139%')
Analysis: The two condition branches custkey = 2 and c_phone like '139%'
and custkey = 1
are mutually exclusive. Rewrite to UNION ALL
:
select * from customer where c_custkey = 2 and c_phone like '139%'
union all
select * from customer where c_custkey = 1
Case 5. Contains order by
and limit
, use rewrite optimization
select * from orders o
where O_ORDERDATE>='2021-01-01' and (O_ORDERPRIORITY = 1 or O_SHIPPRIORITY = 1)
order by O_ORDERDATE desc limit 10
Analysis: Although the selectivity of O_ORDERPRIORITY = 1
and O_SHIPPRIORITY = 1
is high, the total cost is lower by leveraging indexes to avoid sorting. PawSQL rewrites the optimization.
select *
from (
(select /*QB_2*/ *
from orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_SHIPPRIORITY = 1
order by o.O_ORDERDATE desc limit 10)
union
(select /*QB_1*/ *
from orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_ORDERPRIORITY = '1'
order by o.O_ORDERDATE desc limit 10
)
) as PawDT_1702555889039
order by PawDT_1702555889039.O_ORDERDATE desc limit 10
Case 6. Contains grouping and aggregates, use rewrite optimization
select O_ORDERDATE, count(1) from orders o
where O_ORDERDATE>='2021-01-01' and (O_ORDERPRIORITY = 1 or O_SHIPPRIORITY = 1)
group by O_ORDERDATE
Analysis: Although the selectivity of O_ORDERPRIORITY = 1
and O_SHIPPRIORITY = 1
is high, the total cost is lower by leveraging indexes to avoid sorting. PawSQL rewrites the optimization.
select PawDT_1702884016144.O_ORDERDATE, count(1)
from (
select /*QB_2*/ o.O_ORDERDATE, o.O_ORDERKEY
from tpch_pkfk.orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_SHIPPRIORITY = 1
union
select /*QB_1*/ o.O_ORDERDATE, o.O_ORDERKEY
from tpch_pkfk.orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_ORDERPRIORITY = '1'
) as PawDT_1702884016144
group by PawDT_1702884016144.O_ORDERDATE
Performance Validation
Case 5. 900x performance improvement
- Before optimization (execution time 432.322ms)
- After optimization (execution time 0.189ms)
Case 6. 15x performance improvement
- Before optimization (2.816ms)
- After optimization (0.204 ms)
🌐 About PawSQL
PawSQL is dedicated to automatic and intelligent database performance optimization. The products provided by PawSQL include:
- PawSQL Cloud, an online automated SQL optimization tool that supports SQL auditing, intelligent query rewriting, cost-based index recommendations, suitable for database administrators and data application developers.
- PawSQL Advisor, an IntelliJ plugin that is suitable for data application developers and can be installed via the IDEA/DataGrip marketplace by searching for "PawSQL Advisor" by name.
Contact Us
Email: service@pawsql.com
Website: https://www.pawsql.com