Skip to main content

When OR Just Won't Do: Using UNION for Higher SQL Performance

Copyright © 2023 PawSQL

Problem Definition

If two query conditions are connected by OR, the database optimizer may not be able to use indexes to execute the query. For example, for the following SQL statement:

select * from customer where c_phone like '139%' or c_name = 'Ray' 

Even if both fields have indexes, they can be rewritten as a UNION query to leverage the indexes and improve query performance:

select * from customer where c_phone like '139%'
union
select * from customer where c_name = 'Ray'

However, this transformation does not always improve performance. It requires certain conditions and cost-based estimation.

If the database supports INDEX MERGING, you can also enable the INDEX MERGING optimization strategy by adjusting database parameters to improve database performance.

Applicable Conditions

  1. The OR-connected conditions must be indexable.
  2. The estimated cost of the rewritten UNION statement must be lower than the original SQL.
  3. If the OR branch conditions are mutually exclusive, rewrite them as UNION ALL instead of UNION.

Case Analysis

Case 1. Conditions cannot utilize indexes, no rewrite optimization

select * from customer 
where c_phone = '1' or c_phone like '%139%'

Analysis: One condition c_phone like '%139%' cannot use indexes. Even after rewriting, a full table scan is still required. PawSQL does not rewrite.

Case 2. Filter condition selectivity is low enough, no rewrite optimization

select * from customer 
where custkey = 1 and (c_phone = '1' or c_phone like '%139%')

Analysis: custkey is the primary key. custkey = 1 uniquely locates one record, so no need to rewrite.

Case 3. Conditions met, use rewrite optimization

select distinct * from customer 
where c_phone like '139%' or c_name = 'Ray'

Analysis: Both conditions can use indexes and the selectivity is less than 10%. Can be rewritten. distinct can be removed after rewriting by union.

select * from customer where c_phone = '1'  
union
select * from customer where c_phone like '139%'

Case 4. OR condition branches are mutually exclusive, rewrite to UNION ALL

select * from customer where custkey = 1 or (custkey = 2 and c_phone like '139%')

Analysis: The two condition branches custkey = 2 and c_phone like '139%' and custkey = 1 are mutually exclusive. Rewrite to UNION ALL:

select * from customer where c_custkey = 2 and c_phone like '139%' 
union all
select * from customer where c_custkey = 1

Case 5. Contains order by and limit, use rewrite optimization

select * from orders o  
where O_ORDERDATE>='2021-01-01' and (O_ORDERPRIORITY = 1 or O_SHIPPRIORITY = 1)
order by O_ORDERDATE desc limit 10

Analysis: Although the selectivity of O_ORDERPRIORITY = 1 and O_SHIPPRIORITY = 1 is high, the total cost is lower by leveraging indexes to avoid sorting. PawSQL rewrites the optimization.

select *
from (
(select /*QB_2*/ *
from orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_SHIPPRIORITY = 1
order by o.O_ORDERDATE desc limit 10)
union
(select /*QB_1*/ *
from orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_ORDERPRIORITY = '1'
order by o.O_ORDERDATE desc limit 10
)
) as PawDT_1702555889039
order by PawDT_1702555889039.O_ORDERDATE desc limit 10

Case 6. Contains grouping and aggregates, use rewrite optimization

select O_ORDERDATE, count(1) from orders o 
where O_ORDERDATE>='2021-01-01' and (O_ORDERPRIORITY = 1 or O_SHIPPRIORITY = 1)
group by O_ORDERDATE

Analysis: Although the selectivity of O_ORDERPRIORITY = 1 and O_SHIPPRIORITY = 1 is high, the total cost is lower by leveraging indexes to avoid sorting. PawSQL rewrites the optimization.

select PawDT_1702884016144.O_ORDERDATE, count(1) 
from (
select /*QB_2*/ o.O_ORDERDATE, o.O_ORDERKEY
from tpch_pkfk.orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_SHIPPRIORITY = 1
union
select /*QB_1*/ o.O_ORDERDATE, o.O_ORDERKEY
from tpch_pkfk.orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_ORDERPRIORITY = '1'
) as PawDT_1702884016144
group by PawDT_1702884016144.O_ORDERDATE

Performance Validation

Case 5. 900x performance improvement

  • Before optimization (execution time 432.322ms)

image-20231218174425717

  • After optimization (execution time 0.189ms)

image-20231218174523241

Case 6. 15x performance improvement

  • Before optimization (2.816ms)

image-20231218174315435

  • After optimization (0.204 ms)

image-20231218174242606

About PawSQL

PawSQL focuses on automatic and intelligent SQL optimization for databases. It supports MySQL, PostgreSQL, openGauss, Oracle and more. PawSQL products include:

  • PawSQL Cloud, an online automated SQL optimization tool that provides SQL review, intelligent query rewriting, cost-based index recommendations for DBAs and developers.
  • PawSQL Advisor, an IntelliJ plugin for developers, can be installed from IDEA/DataGrip marketplace by searching "PawSQL Advisor".
  • PawSQL Engine, the backend optimization engine of PawSQL products, can be deployed independently and provides SQL optimization services via HTTP/JSON interfaces. PawSQL Engine is provided as a Docker image.

Contact Us