Skip to main content

Query Folding

Copyright © 2023 PawSQL

Definition

By enabling Query Folding rule, PawSQL merges view references, derived tables and common table expressions to outer query blocks to:

  • avoid materializing intermediate result-set
  • enable more join order planning
  • enable more opportunities for index recommendation

We will use view in following paragraph, but all the descriptions also apply to derived tables and common table expressions.

Example

  • original query
SELECT * FROM (SELECT c_custkey, c_name FROM customer) AS derived_t1;
  • rewritten query
SELECT c_custkey, c_name FROM customer

Conditions

PawSQL rewrite engine supports two types of query folding strategies,

Type 1

  • In view itself, there is no distinct or aggregation.
  • In outer query block, view being folded is not part of an outer join.

Type 2

  • In outer query block, the view is the only table reference
  • In outer query block, there is no group by clause and no aggregate functions.

Examples

Type 1 Query Folding

  • Original query
with dt as 
(
SELECT o_custkey, O_TOTALPRICE
FROM orders
where o_shippriority=0
)

select c.c_name, sum(O_TOTALPRICE) price
FROM customer c, dt
WHERE c.c_custkey = dt.o_custkey
group by c.c_name
  • Query plan of original query
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Nested loop inner join (cost=3806388.23 rows=3400871)
-> Table scan on dt (cost=0.01..2494.72 rows=199378)
-> Materialize CTE dt (cost=62935.41..65430.12 rows=199378)
-> Table scan on <temporary> (cost=0.01..2494.72 rows=199378)
-> Temporary table with deduplication (cost=40502.89..42997.60 rows=199378)
-> Table scan on orders (cost=20565.07 rows=199378)
-> Index lookup on c using PAW_IDX1614428511 (C_CUSTKEY=dt.o_custkey) (cost=17.06 rows=17)
  • Optimized query
select c.c_name, sum(o_totalprice) as price
FROM customer c, orders
WHERE c.c_custkey = o_custkey and o_shippriority=0
group by c.c_name
  • Query plan of optimized query
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Nested loop inner join (cost=394660.89 rows=340087)
-> Filter: (orders.O_SHIPPRIORITY = 0) (cost=20565.07 rows=19938)
-> Table scan on orders (cost=20565.07 rows=199378)
-> Index lookup on c using PAW_IDX1614428511 (C_CUSTKEY=orders.O_CUSTKEY) (cost=17.06 rows=17)

Optimization Explanation: We can see there is a Materialization step in the query plan of original query, which results from a huge cost. After the rewrite, the materialization step is eliminated and the estimated cost is about 10% of the orignal cost.

Type 2 Query Folding

  • Original query
with dt as
(select c.c_name, sum(O_TOTALPRICE) price
FROM customer c, orders
WHERE c.c_custkey = orders.o_custkey
group by c.c_name)
select dt.price
from dt
where dt.c_name like '139%';

  • Query plan of original query
-> Table scan on dt  (cost=2.50..2.50 rows=0)
-> Materialize CTE dt (cost=2.50..2.50 rows=0)
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Nested loop inner join (cost=3761523.18 rows=377837)
-> Table scan on orders (cost=20565.07 rows=199378)
-> Filter: (c.C_NAME like '139%') (cost=17.06 rows=2)
-> Index lookup on c using PAW_IDX1614428511 (C_CUSTKEY=orders.O_CUSTKEY) (cost=17.06 rows=17)
  • Optimized query
select sum(orders.O_TOTALPRICE)
from customer as c, orders
where c.c_custkey = orders.o_custkey
group by c.c_name
having c.c_name like '139%'
  • Query plan of optimized query
-> Filter: (c.C_NAME like '139%')
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Nested loop inner join (cost=3761523.18 rows=3400871)
-> Table scan on orders (cost=20565.07 rows=199378)
-> Index lookup on c using PAW_IDX1614428511 (C_CUSTKEY=orders.O_CUSTKEY) (cost=17.06 rows=17)

Optimization Explanation: Like type 1, we can see the materialization step is eliminated after the rewrite optimization.

Optimization Enablement

View folding will reduce blocks by combining conditions or introducing table joins, which will enable other optimizations in PawSQL and optimizations in database engine:

  • Optimizations enabled in PawSQL
    • more efficient index recommendation
  • Optimizations enabled in Database engine
    • avoiding materializing intermediate resultset
    • more join order planning
    • index usage

Let's take the type 1 case as an example, PawSQL Index Advisor engine takes the rewritten query as input and recommended a following index.

CREATE INDEX PAW_IDX1564931998 ON TPCH.ORDERS(O_SHIPPRIORITY,O_CUSTKEY,O_TOTALPRICE);

And the final query plan is as follows.

 -> Table scan on <temporary>
-> Aggregate using temporary table
-> Nested loop inner join (cost=95046.40 rows=85935)
-> Covering index lookup on orders using PAW_IDX1564931998 (O_SHIPPRIORITY=0) (cost=517.68 rows=5038)
-> Index lookup on c using PAW_IDX1614428511 (C_CUSTKEY=orders.O_CUSTKEY) (cost=17.06 rows=17)

In the query plan, we can see we eliminates a table scan, and the estimated cost is further reduced to 95046.40, which is about 2.5% of the original cost 3806388.23.

About PawSQL

PawSQL is dedicated to automatic and intelligent database performance optimization. The products provided by PawSQL include:

  • PawSQL Cloud, an online automated SQL optimization tool that supports SQL auditing, intelligent query rewriting, cost-based index recommendations, suitable for database administrators and data application developers.
  • PawSQL Advisor, an IntelliJ plugin that is suitable for data application developers and can be installed via the IDEA/DataGrip marketplace by searching for "PawSQL Advisor" by name.
  • PawSQL Engine, which is the backend optimization engine of the PawSQL series of products, can be installed and deployed independently, and provides SQL optimization services through http/json interfaces. PawSQL Engine is provided for deployment and installation as a docker image.

Contact Us

Email: service@pawsql.com

Twitter: https://twitter.com/pawsql