Skip to main content

Query Rewrite

info

PawSQL's rewrite optimization engine offers a variety of SQL rewrite optimizations, recommending semantically equivalent but more efficient SQL statements for execution.

Correctness Rules

1. Rewrite for Subqueries with ALL Qualifier

Rule Description

Assuming the following SQL to get users registered after the order system is closed:

select * from customer where c_regdate > all(select o_orderdate from orders)

If there are NULLs in the result of the subquery, this SQL will always return empty. The correct approach is to add a non-null restriction in the subquery or use the max/min syntax.

select * from customer where c_regdate > (select max(o_custkey) from orders)

PawSQL recommends the second approach, which can further optimize SQL through max/min rewrites. Get more detailed information.

Default Alert Level

  • Prohibited

Conditions

  • Subquery conditions qualified by ALL

2. IN Nullable Subquery May Lead to Unexpected Results

Rule Description

For the following SQL that wants to query users without orders,

select * from customer where c_custkey not in (select o_custkey from orders)

If the result set of the subquery contains null values, this SQL will always return empty. The correct approach is to add a non-null restriction in the subquery, i.e.,

select * from customer where c_custkey not in (select o_custkey from orders where o_custkey is not null)

Default Alert Level

  • Prohibited

Conditions

  • Existence of IN subquery conditions
  • The selected columns in the IN subquery can be NULL

3. NPE Rewrite

Rule Description

The NPE (Null Pointer Exception) issue in SQL refers to the situation where aggregate functions like SUM and AVG return NULL when all columns in the SQL query are NULL, which may lead to a null pointer exception in subsequent programs. For example, for the following SQL:

select sum(t.b) from (values row(1,null)) as t(a,b);

The NPE issue can be avoided by using the following method:

SELECT IFNULL(SUM(t.b), 0) from (values row(1,null)) as t(a,b);

This will return 0 instead of NULL, avoiding a null pointer exception.

Oracle: NVL(); SQL Server and MS Access: ISNULL(); MySQL: IFNULL() or COALESCE();

Default Alert Level

  • Warning

Conditions

  1. SUM or AVG aggregate functions
  2. The arguments of the aggregate functions may all be NULL, including:
    1. The argument is a column, and the column definition can be NULL
    2. The argument is an expression, and the expression can be NULL
    3. The column definition is not NULL, but it is an inner table of an outer join, and the result may be NULL

4. Rewrite =NULL to IS NULL

Rule Description

  1. = null cannot determine if an expression is null; = null is always evaluated as false. To check if an expression is null, is null should be used.

  2. case expr when null also cannot determine if an expression is null; to check if an expression is null, case when expr is null should be used. Incorrect syntax in where/having filter conditions is relatively easy to detect and correct, but using null value judgment in case statements is more difficult to detect.

Default Alert Level

  • Prohibited

Conditions

  • The statement contains = null or case when expr is null logic

Performance Optimization Rules

1. Explicitly Prohibit Sorting

Rule Description

In early versions of MySQL, even without an ORDER BY clause, GROUP BY would default to sorting by the grouping fields, which could lead to unnecessary file sorting and affect SQL query performance. By adding ORDER BY NULL, sorting can be forcibly canceled, disabling the sorting of the query result set; PawSQL recognizes and rewrites it.

For example, in the following case,

SELECT l_orderkey, sum(l_quantity) 
FROM lineitem
GROUP BY l_orderkey;

In MySQL 5.x versions, group by l_orderkey would cause default sorting, which can be avoided by adding ORDER BY NULL.

SELECT l_orderkey, sum(l_quantity) 
FROM lineitem
GROUP BY l_orderkey
ORDER BY NULL;

Default Alert Level

  • Warning

Conditions

  • MySQL database, version lower than 8.0

  • There are grouping fields and no sorting fields

2. COUNT Scalar Subquery Rewrite

Rule Description

For using COUNT scalar subqueries to determine existence, they can be rewritten as EXISTS subqueries to avoid an aggregation operation. For example, for the following SQL,

select * from customer where (select count(*) from orders where c_custkey=o_custkey) > 0

It can be rewritten as,

select * from customer where exists(select 1 from orders where c_custkey=o_custkey)

Rule Description

Databases can leverage the order of indexes to avoid sorting the columns in the ORDER clause, thereby enhancing SQL performance. However, if the ORDER field is an expression or function, it may not be possible to use the index for sorting.

Default Alert Level

  • Notice

Conditions

  • Existence of COUNT scalar subquery > 0 condition

3. Unconditional DELETE Rewritten as TRUNCATE

Rule Description

DELETE statements without query conditions or with always-true conditions will delete all data in the table. DELETE statements require a large number of logs for transaction rollback and master-slave synchronization. For large tables, this may cause database locking and transaction blocking, while also occupying a large amount of log space. If it is confirmed that the data in the table is no longer needed, TRUNCATE can be used to replace DELETE statements. TRUNCATE is faster than DELETE because it does not record each deleted row but directly empties the table and releases space.

delete from lineitem

Rewritten as

truncate lineitem

Default Alert Level

  • Notice

Conditions

  • DELETE statements without conditions or with always-true conditions

4. Implicit Type Conversion Disable Index adoption

Rule Description

When the data types of conditional expressions are different, some implicit data type conversions will occur during the query execution process. Type conversions may apply to constants in the conditions or to columns in the conditions. When type conversions are applied to columns during query execution, indexes cannot be used, which may lead to severe performance issues. For example, for the following SQL,

select count(*) from ORDERS where O_ORDERDATE = current_date();

If the data type of the O_ORDERDATE column is CHAR(16), then the index on O_ORDERDATE will not be used, resulting in a full table scan. The solution usually involves two approaches: one is to ALTER TABLE to change the data type of O_ORDERDATE, and the other is to cast current_date to CHAR type (PawSQL provides this rewrite suggestion).

select count(*) from ORDERS where ORDERS.O_ORDERDATE = cast(current_date() as CHAR(16));

Default Alert Level

  • Warning

Conditions

  • The conditional expression is a filtering condition and is an indexable filtering condition
  • The data types on both sides of the filtering condition are different
  • According to the database's type conversion priority, the database will prioritize converting columns over constants

5. Elimination of DISTINCT in Subqueries

Rule Description

For subqueries that only perform existence tests, if the subquery contains DISTINCT, it can usually be removed to avoid a deduplication operation, for example,

  • IN subquery:
SELECT * FROM customer WHERE c_custkey IN (SELECT DISTINCT o_custkey FROM orders);

It can be simplified to:

SELECT * FROM customer WHERE c_custkey IN (SELECT o_custkey FROM orders);

Default Alert Level

  • Notice

Conditions

  • Using IN/EXISTS subqueries for existence determination
  • The subquery contains DISTINCT/DISTINCT/UNIQUE keywords

6. EXISTS Subqueries to Table Joins Rewrite

Rule Description

EXISTS subqueries return a boolean value to the outer query, indicating whether there are rows that meet the conditions. EXISTS subqueries that meet certain conditions can be converted to JOINs, allowing the database optimizer to have more choices in selecting the driving table, thereby generating a better query plan.

For example, for the following query,

select * from lineitem l where exists (select * from part p where p.p_partkey=l.l_partkey and p.p_name = 'a')

If the subquery returns at most one record for each l.l_partkey (i.e., there is a uniqueness constraint on the column with the equality condition (p_partkey, p_name)), then this subquery can be rewritten as follows:

select l.* from lineitem as l, part as p where p.p_partkey = l.l_partkey and p.p_name = 'a'

Conditions

  • EXISTS subquery conditions are associated with AND and other conditions
  • EXISTS subquery has no grouping and no LIMIT
  • EXISTS subquery result set returns UNIQUE rows
  • EXISTS subquery and outer query are associated by equality

7. Predicate Pushdown

Rule Description

Predicate Pushdown (FPPD) pushes filtering conditions as far "down" as possible into the internal query blocks of SQL, filtering out some data in advance, reducing the size of the intermediate result set, thereby reducing the amount of data that subsequent calculations need to process, and thus improving SQL execution performance. FPPD is a rewriting optimization.

For example, in the following case, there is a condition nation = 100 in the outer query, which can be pushed down into the person DT subquery:

select *
from (select c_nationkey nation, 'C' as type, count(1) num
from customer
group by nation
union
select s_nationkey nation, 'S', count(1) num
from supplier
group by nation) as person
where nation = 100

The rewritten SQL is as follows:

select *
from (select c_nationkey nation, 'C' as type, count(1) num
from customer
where c_nationkey = 100
group by nation
union
select s_nationkey nation, 'S', count(1) num
from supplier
where s_nationkey = 100
group by nation) as person

Conditions

  1. The filtering condition is an AND filtering condition (not a join condition)

  2. The filtering condition's field comes from the FROM subquery (if it is a view, it should be replaced by the SQL defined by the view)

  3. The subquery has not been optimized and eliminated by query folding

  4. The subquery itself does not have a LIMIT clause

  5. The subquery itself does not have window functions such as rownum or rank

8. Operations on Index Columns Disable Index

Rule Description

Operations on index columns can cause the index to become ineffective, which can easily lead to full table scans and severe performance issues. Therefore, it is necessary to try to convert operations on index columns to the constant side, such as the following SQL.

select * from tpch.orders where adddate(o_orderdate,  INTERVAL 31 DAY) =date '2019-10-10'  

The adddate function will make the index on o_orderdate unusable, which can be converted into the following equivalent SQL to use the index to improve query efficiency.

select * from tpch.orders where o_orderdate = subdate(date '2019-10-10' , INTERVAL 31 DAY);

PawSQL can help convert a large number of functions and operations related to the +、-、*、/ operators. Click to get more detailed information about this optimization.

Conditions

  1. The filtering condition is an AND filtering condition (not a join condition)

  2. The filtering condition is an indexable condition

  3. There are calculations or functions on the index column

9. GROUP BY Fields from Different Tables

Rule Description

If the grouping fields come from different tables, the database optimizer will not be able to utilize the order of the index to avoid a sort. If there are equality conditions in the WHERE or HAVING clauses, PawSQL can replace the sorting or grouping fields to make them come from the same table, thereby being able to use the index to avoid a sort. For example, in the following query

select o_custkey, c_name, sum(o.O_TOTALPRICE) from customer c, orders o where o_custkey = c_custkey group by o_custkey, c_name

The grouping fields o_custkey, c_name come from two tables, and there is a filtering condition o_custkey = c_custkey, which can be rewritten as

select c_custkey, c_name, sum(o.O_TOTALPRICE) from customer c, orders o where o_custkey = c_custkey  group by c_custkey, c_name

Conditions

  1. GROUP BY fields come from different tables

  2. The filtering condition is an indexable condition

  3. There are no calculations or functions on the index column

10. Pushing HAVING Conditions Down to WHERE

Logically, HAVING conditions are executed after grouping, while conditions on the WHERE clause can be executed during table access (index access) or after table access but before grouping. Both of these conditions are less costly than executing after grouping.

Consider the following example,

select c_custkey, count(*) from customer group by c_custkey having c_custkey < 100

The rewritten SQL is,

select c_custkey, count(*) from customer where c_custkey < 100 group by c_custkey

Conditions

  1. There are no aggregate functions in the HAVING clause

11. IN Subquery Rewrite Optimization

An IN subquery refers to a subquery of the following form. IN subqueries can be rewritten as equivalent related EXISTS subqueries or inner joins, thereby generating a new filtering condition. If there is a suitable index on this filtering condition, or through PawSQL's index recommendation engine recommends a suitable index, better performance can be achieved.

(expr1, expr2...) [NOT] IN (SELECT expr3, expr4, ...)
  • Rewriting IN subquery as EXISTS

For example, the following IN subquery language is to obtain customer information with orders in the last year,

select * from customer where c_custkey in (select o_custkey from orders where O_ORDERDATE>=current_date - interval 1 year)

It can be rewritten as an EXISTS subquery, thereby generating a filtering condition (c_custkey = o_custkey):

select * from customer where exists (select * from orders where c_custkey = o_custkey and O_ORDERDATE>=current_date - interval 1 year)
  • Rewriting IN subquery as inner join

If the result set of the subquery is not repeated, then the IN subquery can be rewritten as an association between two tables, thereby allowing the database optimizer to plan a better table join order, and also allowing PawSQL to recommend better optimization methods.

For example, in the following SQL, c_custkey is the primary key of the customer table,

select * from orders where o_custkey in (select c_custkey from customer where c_phone like '139%')

Then the above query can be rewritten as

select orders.* from orders, customer where o_custkey=c_custkey and c_phone like '139%'

Click to get more detailed information about this optimization.

Conditions

  1. The IN subquery is not a reverse condition
  2. If the result set of the subquery is not repeated, it can be rewritten as an inner join

12. Join Elimination

Rule Description

Join Elimination simplifies SQL by removing tables from the query without affecting the final result, thereby improving query performance. This optimization is usually applicable when the query contains a primary key-foreign key connection and only references the primary key column of the master table. Inner joins and outer joins can both be used for this rewrite optimization.

  • Case of inner join elimination
select o.* from orders o inner join customer c on c.c_custkey=o.o_custkey

The orders table (orders) and the customer table (customer) are associated, and c_custkey is the primary key of the customer table, so the customer table can be eliminated, and the rewritten SQL is as follows:

select * from orders where o_custkey
  • Case of outer join elimination
select o_custkey from orders left join customer on c_custkey=o_custkey

The customer table can be eliminated, and the rewritten SQL is as follows:

select orders.o_custkey from orders

Conditions

  • The query contains a primary key-foreign key connection
  • The query only references the primary key column of the master table

13. Pushing LIMIT Down to UNION Branches

Rule Description

The LIMIT clause pushdown optimization pushes the LIMIT clause as far down as possible to filter out some data in advance, reduce the size of the intermediate result set, reduce the amount of data that subsequent calculations need to process, and improve query performance.

For example, in the following case, there is a LIMIT clause in the outer query, which can be pushed down to the inner query for execution:

select *
from (select c_nationkey nation, 'C' as type, count(1) num
from customer
group by c_nationkey
union
select s_nationkey nation, 'S' as type, count(1) num
from supplier
group by nation) as nation_s
order by nation limit 20, 10

The rewritten SQL is as follows:

select *
from (
(select customer.c_nationkey as nation, 'C' as `type`, count(1) as num
from customer
group by customer.c_nationkey
order by customer.c_nationkey limit 30)
union
(select supplier.s_nationkey as nation, 'S' as `type`, count(1) as num
from supplier
group by supplier.s_nationkey
order by supplier.s_nationkey limit 30)) as nation_s
order by nation_s.nation limit 20, 10

Conditions

  1. The outer query has a LIMIT clause
  2. The outer query does not have a GROUP BY clause
  3. The outer query's FROM only references one table, and it is a subquery
  4. The outer query has no other conditions
  5. The subquery is a single query or multiple subqueries connected by UNION/UNION ALL (or an outer connection of an outer table)
  6. The value of OFFSET is less than a specified threshold

21. Rewrite of MAX/MIN Subqueries

Rule Description

For subqueries using MAX/MIN,

select * from customer where c_custkey = (select max(o_custkey) from orders)

It can be rewritten in the following form to take advantage of the order of the index to avoid an aggregation operation,

select * from customer where c_custkey = (select o_custkey from orders order by o_custkey desc null last limit 1)

Get more detailed information about this optimization.

Conditions

  1. There is a scalar subquery with MAX/MIN in the SQL

14. ORDER Clause Reordering

Rule Description

If a query includes both sorting fields and grouping fields from the same table but in a different order, the grouping field order can be adjusted to be consistent with the sorting field order, so that the database can avoid a sort operation.

Consider the following two SQLs, the only difference between them is the order of the grouping fields (the first SQL is c_custkey, c_name, the second SQL is c_name,c_custkey), since there are no advanced grouping operations such as grouping set/cube/roll up in the grouping fields, so the two SQLs are equivalent. However, their execution plans and execution efficiencies are different, so the first SQL can be rewritten as the second SQL.

select o_custkey, o_orderdate, sum(O_TOTALPRICE)
from orders
group by o_custkey,o_orderdate
order by o_orderdate;

Rewritten as:

select o_custkey, o_orderdate, sum(o_totalprice)
from orders
group by o_orderdate,o_custkey
order by o_orderdate;

Conditions

  • There is an order clause with more than 1 member and a group clause in a QueryBlock
  • The clauses refer to columns in the same data table without functions or calculations
  • The columns in the order clause are a true subset of the group clause
  • The order clause is not a prefix of the group clause

15. Rewrite of SELECT with OR Conditions

Rule Description

If a query statement uses OR conditions, the database optimizer may not be able to use an index to complete the query. For example,

select * from lineitem where l_shipdate = date '2010-12-01' or l_partkey<100

If both of these fields have indexes, the query statement can be rewritten as a UNION or UNION ALL query to take advantage of the index to improve query performance.

select * from lineitem where l_shipdate = date '2010-12-01' 
union select * from lineitem where l_partkey<100

If the database supports INDEX MERGING, the database-related parameters can also be adjusted to enable the INDEX MERGING optimization strategy to improve database performance. Get more detailed information about this optimization.

Conditions

  1. The OR connected conditions must be able to use an index
  2. The estimated cost of the rewritten UNION statement is less than the original SQL
  3. If the conditions of the OR branch are exclusive, then rewrite as UNION ALL

16. Rewrite of UPDATE or DELETE with OR Conditions

Rule Description

If there are UPDATE or DELETE statements using OR conditions, the database optimizer may not be able to use an index to complete the operation.

delete from lineitem where l_shipdate = date '2010-12-01' or l_partkey<100

If both of these fields have indexes, it can be rewritten as multiple DELETE statements to take advantage of the index to improve query performance.

delete from lineitem where l_shipdate = date '2010-12-01';
delete from lineitem where l_partkey<100;

Conditions

  • The SQL is an UPDATE or DELETE statement
  • The UPDATE or DELETE statement has an OR condition
  • Each branch of the OR condition can be indexed

17. Elimination of Sorting in Subqueries without LIMIT

Rule Description

If a subquery does not have a LIMIT clause, then the sorting operation of the subquery is meaningless and can be deleted without affecting the final result. Some cases are as follows:

  • EXISTS subquery
select * from lineitem l where exists (select * from part p where p.p_partkey=l.l_partkey and p.p_name = 'a' order by p_name )

Can be rewritten as

select * from lineitem l where exists (select * from part p where p.p_partkey=l.l_partkey and p.p_name = 'a')

Conditions

  • The subquery has an ORDER clause
  • The subquery does not have a LIMIT clause

18. ORDER BY Fields from Different Tables

Rule Description

If the sorting fields come from different tables, the database optimizer will not be able to utilize the order of the index to avoid a sort. If there are equality conditions in the WHERE or HAVING clauses, PawSQL can replace the sorting or grouping fields to make them come from the same table, thereby being able to use the index to avoid a sort. For example, in the following query

select * from customer c, orders o where o_custkey = c_custkey order by o_custkey, c_name

The sorting fields o_custkey, c_name come from two tables, and there is a filtering condition o_custkey = c_custkey, which can be rewritten as

select * from customer c, orders
o where o_custkey = c_custkey order by c_custkey, c_name

Conditions

  1. ORDER fields come from different tables

  2. The filtering condition is an indexable condition

  3. There are no calculations or functions on the index column

19. Outer Join Optimization

Rule Description

Outer join optimization refers to the transformation of outer joins into inner joins under certain conditions (the outer table has a NULL rejection condition), thereby allowing the database optimizer to choose a better execution plan and improve the performance of SQL queries.

Consider the following example,

select c_custkey from orders left join customer on c_custkey=o_custkey where C_NATIONKEY  < 20

C_NATIONKEY < 20 is a NULL rejection condition on the customer table, so the above left outer join can be rewritten as an inner join,

select c_custkey from orders inner join customer on c_custkey=o_custkey where C_NATIONKEY  < 20

Get more detailed information about this optimization.

Conditions

For SQL,

SELECT * T1 FROM T1 LEFT JOIN T2 ON P1(T1,T2) WHERE P(T1,T2) AND R(T2) 

If R(T2) is a null rejection condition (NFC), then the above outer join can be transformed into an inner join, that is

SELECT * T1 FROM T1 JOIN T2 ON P1(T1,T2) WHERE P(T1,T2) AND R(T2) 

In this way, the optimizer can first apply R(T2), obtain a very small result set, and then associate it with T1.

20. Projection Pushdown

Rule Description

Projection pushdown refers to the process of removing meaningless columns in the DT subquery (not used in the outer query) to reduce IO and network costs, while increasing the chance of the optimizer adopting optimization options that do not require a return table when planning table access.

Consider the following example,

SELECT count(1) FROM (SELECT c_custkey, avg(age) FROM customer group by c_custkey) AS derived_t1;

The rewritten SQL is,

SELECT count(1) FROM (SELECT 1 FROM customer group by c_custkey) AS derived_t1;

Get more detailed information about this optimization.

Conditions

  • There are columns in the inner select list that are not used by the outer query block

21. Rewrite of Qualified Subqueries

Rule Description

Subqueries Qualified by ANY/SOME/ALL originate from the SQL-92 standard and are commonly used to check the size relationship between a value and all or any values returned by the subquery. Subqueries Qualified by ANY/SOME/ALL have low execution efficiency because they need to compare with the result set of the subquery row by row, and decrease linearly with the increase of the result set size. Their execution efficiency can be improved by rewriting the query.

For example, for the following SQL:

select * from orders where o_orderdate < all(select o_orderdate from orders where o_custkey > 100)

For MySQL, it can be rewritten as

select * from orders where o_orderdate < (select o_orderdate from orders where o_custkey > 100 order by o_orderdate asc limit 1) 

For PostgreSQL or Oracle, it can be rewritten as

select * from orders where o_orderdate < (select o_orderdate from orders where o_custkey > 100 order by o_orderdate asc nulls first limit 1)

Conditions

  • There are subqueries Qualified by ANY/SOME/ALL in the SQL

22. Query Folding

Rule Description

Query folding refers to the process of expanding views, CTEs, or DT subqueries and merging them with the query statements that reference them to reduce serialized intermediate result sets or trigger better optimization techniques related to table join planning.

Consider the following example,

SELECT * FROM (SELECT c_custkey, c_name FROM customer) AS derived_t1;

The rewritten SQL is,

SELECT c_custkey, c_name FROM customer

Get more detailed information about this optimization.

Conditions

PawSQL's optimization engine supports two types of query folding optimization strategies for different SQL syntax structures. The first type of query folding optimization is supported internally by the optimizer in MySQL 5.7 and PostgreSQL 14.0 and above; the second type of query folding optimization is not supported in the latest versions of MySQL and PostgreSQL.

  • Query folding type I

    • There is no distinct keyword in the view itself

    • There are no aggregate functions or window functions in the view

    • There is no LIMIT clause in the view itself

    • There is no UNION or UNION ALL in the view itself

    • In the external query block, the folded view is not part of an outer join.

  • Query folding type II

    • In the external query block, the view is the only table reference

    • There are no groupings, aggregate functions, and window functions in the external query block

    • No window functions are used inside the view

23. SATTC Rewrite

Rule Description

SAT-TC (SATisfiability-Transitive Closure) rewriting optimization refers to the analysis of a set of related query conditions to discover whether there are contradictory conditions, simplifications, or the inference of new conditions, thereby helping the database optimizer choose a better execution plan and improve SQL performance.

Consider the following example,

select c.c_name FROM customer c where c.c_name = 'John' and c.c_name = 'Jessey'

Since the conditions are contradictory, the rewritten SQL is,

select c.c_name from customer as c where  1 = 0 

Get more detailed information about this optimization.

Conditions

  • There are contradictions between predicates (e.g., c_custkey=1 AND c_custkey=0), or

  • New predicates can be inferred from the predicate set (e.g., c_custkey=1 AND c_custkey=o_custkey implies o_custkey=1).

  • Predicates can be simplified (e.g., c_custkey != c_custkey or c_name = 'b' can be simplified to c_name = 'b')

24. De-Correlation of Select List Scalar Subqueries

Rule Description

Scalar subqueries return a single value of a single row and column, and they can appear anywhere in SQL where a single value is required. For related scalar subqueries, PawSQL attempts to de-correlate them to improve their performance. At the same time, if multiple subqueries with similar main structures appear in the select list, PawSQL will merge them when de-correlating to reduce computation.

For example, the following scalar subquery

select c_custkey, 
(select SUM(o_totalprice)
from ORDERS
where o_custkey = c_custkey and o_orderdate = '2020-04-16') as total,
(select count(*)
from ORDERS
where o_custkey = c_custkey and o_orderdate = '2020-04-16') as cnt
from CUSTOMER

Will be rewritten as:

select /*QB_1*/ c_custkey, SUM_ as total, count_ as cnt
from CUSTOMER left outer join (
select o_custkey, SUM(o_totalprice) as SUM_,count(*) as count_
from ORDERS
where o_orderdate = '2020-04-16'
group by o_custkey) as SQ on o_custkey = c_custkey

25. De-Correlation of Conditional Scalar Subqueries

Rule Description

Scalar subqueries return a single value of a single row and column, and they can appear anywhere in SQL where a single value is required. For scalar subqueries that appear in the WHERE and HAVING clauses, PawSQL attempts to de-correlate them to improve their performance.

For example, the following scalar subquery

select c_custkey
from CUSTOMER
where 1000000 < (select SUM(o_totalprice)
from ORDERS

where o_custkey = c_custkey and o_orderdate = '2020-04-16')

Will be rewritten as:

select /*QB_2*/ c_custkey
from CUSTOMER, (
select /*QB_1*/ SUM( o_totalprice) as SUM_, o_custkey
from ORDERS
where o_orderdate = cast('2020-04-16' as DATE)
group by o_custkey) as SQ
where 1000000 < SUM_ and o_custkey = CUSTOMER.c_custkey

26. View Resolving

Rule Description

View expansion refers to the process of replacing the views referenced in the query with their definitions. This optimization technique allows the query optimizer (including the PawSQL optimization engine and the database optimizer) to have more optimization opportunities, which may result in a more efficient execution plan.

For example

-- Create a view
CREATE VIEW high_salary_employees AS
SELECT employee_id, name, department_id
FROM employees
WHERE salary > 50000;

-- Query using the view
SELECT h.name, d.department_name
FROM high_salary_employees h
JOIN departments d ON h.department_id = d.department_id
WHERE h.employee_id < 1000;

Will be rewritten as:

SELECT h.name, d.department_name
FROM high_salary_employees h
JOIN (SELECT employee_id, name, department_id
FROM employees
WHERE salary > 50000) d ON h.department_id = d.department_id
WHERE h.employee_id < 1000;