PawSQL Optimization Case: TPC-H Query 9 Performance Improved by 1195.14%
Introduction
After automatically optimizing TPC-H benchmark Query 9, PawSQL improved its performance by 1195.14%. This article will explore PawSQL's effective strategies and actual results for SQL performance optimization based on its optimization process. By comparing the query structure, index design, and execution plans before and after optimization, we will explain how to achieve significant performance improvements.
This case can be viewed online at: https://pawsql.com/statement/1837384704875499522
TPC-H Query 9 Introduction: The 9th query of TPC-H calculates the total profit of all ordered parts by country and year. Its query characteristics include:
- Grouping, sorting, and aggregation operations
- Multiple table joins and subqueries
- Uses a full-match LIKE operator
1️⃣ Query Rewriting Optimization
Original Query:
SELECT profit.nation, profit.o_year, SUM(profit.amount) AS sum_profit
FROM (
SELECT nation.n_name AS nation, EXTRACT(YEAR FROM orders.o_orderdate) AS o_year,
lineitem.l_extendedprice * (1 - lineitem.l_discount) - partsupp.ps_supplycost * lineitem.l_quantity AS amount
FROM part, supplier, lineitem, partsupp, orders, nation
WHERE supplier.s_suppkey = lineitem.l_suppkey
AND partsupp.ps_suppkey = lineitem.l_suppkey
AND partsupp.ps_partkey = lineitem.l_partkey
AND part.p_partkey = lineitem.l_partkey
AND orders.o_orderkey = lineitem.l_orderkey
AND supplier.s_nationkey = nation.n_nationkey
AND part.p_name LIKE '%dim%'
) AS profit
GROUP BY profit.nation, profit.o_year
ORDER BY profit.nation, profit.o_year DESC;
Optimized Query:
SELECT profit_nation.n_name AS nation, EXTRACT(YEAR FROM profit_orders.o_orderdate) AS o_year,
SUM(profit_lineitem.l_extendedprice * (1 - profit_lineitem.l_discount) - profit_partsupp.ps_supplycost * profit_lineitem.l_quantity) AS sum_profit
FROM part AS profit_part, supplier AS profit_supplier,
lineitem AS profit_lineitem, partsupp AS profit_partsupp,
orders AS profit_orders, nation AS profit_nation
WHERE profit_supplier.s_suppkey = profit_lineitem.l_suppkey
AND profit_partsupp.ps_suppkey = profit_lineitem.l_suppkey
AND profit_partsupp.ps_partkey = profit_lineitem.l_partkey
AND profit_part.p_partkey = profit_lineitem.l_partkey
AND profit_orders.o_orderkey = profit_lineitem.l_orderkey
AND profit_supplier.s_nationkey = profit_nation.n_nationkey
AND profit_part.p_name LIKE '%dim%'
GROUP BY profit_nation.n_name, o_year
ORDER BY profit_nation.n_name, o_year DESC;
Query Rewriting Key Points:
- Eliminate subquery: Lift the subquery logic to the main query, simplifying the structure.
- Table alias renaming: Improve query readability and maintainability.
- Preserve calculation logic: Ensure that the business logic of the query remains unchanged.
2️⃣ 🔍 Index Optimization Strategy
PawSQL's proposed index optimization scheme:
CREATE INDEX PAWSQL_IDX0485218972 ON tpch.lineitem(L_PARTKEY, L_SUPPKEY, L_SHIPDATE);
CREATE INDEX PAWSQL_IDX0214365528 ON tpch.supplier(S_NATIONKEY, S_SUPPKEY, S_NAME, S_ADDRESS);
CREATE INDEX PAWSQL_IDX0327029402 ON tpch.part(P_NAME);
Index Optimization Analysis:
- lineitem table index: Targets join conditions and date filtering fields to reduce I/O operations.
- supplier table index: Covers join and query fields, avoiding table lookups and improving retrieval efficiency.
- part table index: Optimizes fuzzy matching of p_name using the LIKE operator.
3️⃣ Execution Plan Comparison
Execution Plan Before Optimization:
- Nested Loop Joins: Extensively used Nested loop joins, with low efficiency.
- Full Table Scan: Performed full table scan on lineitem table, involving 60,175 rows.
- Reliance on Primary Key Index: Primarily used primary key index for table joins.
Execution Plan After Optimization:
- Join Order Optimization: Still uses nested loop, but with optimized join order.
- Index Scan: Utilizes the newly created PAWSQL_IDX0327029402 index, reducing part table scan from 60,175 to 114 rows.
- Precise lineitem Table Lookup: Avoids full table scan, uses PAWSQL_IDX0485218972 for precise matching, further reducing processed rows.
Ultimately, the total number of processed rows decreased from 60,175 to 3,513.
4️⃣ Performance Improvement
- Execution Time: Reduced from 260 milliseconds to 20 milliseconds.
- Performance Improvement: Achieved a 1195.14% improvement.
- Main Factors: Index optimization significantly improved data access methods, reducing the number of processed rows.
🌐 About PawSQL
PawSQL is dedicated to automatic and intelligent database performance optimization. The products provided by PawSQL include:
- PawSQL Cloud, an online automated SQL optimization tool that supports SQL auditing, intelligent query rewriting, cost-based index recommendations, suitable for database administrators and data application developers.
- PawSQL Advisor, an IntelliJ plugin that is suitable for data application developers and can be installed via the IDEA/DataGrip marketplace by searching for "PawSQL Advisor" by name.
Contact Us
Email: service@pawsql.com
Website: https://www.pawsql.com