跳到主要内容

COUNT标量子查询优化

Copyright © 2023 PawSQL

定义

在日常开发中,有部分开发人员使用关联标量子查询来进行是否存在的判定,譬如下面的SQL查询有订单的用户列表,

select * from customer where (select count(*) from orders where c_custkey=o_custkey) > 0

这类查询有比较严重的性能问题,它需要对外表的每一条记录,进行一次聚集运算。从上面SQL的执行计划可以看到,它的执行时间为4820.015 ms.

Seq Scan on customer  (cost=0.00..3090818.00 rows=3333 width=181) (actual time=3.944..4819.230 rows=1251 loops=1)
Filter: ((SubPlan 1) > 0)
Rows Removed by Filter: 8749
SubPlan 1
-> Aggregate (cost=309.03..309.04 rows=1 width=8) (actual time=0.478..0.478 rows=1 loops=10000)
-> Seq Scan on orders (cost=0.00..309.01 rows=8 width=0) (actual time=0.421..0.474 rows=1 loops=10000)
Filter: (customer.c_custkey = o_custkey)
Rows Removed by Filter: 10000
Planning Time: 9.246 ms
Execution Time: 4820.015 ms

更好的写法是使用EXISTS子查询,或是IN子查询,如下所示:

  • EXISTS子查询
select * from customer where exists(select 1 from orders where c_custkey=o_custkey)
Nested Loop  (cost=309.31..533.68 rows=1251 width=181) (actual time=2.126..10.256 rows=1251 loops=1)
-> HashAggregate (cost=309.01..321.52 rows=1251 width=4) (actual time=1.769..1.976 rows=1251 loops=1)
Group Key: orders.o_custkey
Batches: 1 Memory Usage: 129kB
-> Seq Scan on orders (cost=0.00..284.01 rows=10001 width=4) (actual time=0.010..0.502 rows=10001 loops=1)
-> Memoize (cost=0.30..1.27 rows=1 width=181) (actual time=0.006..0.006 rows=1 loops=1251)
Cache Key: orders.o_custkey
Cache Mode: logical
Hits: 0 Misses: 1251 Evictions: 0 Overflows: 0 Memory Usage: 348kB
-> Index Scan using customer_pkey on customer (cost=0.29..1.26 rows=1 width=181) (actual time=0.006..0.006 rows=1 loops=1251)
Index Cond: (c_custkey = orders.o_custkey)
Planning Time: 3.119 ms
Execution Time: 10.454 ms
  • IN子查询
select * from customer where c_custkey in (select o_custkey from orders)
Nested Loop  (cost=309.31..533.68 rows=1251 width=181) (actual time=1.588..3.615 rows=1251 loops=1)
-> HashAggregate (cost=309.01..321.52 rows=1251 width=4) (actual time=1.577..1.676 rows=1251 loops=1)
Group Key: orders.o_custkey
Batches: 1 Memory Usage: 129kB
-> Seq Scan on orders (cost=0.00..284.01 rows=10001 width=4) (actual time=0.007..0.469 rows=10001 loops=1)
-> Memoize (cost=0.30..1.27 rows=1 width=181) (actual time=0.001..0.001 rows=1 loops=1251)
Cache Key: orders.o_custkey
Cache Mode: logical
Hits: 0 Misses: 1251 Evictions: 0 Overflows: 0 Memory Usage: 348kB
-> Index Scan using customer_pkey on customer (cost=0.29..1.26 rows=1 width=181) (actual time=0.001..0.001 rows=1 loops=1251)
Index Cond: (c_custkey = orders.o_custkey)
Planning Time: 0.216 ms
Execution Time: 4.152 ms