Last year, I worked on a project where I had to verify that SQL statements ported from Oracle were compatible with PostgreSQL. There were a lot of "mysterious" performance problems in the ported queries, and most of them turned out to be caused by a badly chosen query execution plan. After working on that project, I realized that if we do not understand how the PostgreSQL planner works, we can end up in a sticky situation after migrating data from Oracle.
In this note, I will present a query that produces a totally different execution plan in PostgreSQL and show how to make it as fast as it is in Oracle.
I set up a test script to examine how PostgreSQL handles a correlated subquery.
CREATE TABLE employee (
empno numeric(5,0) NOT NULL,
ename character varying(10),
job character varying(9),
mgr numeric(5,0),
hiredate timestamp(0),
sal numeric(7,2) NOT NULL,
comm numeric(7,2) NOT NULL,
deptno numeric(2,0),
sido_nm character varying(100)
);
insert into employee
select i, chr(65+mod(i,26))||i::text||'NM'
,case when mod(i,10000)=0 then 'PRESIDENT'
when mod(i,1000) = 0 then 'MANAGER'
when mod(i,3)=0 then 'SALESMAN'
when mod(i,3)=1 then 'ANALYST'
when mod(i,3)=2 then 'CLERK'
end as job
,case when mod(i,10000)= 0 then null
when mod(i,1000)= 1 then 10000
when i >= 9000 then 1000
else ceiling((i+1000)/1000)*1000
end as mgr
, current_date - i
, trunc(random() * 10000) as sal
, trunc(random() * 10000) as comm
, mod(i,12)+1 as deptno
, case when mod(i,3) = 0 then 'Jeonbuk'
when mod(i,3) = 1 then 'Kangwon'
else 'Chungnam'
end as sido_nm
from generate_series(1,10000) a(i);
ALTER TABLE employee ADD CONSTRAINT employee_pk
PRIMARY KEY (empno);
The following query reports all employees who earned more than the average for their department, where earnings are calculated as the sum of salary and commission.
SELECT EMPNO, SAL, COMM, DEPTNO
FROM EMPLOYEE A
WHERE (A.SAL+A.COMM) >
(SELECT AVG(B.SAL+B.COMM)
FROM EMPLOYEE B
WHERE A.DEPTNO = B.DEPTNO
)
ORDER BY A.DEPTNO, A.EMPNO;
In Oracle I could get the result in 3 to 4 seconds, but in PostgreSQL, with the same amount of data, it took about 13 seconds. To diagnose a problem like this, you should check the execution plan. The following is the execution plan in PostgreSQL.
Sort (actual time=13661.291..13661.432 rows=4979 loops=1)
Sort Key: a.deptno, a.empno
Sort Method: quicksort Memory: 893kB
Buffers: shared hit=1100113
-> Seq Scan on employee a (actual time=2.617..13649.060 rows=4979 loops=1)
Filter: ((sal + comm) > (SubPlan 1))
Rows Removed by Filter: 5021
Buffers: shared hit=1100110
SubPlan 1
-> Aggregate (actual time=1.363..1.363 rows=1 loops=10000)
Buffers: shared hit=1100000
-> Seq Scan on employee b (actual time=0.002..1.220 rows=833 loops=10000)
Filter: (a.deptno = deptno)
Rows Removed by Filter: 9167
Buffers: shared hit=1100000
Planning Time: 0.079 ms
Execution Time: 13661.574 ms
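For reference, plans like the one above can be reproduced with EXPLAIN. The options below are what the output suggests was used (ANALYZE runs the query and reports actual times, BUFFERS reports block I/O, and COSTS OFF hides the cost estimates, which do not appear in the plans shown here):

```sql
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT EMPNO, SAL, COMM, DEPTNO
FROM EMPLOYEE A
WHERE (A.SAL+A.COMM) >
       (SELECT AVG(B.SAL+B.COMM)
          FROM EMPLOYEE B
         WHERE A.DEPTNO = B.DEPTNO)
ORDER BY A.DEPTNO, A.EMPNO;
```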
We can observe that PostgreSQL did not collapse the subquery, so it had to perform a sequential scan on the table EMPLOYEE 10,000 times, which led to poor performance. A single execution of the correlated subquery took just 1.363 ms, but executing it repeatedly took about 13,630 ms (1.363 ms * 10,000). I do not know of any technique that forces PostgreSQL to collapse the subquery itself.
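Although we cannot make the planner do this for us, we can decorrelate the subquery by hand. Here is a sketch of one manual rewrite, different from the transformation Oracle applies but logically equivalent for this schema (SAL and COMM are declared NOT NULL, so the per-department average can never be NULL): pre-aggregate each department in a derived table and join back.

```sql
SELECT A.EMPNO, A.SAL, A.COMM, A.DEPTNO
FROM EMPLOYEE A
JOIN (SELECT DEPTNO, AVG(SAL + COMM) AS AVG_VAL
        FROM EMPLOYEE
       GROUP BY DEPTNO) B
  ON A.DEPTNO = B.DEPTNO
WHERE A.SAL + A.COMM > B.AVG_VAL
ORDER BY A.DEPTNO, A.EMPNO;
```

This scans EMPLOYEE only twice instead of 10,001 times.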
If we run the same query in Oracle, the logical optimizer of Oracle transforms the SQL statement like this:
--I got this query after running a 10053 trace event and examining the contents of the trace file.
SELECT EMPNO, SAL, COMM, DEPTNO
FROM (SELECT A.EMPNO, A.SAL, A.COMM, A.DEPTNO,
             AVG(B.COMM + B.SAL) AS AVG_VAL
        FROM EMPLOYEE A, EMPLOYEE B
       WHERE A.DEPTNO = B.DEPTNO
       GROUP BY A.DEPTNO, A.EMPNO, A.SAL, A.COMM) X
WHERE SAL + COMM > AVG_VAL
ORDER BY DEPTNO, EMPNO;
First, the logical optimizer collapses the correlated subquery, so the subquery becomes an inline view. Second, the resulting inline view is merged with the main query. What a beautiful optimizer! It is as if an artificial intelligence were at work inside Oracle.
If we run the transformed query in PostgreSQL, we get the following execution plan.
Sort (actual time=3315.641..3316.060 rows=4979 loops=1)
Sort Key: x.deptno, x.empno
Sort Method: quicksort Memory: 581kB
Buffers: shared hit=220
-> Subquery Scan on x (actual time=3304.109..3310.219 rows=4979 loops=1)
Buffers: shared hit=220
-> HashAggregate (actual time=3304.108..3309.858 rows=4979 loops=1)
Group Key: a.empno
Filter: ((a.sal + a.comm) > avg((b.comm + b.sal)))
Batches: 1 Memory Usage: 6545kB
Rows Removed by Filter: 5021
Buffers: shared hit=220
-> Hash Join (actual time=2.237..1148.766 rows=8333336 loops=1)
Hash Cond: (a.deptno = b.deptno)
Buffers: shared hit=220
-> Seq Scan on employee a (actual time=0.008..2.944 rows=10000 loops=1)
Buffers: shared hit=110
-> Hash (actual time=2.199..2.200 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 587kB
Buffers: shared hit=110
-> Seq Scan on employee b (actual time=0.003..1.137 rows=10000 loops=1)
Buffers: shared hit=110
Planning:
Buffers: shared hit=2
Planning Time: 0.187 ms
Execution Time: 3316.562 ms
We can see that the number of block I/Os dropped from 1,100,113 to 220. So, as of PostgreSQL 14, we have to perform this kind of transformation ourselves.
If a query is noticeably slower in PostgreSQL after migration, you should check the execution plan and pinpoint the problematic operation in it. In some cases you will have to rewrite the query to remove that operation from the execution plan.
For your reference, PostgreSQL did not use a single sort-based grouping operation here, as Oracle's SORT GROUP BY does; it performed a HashAggregate and then a separate Sort.
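If you are curious whether the planner would pick a sort-based aggregate instead, PostgreSQL lets you steer it with session-level planner parameters. A hypothetical experiment (enable_hashagg is a real planner parameter; disabling it for the session usually pushes the planner toward Sort + GroupAggregate):

```sql
-- Discourage hash aggregation for this session only.
SET enable_hashagg = off;

EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT DEPTNO, AVG(SAL + COMM)
FROM EMPLOYEE
GROUP BY DEPTNO;

-- Restore the default behaviour.
RESET enable_hashagg;
```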
Let the query visit a table only once
If we take a closer look at the transformed SQL statement, we can notice that the query touches the table EMPLOYEE twice to produce the result set. You had better touch a table only once to improve performance. So, as a quick "what if" test, I tried using an analytic (window) function as follows:
SELECT *
FROM (
SELECT EMPNO, SAL, COMM, DEPTNO, AVG(SAL+COMM) OVER (PARTITION BY DEPTNO) AS AVG_VAL
FROM EMPLOYEE
) A
WHERE (SAL+ COMM) > AVG_VAL
ORDER BY A.DEPTNO, A.EMPNO;
Sort (actual time=25.412..25.581 rows=4979 loops=1)
Sort Key: a.deptno, a.empno
Sort Method: quicksort Memory: 581kB
Buffers: shared hit=110
-> Subquery Scan on a (actual time=10.749..20.021 rows=4979 loops=1)
Filter: ((a.sal + a.comm) > a.avg_val)
Rows Removed by Filter: 5021
Buffers: shared hit=110
-> WindowAgg (actual time=10.747..17.564 rows=10000 loops=1)
Buffers: shared hit=110
-> Sort (actual time=10.481..11.202 rows=10000 loops=1)
Sort Key: employee.deptno
Sort Method: quicksort Memory: 1166kB
Buffers: shared hit=110
-> Seq Scan on employee (actual time=0.022..4.341 rows=10000 loops=1)
Buffers: shared hit=110
Planning Time: 0.387 ms
Execution Time: 25.792 ms
Note that the number of block I/Os fell from 220 to 110, and execution time improved by a factor of roughly 130 (3316 ms -> 25 ms). I would not argue that analytic functions always help performance, but in this particular case one certainly did.
Conclusion
It is easy to overlook the fact that the execution plan of a query matters for its performance. When you move a database from Oracle to PostgreSQL, you should check queries involving correlated subqueries. One of the more prominent limitations of the PostgreSQL optimizer is that it is not good at collapsing a correlated subquery. This can have a significant impact on performance, and it may not be possible to work around the problem unless you rewrite the query to collapse the subquery yourself.