While migrating a database from Oracle to PostgreSQL, I witnessed a significant performance degradation. I couldn't make one query run as fast as it did in Oracle, and the reason turned out to be the absence of the "Partial Join" feature in PostgreSQL.
This is just a quick article to clarify this optimizer difference between Oracle and PostgreSQL. As usual, here is a simplified version of the problem: a query that retrieves distinct combinations of columns from one table after joining it with another.

create table customer (
cust_no numeric not null,
cust_nm character varying(100),
register_date timestamp(0),
register_dt varchar(8),
cust_status_cd varchar(1),
register_channel_cd varchar(1),
cust_age smallint,
active_yn varchar(1),
sigungu_cd varchar(5),
sido_cd varchar(2)
);
insert into customer
select i, chr(65+mod(i,26))||i::text||'CUST_NM'
, current_date - mod(i,10000)
, to_char((current_date - mod(i,10000)),'yyyymmdd') as register_dt
, mod(i,5)+1 as cust_status_cd
, mod(i,2)+1 as register_channel_cd
, trunc(random() * 100) +1 as age
, case when mod(i,22) = 0 then 'N' else 'Y' end as active_yn
, case when mod(i,1000) = 0 then '11007'
when mod(i,1000) = 1 then '11006'
when mod(i,1000) = 2 then '11005'
when mod(i,1000) = 3 then '11004'
when mod(i,1000) = 4 then '11003'
when mod(i,1000) = 5 then '11002'
else '11001' end as sigungu_cd
, case when mod(i,4) = 0 then '01'
when mod(i,4) = 1 then '02'
else '03' end as sido_cd
from generate_series(1,1000000) a(i);
ALTER TABLE customer ADD CONSTRAINT customer_pk
PRIMARY KEY (cust_no);
create table com_code (
group_cd varchar(10),
cd varchar(10),
cd_nm varchar(100));
insert into com_code values ('G1','11001','SEOUL')
,('G1','11002','PUSAN')
,('G1','11003','INCHEON')
,('G1','11004','DAEGU')
,('G1','11005','JAEJU')
,('G1','11006','ULEUNG')
,('G1','11007','ETC');
insert into com_code values ('G2','1','Infant')
,('G2','2','Child')
,('G2','3','Adolescent')
,('G2','4','Adult')
,('G2','5','Senior');
insert into com_code values ('G3','01','Jeonbuk')
,('G3','02','Kangwon')
,('G3','03','Chungnam');
Here is the query that runs fast in Oracle but not in PostgreSQL.
I have disabled parallel processing for the sake of simplicity and clarity.

SET MAX_PARALLEL_WORKERS_PER_GATHER = 0;
SELECT DISTINCT A.CD, A.CD_NM
FROM COM_CODE A, CUSTOMER B
WHERE A.GROUP_CD ='G1'
AND A.CD = B.SIGUNGU_CD;
Here is the execution plan produced by PostgreSQL 15.1.

HashAggregate (actual time=425.430..425.433 rows=7 loops=1)
Group Key: a.cd, a.cd_nm
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=10403 read=962
-> Hash Join (actual time=0.020..270.593 rows=1000000 loops=1)
Hash Cond: ((b.sigungu_cd)::text = (a.cd)::text)
Buffers: shared hit=10403 read=962
-> Seq Scan on customer b (actual time=0.007..63.825 rows=1000000 loops=1)
Buffers: shared hit=10402 read=962
-> Hash (actual time=0.009..0.010 rows=7 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on com_code a (actual time=0.004..0.006 rows=7 loops=1)
Filter: ((group_cd)::text = 'G1'::text)
Rows Removed by Filter: 8
Buffers: shared hit=1
Planning Time: 0.094 ms
Execution Time: 425.459 ms
You will notice the sequential scan on COM_CODE and the hash join with CUSTOMER as the probe table, producing an intermediate result set of 1,000,000 rows and 11,365 (10,403 + 962) block I/Os. At first sight the plan looks efficient, since it used the smaller COM_CODE table as the build table.
Before proceeding further, I'd like to present the well-organized execution plan in Oracle 12c here:
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 |00:00:00.01 | 16 | | | |
| 1 | HASH UNIQUE | | 1 | 00:00:01 | 7 |00:00:00.01 | 16 | 886K| 886K| |
|* 2 | HASH JOIN SEMI | | 1 | 00:00:01 | 7 |00:00:00.01 | 16 | 1695K| 1695K| 1246K (0)|
|* 3 | TABLE ACCESS FULL| COM_CODE | 1 | 00:00:01 | 7 |00:00:00.01 | 7 | | | |
| 4 | TABLE ACCESS FULL| CUSTOMER | 1 | 00:00:01 | 907 |00:00:00.01 | 9 | | | |
-----------------------------------------------------------------------------------------------------------------------
As you can see, the shape of the plan is completely different. Oracle incurs just 16 block I/Os. Note that Oracle did not read the entire CUSTOMER table: despite its 1,000,000 rows, Oracle performed a full scan on it but looked at only 907 rows (A-Rows). The HASH JOIN SEMI operation first read the COM_CODE table to build the hash table, then probed the CUSTOMER table for matching rows. It stopped probing as soon as it had found all 7 CD values, which is why it read only 907 rows. And this is exactly what a hash semi join is about: the executor does not need all the matching rows. This lovely Oracle feature, introduced in Oracle 12c, is called "Partial Join". You will have to take my word for it that I am not manipulating the test results here.
For those who may have doubts about my test results in Oracle 12c, I have left the test code at the end of this article.
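To make the mechanism concrete, here is a conceptual Python sketch of the "Partial Join" idea (not Oracle's actual implementation): build a hash set on the small side, probe the large side, and stop scanning as soon as every build-side key has been matched at least once.

```python
def partial_hash_semi_join(small_side, large_side, key):
    """Semi join with early termination ("Partial Join") sketch."""
    remaining = {key(r) for r in small_side}   # build phase: distinct join keys
    matched, probed = set(), 0
    for row in large_side:                     # probe phase
        probed += 1
        k = key(row)
        if k in remaining:
            matched.add(k)
            remaining.discard(k)
            if not remaining:                  # every key matched: stop probing
                break
    return matched, probed

# Toy data mimicking the article's skew: all 7 codes occur near the start
# of the large table, so the probe can stop after a handful of rows.
codes = ['11001', '11002', '11003', '11004', '11005', '11006', '11007']
customers = [codes[i % 7] for i in range(1000)] + ['11001'] * 999_000
matched, probed = partial_hash_semi_join(codes, customers, key=lambda x: x)
# probed == 7 here, while len(customers) is 1,000,000
```

The point is the `break`: a plain hash semi join still probes every row of the large side, whereas the partial-join variant stops once the result can no longer grow.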
Getting back to the query in PostgreSQL: if you are a power user of SQL, you might come up with the idea of using an EXISTS clause to help the optimizer run the query efficiently. So I rewrote the query as follows:

SELECT A.CD, A.CD_NM
FROM COM_CODE a
WHERE A.GROUP_CD ='G1'
AND
EXISTS (SELECT 1
FROM CUSTOMER B
WHERE A.CD = B.SIGUNGU_CD
);
I have run the query and here is the resulting execution plan:

Hash Join (cost=23864.16..23865.40 rows=3 width=11) (actual time=224.045..224.050 rows=7 loops=1)
Hash Cond: ((a.cd)::text = (b.sigungu_cd)::text)
Buffers: shared hit=10659 read=706
-> Seq Scan on com_code a (cost=0.00..1.19 rows=7 width=11) (actual time=0.008..0.010 rows=7 loops=1)
Filter: ((group_cd)::text = 'G1'::text)
Rows Removed by Filter: 8
Buffers: shared hit=1
-> Hash (cost=23864.07..23864.07 rows=7 width=6) (actual time=224.033..224.033 rows=7 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=10658 read=706
-> HashAggregate (cost=23864.00..23864.07 rows=7 width=6) (actual time=224.026..224.027 rows=7 loops=1)
Group Key: (b.sigungu_cd)::text
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=10658 read=706
-> Seq Scan on customer b (cost=0.00..21364.00 rows=1000000 width=6) (actual time=0.004..114.394 rows=1000000 loops=1)
Buffers: shared hit=10658 read=706
Planning Time: 0.099 ms
Execution Time: 224.075 ms

What a letdown, Postgres! I was expecting PostgreSQL to produce an execution plan like this:
Hash Semi Join
  Hash Cond: ((b.sigungu_cd)::text = (a.cd)::text)
-> Seq Scan on customer b
-> Hash
-> Seq Scan on com_code a
With the hypothetical plan above, PostgreSQL wouldn't need to read the whole CUSTOMER table. However, the optimizer failed to produce the expected plan for the modified query. Instead, it produced a suboptimal plan that scans the entire CUSTOMER table and performs a hash aggregation. The full table scan on CUSTOMER and the hash aggregation take the lion's share of the elapsed time. Although the elapsed time decreased from 425 ms to 224 ms, the block I/O count remained the same.
I tried everything to make PostgreSQL generate the desirable plan mentioned above, but I was unsuccessful. I believe this inability to generate a HASH SEMI JOIN node is a significant flaw in the design of the PostgreSQL planner.
While fiddling around with the query for a while, I managed to reduce the number of block I/Os using the following query:
(I utilized the pg_hint_plan extension.)

/*+ Set(enable_hashagg off) Set(enable_sort off) */
SELECT A.CD, A.CD_NM
FROM COM_CODE a
WHERE A.GROUP_CD ='G1'
AND EXISTS (SELECT 1
FROM CUSTOMER B
WHERE A.CD = B.SIGUNGU_CD
);
I've used the /*+ Set(enable_hashagg off) Set(enable_sort off) */ hint to stop PostgreSQL from using hashed aggregation plans and explicit sort steps.
Here's the execution plan I got:

Nested Loop Semi Join (cost=0.00..99493.98 rows=3 width=11) (actual time=0.018..0.326 rows=7 loops=1)
Join Filter: ((a.cd)::text = (b.sigungu_cd)::text)
Rows Removed by Join Filter: 1014
Buffers: shared hit=13
-> Seq Scan on com_code a (cost=0.00..1.19 rows=7 width=11) (actual time=0.009..0.010 rows=7 loops=1)
Filter: ((group_cd)::text = 'G1'::text)
Rows Removed by Filter: 8
Buffers: shared hit=1
-> Materialize (cost=0.00..30271.00 rows=1000000 width=6) (actual time=0.001..0.036 rows=146 loops=7)
Buffers: shared hit=12
-> Seq Scan on customer b (cost=0.00..21364.00 rows=1000000 width=6) (actual time=0.004..0.123 rows=1000 loops=1)
Buffers: shared hit=12
Planning Time: 0.137 ms
Execution Time: 0.343 ms

I've highlighted Nested Loop Semi Join and shared hit=13: the Nested Loop Semi Join allows the executor to avoid reading the entire CUSTOMER table - we can observe that PostgreSQL read only 1,000 rows, resulting in 12 block I/Os. Keep in mind that this optimization pays off only when all the join values appear near the beginning of the table.
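It is easy to verify outside the database why roughly 1,000 rows suffice here: the INSERT statement above derives sigungu_cd from mod(i, 1000), so all seven codes already occur among the first 1,000 rows. A small Python replay of that formula shows where a semi join could stop:

```python
def sigungu_cd(i):
    """Replicates the CASE expression on mod(i, 1000) from the test data."""
    m = i % 1000
    return {0: '11007', 1: '11006', 2: '11005', 3: '11004',
            4: '11003', 5: '11002'}.get(m, '11001')

seen, rows_scanned = set(), 0
for i in range(1, 1_000_001):
    rows_scanned += 1
    seen.add(sigungu_cd(i))
    if len(seen) == 7:        # every code found: a semi join could stop here
        break
# rows_scanned == 1000, because '11007' first appears at i = 1000
```

Had the rare codes been inserted at the end of the table instead, the Nested Loop Semi Join would have had to scan nearly all 1,000,000 rows, and the hint would have made things worse, not better.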
When comparing the estimated costs of the two execution plans, the cost of the Nested Loop Semi Join was 99,493, while the cost of the Hash Join was 23,865. Unfortunately, the optimizer did not factor into the cost estimate of the Nested Loop Semi Join the fact that it can avoid reading the entire CUSTOMER table: the Seq Scan on customer b was estimated at 1,000,000 rows, whereas the actual number of rows scanned was 1,000. This incorrect cost calculation led PostgreSQL to choose the less efficient execution plan when no hint was given.

Closing Comment
I am afraid this note ends in frustration, because I didn't find a solution (generating the Hash Semi Join operation) to the problem I wanted to fix - possibly because there just isn't one, possibly because I didn't put enough effort into my search.
In PostgreSQL, it is better to write semi-join SQL statements with EXISTS() than with a plain JOIN. Even so, PostgreSQL sometimes fails to generate the Hash Semi Join operation. Furthermore, there can be inaccuracies in the cost calculation, resulting in unfavorable execution plans.
Addendum

Here is the code I used in Oracle 12c.

create table customer (
cust_no numeric not null,
cust_nm VARCHAR2(100),
register_date timestamp(0),
register_dt varchar2(8),
cust_status_cd varchar2(1),
register_channel_cd varchar2(1),
cust_age numeric(3),
active_yn varchar2(1),
sigungu_cd varchar2(5),
sido_cd varchar2(2)
);
insert into customer
select rownum, chr(65+mod(rownum,26))||to_char(rownum)||'CUST_NM'
, sysdate - mod(rownum,10000)
, to_char((sysdate - mod(rownum,10000)),'yyyymmdd') as register_dt
, mod(rownum,5)+1 as cust_status_cd
, mod(rownum,2)+1 as register_channel_cd
, trunc(dbms_random.value(1,100)) as age
, case when mod(rownum,22) = 0 then 'N' else 'Y' end as active_yn
, case when mod(rownum,1000) = 0 then '11007'
when mod(rownum,1000) = 1 then '11006'
when mod(rownum,1000) = 2 then '11005'
when mod(rownum,1000) = 3 then '11004'
when mod(rownum,1000) = 4 then '11003'
when mod(rownum,1000) = 5 then '11002'
else '11001' end as sigungu_cd
, case when mod(rownum,4) = 0 then '01'
when mod(rownum,4) = 1 then '02'
else '03' end as sido_cd
from XMLTABLE('1 to 1000000');
create table com_code (
group_cd varchar2(10),
cd varchar2(10),
cd_nm varchar2(100));
insert into com_code values ('G1','11001','SEOUL')
,('G1','11002','PUSAN')
,('G1','11003','INCHEON')
,('G1','11004','DAEGU')
,('G1','11005','JAEJU')
,('G1','11006','ULEUNG')
,('G1','11007','ETC');
insert into com_code values ('G2','1','Infant')
,('G2','2','Child')
,('G2','3','Adolescent')
,('G2','4','Adult')
,('G2','5','Senior');
insert into com_code values ('G3','01','Jeonbuk')
,('G3','02','Kangwon')
,('G3','03','Chungnam');
EXEC DBMS_STATS.GATHER_TABLE_STATS('','CUSTOMER');
EXEC DBMS_STATS.GATHER_TABLE_STATS('','COM_CODE');
SELECT DISTINCT A.CD, A.CD_NM
FROM COM_CODE A, CUSTOMER B
WHERE A.GROUP_CD ='G1'
AND A.CD = B.SIGUNGU_CD;
Here is the execution plan obtained by running the query in Oracle 12c.
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 |00:00:00.01 | 16 | | | |
| 1 | HASH UNIQUE | | 1 | 00:00:01 | 7 |00:00:00.01 | 16 | 886K| 886K| |
|* 2 | HASH JOIN SEMI | | 1 | 00:00:01 | 7 |00:00:00.01 | 16 | 1695K| 1695K| 1246K (0)|
|* 3 | TABLE ACCESS FULL| COM_CODE | 1 | 00:00:01 | 7 |00:00:00.01 | 7 | | | |
| 4 | TABLE ACCESS FULL| CUSTOMER | 1 | 00:00:01 | 907 |00:00:00.01 | 9 | | | |
-----------------------------------------------------------------------------------------------------------------------

Addendum 2
Here are the different variations of the query and the hints I used during my experiments. I have also included the execution plans obtained by running the EXPLAIN command. All my efforts to get the optimizer to produce the Hash Semi Join operation were of no use.

(I have moved the rest of this Addendum 2 to the comment section due to the volume limitations of this bulletin board.)