This is one of the PostgreSQL flaws I have never noticed before. This realization came while I was investigating a query after migrating it from Oracle. PostgreSQL keeps growing so fast it's possible to fix this inefficiency in the near future. Here's a little example to show what I mean - first the data set:
Here is a table definition, and a query that performs slower in PostgreSQL 15.1 than in Oracle 12c.
create table customer (
cust_no numeric not null,
cust_nm character varying(100),
register_date timestamp(0),
register_dt varchar(8),
cust_status_cd varchar(1),
register_channel_cd varchar(1),
cust_age smallint,
active_yn varchar(1),
sigungu_cd varchar(5),
sido_cd varchar(2)
);
insert into customer
select i, chr(65+mod(i,26))||i::text||'CUST_NM'
, current_date - mod(i,10000)
, to_char((current_date - mod(i,10000)),'yyyymmdd') as register_dt
, mod(i,5)+1 as cust_status_cd
, mod(i,2)+1 as register_channel_cd
, trunc(random() * 100) +1 as age
, case when mod(i,22) = 0 then 'N' else 'Y' end as active_yn
, case when mod(i,1000) = 0 then '11007'
when mod(i,1000) = 1 then '11006'
when mod(i,1000) = 2 then '11005'
when mod(i,1000) = 3 then '11004'
when mod(i,1000) = 4 then '11003'
when mod(i,1000) = 5 then '11002'
else '11001' end as sigungu_cd
, case when mod(i,4) = 0 then '01'
when mod(i,4) = 1 then '02'
else '03' end as sido_cd
from generate_series(1,1000000) a(i);
CREATE INDEX CUSTOMER_X02 ON CUSTOMER (REGISTER_DT, CUST_NM);
SELECT *
FROM CUSTOMER A
WHERE REGISTER_DT LIKE '2021%'
AND UPPER(CUST_NM) = 'K400CUST_NM';
The query uses the first and second columns of the index, but wraps the second column in an UPPER function. Because of the conversion of the CUST_NM column, the optimizer will not utilize it as an access path, but the question is - will the optimizer use the UPPER() predicate as a filter predicate while doing the index scan?
Here is the execution plan generated in PostgreSQL 15.1.Bitmap Heap Scan on customer a (actual time=1.900..11.433 rows=1 loops=1)
Filter: (((register_dt)::text ~~ '2021%'::text) AND (upper((cust_nm)::text) = 'K400CUST_NM'::text))
Rows Removed by Filter: 36499
Heap Blocks: exact=518
Buffers: shared hit=701
-> Bitmap Index Scan on customer_x02 (actual time=1.830..1.831 rows=36500 loops=1)
Index Cond: (((register_dt)::text >= '2021'::text) AND ((register_dt)::text < '2022'::text))
Buffers: shared hit=183
Planning Time: 0.102 ms
Execution Time: 11.457 msIt is noteworthy that even though the CUST_NM column is included in the index, PostgreSQL postpones the UPPER() predicate test until it accesses the table itself. Consequently, PostgreSQL does a Bitmap Index Scan over the whole section of the index where REGISTER_DT LIKE '2021%', retrieving 36,500 rows. Then it accesses the table to test the UPPER() funtion, filtering out 36,499 rows. While accessing the table, it has to visit 618 (701-183) blocks. At fist glance, this plan may not seem inefficient, and I shared the same perspective. However, when I checked the execution plan in Oracle 12c, I discoverd why the query was slower in PostgreSQL.
Let's examine how Oracle 12c approaches the query.SELECT *
FROM CUSTOMER A
WHERE REGISTER_DT LIKE '2021%'
AND UPPER(CUST_NM) = 'K400CUST_NM';
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1 |00:00:00.01 | 183 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER | 1 | 1 |00:00:00.01 | 183 |
|* 2 | INDEX RANGE SCAN | CUSTOMER_X02 | 1 | 1 |00:00:00.01 | 182 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(\"REGISTER_DT\" LIKE '2021%')
filter((\"REGISTER_DT\" LIKE '2021%' AND UPPER(\"CUST_NM\")='K400CUST_NM'))
The UPPER() test is used during the index range scan. This means Oracle will do an index range scan over the whole section of the index where REGISTER_DT LIKE '2021%', testing every index entry it finds against the UPPER() predicate. As a result, the optimizer incurs just one block I/O (183-182) when accessing the table.
Closing Comments
I regret to say that this note concludes with a sense of frustration because I couldn't reduce the block I/O in PostgreSQL - it could be due to the abscence of a viable solution or perhaps because I lack advanced knowledge in PostgreSQL.