설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 163건, 최근 0 건
   

Evaluating each index entry against a function

글쓴이 : 모델광 날짜 : 2023-08-27 (일) 10:55 조회 : 441
This is one of the PostgreSQL flaws I have never noticed before. This realization came while I was investigating a query after migrating it from Oracle.  PostgreSQL keeps growing so fast it's possible to fix this inefficiency in the near future. Here's a little example to show what I mean - first the data set:

Here is a table definition, and a query that performs slower in PostgreSQL 15.1 than in Oracle 12c.

create table customer (
cust_no        numeric not null,
cust_nm        character varying(100),
register_date  timestamp(0),
register_dt    varchar(8),
cust_status_cd varchar(1),
register_channel_cd varchar(1),
cust_age       smallint,
active_yn      varchar(1),
sigungu_cd     varchar(5),
sido_cd        varchar(2)
);

insert into customer
select i, chr(65+mod(i,26))||i::text||'CUST_NM'
     , current_date - mod(i,10000)
     , to_char((current_date - mod(i,10000)),'yyyymmdd') as register_dt
     , mod(i,5)+1 as cust_status_cd
     , mod(i,2)+1 as register_channel_cd
     , trunc(random() * 100) +1 as age
     , case when mod(i,22) = 0 then 'N' else 'Y' end as active_yn
     , case when mod(i,1000) = 0 then '11007'
            when mod(i,1000) = 1 then '11006'
            when mod(i,1000) = 2 then '11005'
            when mod(i,1000) = 3 then '11004'
            when mod(i,1000) = 4 then '11003'
            when mod(i,1000) = 5 then '11002'
            else '11001' end                  as sigungu_cd
      , case when mod(i,4) = 0 then '01'
             when mod(i,4) = 1 then '02'
             else                   '03' end as sido_cd
  from generate_series(1,1000000) a(i);

CREATE INDEX CUSTOMER_X02 ON CUSTOMER (REGISTER_DT, CUST_NM);

SELECT *
  FROM CUSTOMER A
 WHERE REGISTER_DT LIKE '2021%'
   AND UPPER(CUST_NM) = 'K400CUST_NM';

The query uses the first and second columns of the index, but wraps the second column in an UPPER function. Because of the conversion of the CUST_NM column, the optimizer will not utilize it as an access path, but the question is - will the optimizer use the UPPER() predicate as a filter predicate while doing the index scan?

Here is the execution plan generated in PostgreSQL 15.1.


Bitmap Heap Scan on customer a (actual time=1.900..11.433 rows=1 loops=1)
  Filter: (((register_dt)::text ~~ '2021%'::text) AND (upper((cust_nm)::text) = 'K400CUST_NM'::text))
  Rows Removed by Filter: 36499
  Heap Blocks: exact=518
  Buffers: shared hit=701
  ->  Bitmap Index Scan on customer_x02 (actual time=1.830..1.831 rows=36500 loops=1)
        Index Cond: (((register_dt)::text >= '2021'::text) AND ((register_dt)::text < '2022'::text))
        Buffers: shared hit=183
Planning Time: 0.102 ms
Execution Time: 11.457 ms


It is noteworthy that even though the CUST_NM column is included in the index, PostgreSQL postpones the UPPER() predicate test until it accesses the table itself. Consequently, PostgreSQL does a Bitmap Index Scan over the whole section of the index where REGISTER_DT LIKE '2021%', retrieving 36,500 rows. Then it accesses the table to test the UPPER() funtion, filtering out 36,499 rows. While accessing the table, it has to visit 618 (701-183) blocks. At fist glance, this plan may not seem inefficient, and I shared the same perspective. However, when I checked the execution plan in Oracle 12c, I discoverd why the query was slower in PostgreSQL.

Let's examine how Oracle 12c approaches the query.


SELECT *
  FROM CUSTOMER A
 WHERE REGISTER_DT LIKE '2021%'
   AND UPPER(CUST_NM) = 'K400CUST_NM';

-----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Starts | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |      1 |      1 |00:00:00.01 |     183 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER     |      1 |      1 |00:00:00.01 |     183 |
|*  2 |   INDEX RANGE SCAN                  | CUSTOMER_X02 |      1 |      1 |00:00:00.01 |     182 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(\"REGISTER_DT\" LIKE '2021%')
filter((\"REGISTER_DT\" LIKE '2021%' AND UPPER(\"CUST_NM\")='K400CUST_NM'))

The UPPER() test is used during the index range scan. This means Oracle will do an index range scan over the whole section of the index where REGISTER_DT LIKE '2021%', testing every index entry it finds against the UPPER() predicate. As a result, the optimizer incurs just one block I/O (183-182) when accessing the table.

Closing Comments
I regret to say that this note concludes with a sense of frustration because I couldn't reduce the block I/O in PostgreSQL - it could be due to the abscence of a viable solution or perhaps because I lack advanced knowledge in PostgreSQL.

   

postgresdba.com