설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 162건, 최근 0 건
   

Function performance and volatility categories 2

글쓴이 : 모델광 날짜 : 2022-07-23 (토) 08:49 조회 : 1464

I was recently confronted with a nice example of how a lazy marking of function volatility can introduce a performance problem. This note will explain how you can get rid of the possibility of having poor performance with regard to using user defined functions.

When I drafted this note, the title of this note was "an optimizer bug in PostgreSQL". I had been an SQL tuner in Oracle database for a long time. When I witnessed a different behavior in PostgreSQL regarding its way of handling a user defined function, I thought I found a bug in PostgreSQL. However, while doing the test over and over again and reading the PostgreSQL manual related to function volatility, I found that PostgreSQL had its own specific way of handling user defined functions. I will share what I figured out in this note.


Recently while tuning an SQL statement I had a bizarre experience related to function volatility. There was an SQL statement I had to optimize. And the SQL statement was very long and complicated. The query in question had a lot of comments and user defined functions and in-line views (or subqueries). The query was slow because user defined functions had to be executed many times. And there was a user defined function in an in-line view, but the function result was not selected in the outermost SELECT list. When I asked the coder why he did not put the function result in the SELECT list, he told me that he did not comment out the function in the in-line view by mistake. What surprised me, however, was that the user defined function that was not selected in the outermost select list was being executed many times, which I thought was a PostgreSQL bug.


To make myself clear, here is a little demo script, with a couple of sample queries:


drop table if exists customer;

create table customer (

cust_no        numeric not null,

cust_nm        character varying(100),

register_date  timestamp(0),

register_dt    varchar(8),

cust_status_cd varchar(1),

register_channel_cd varchar(1),

cust_age       numeric(3),

active_yn      varchar(1),

sigungu_cd     varchar(5),

sido_cd        varchar(2)

);

insert into customer

select i, chr(65+mod(i,26))||i::text||'CUST_NM'

     , current_date - mod(i,10000)

     , to_char((current_date - mod(i,10000)),'yyyymmdd') as register_dt

     , mod(i,5)+1 as cust_status_cd

     , mod(i,3)+1 as register_channel_cd

     , trunc(random() * 100) +1 as age

     , case when mod(i,22) = 0 then 'N' else 'Y' end as active_yn

     , case when mod(i,1000) = 0 then '11007'

            when mod(i,1000) = 1 then '11006'

            when mod(i,1000) = 2 then '11005'

            when mod(i,1000) = 3 then '11004'

            when mod(i,1000) = 4 then '11003'

            when mod(i,1000) = 5 then '11002'

            else '11001' end                  as sigungu_cd

      , case when mod(i,3) = 0 then '01'

             when mod(i,3) = 1 then '02'

             when mod(i,3) = 2 then '03' end as sido_cd

  from generate_series(1,1000000) a(i);

ALTER TABLE customer ADD CONSTRAINT customer_pk

  PRIMARY KEY (cust_no);

CREATE INDEX CUSTOMER_X01 ON CUSTOMER(sigungu_cd, register_date, cust_nm);

select * from pg_relation_size('customer');

93Mbytes


create table com_code (

group_cd  varchar(10),

cd        varchar(10),

cd_nm     varchar(100));

insert into com_code values ('G1','11001','SEOUL')

                           ,('G1','11002','PUSAN')

                           ,('G1','11003','INCHEON')

                           ,('G1','11004','DAEGU')

                           ,('G1','11005','JAEJU')

                           ,('G1','11006','ULEUNG')

                           ,('G1','11007','ETC');

insert into com_code values ('G2','1','Infant')

                           ,('G2','2','Child')

                           ,('G2','3','Adolescent')

                           ,('G2','4','Adult')

                           ,('G2','5','Senior');

insert into com_code values ('G3','01','Jeonbuk')

                           ,('G3','02','Kangwon')

                           ,('G3','03','Chungnam');

alter table com_code add constraint com_code_pk

  primary key (group_cd, cd);

select * from pg_relation_size('com_code');

--8k


create or replace function f_get_signgu_nm_sql(p_cd varchar)
returns varchar
language sql parallel safe
as $$
    SELECT CD_NM
      FROM COM_CODE
    WHERE GROUP_CD = 'G1'
         AND CD = p_cd;   --시군구 코드값 입력
$$


Here is the SQL statement that we have to investigate.


SELECT CUST_NO, CUST_NM, CUST_STATUS_CD
  FROM (SELECT CUST_NO, CUST_NM, CUST_STATUS_CD, CUST_AGE

                , f_get_signgu_nm_sql(SIGUNGU_CD) AS SIGUNGU_NAME
          FROM CUSTOMER) x
LIMIT 2;


In the in-line view x, there is a user defined function f_get_signgui_nm_sql and the column CUST_AGE. But the query omitted the function result and the CUST_AGE column in the outermost SELECT list. Do you think the optimizer will execute the function? In Oracle it is common sense that the optimizer does not execute the function because the query is not asking for the function result in the outermost SELECT list. But to my surprise PostgreSQL executed the function, which I had thought was an optimizer bug. Here is the output of the query when I executed it with EXPLAIN(verbose).


Limit  (cost=0.00..0.56 rows=2 width=22)
  Output: x.cust_no, x.cust_nm, x.cust_status_cd
  ->  Subquery Scan on x  (cost=0.00..281364.00 rows=1000000 width=22)
        Output: x.cust_no, x.cust_nm, x.cust_status_cd
        ->  Seq Scan on scott.customer  (cost=0.00..271364.00 rows=1000000 width=66)
              Output: customer.cust_no, customer.cust_nm, customer.cust_status_cd, NULL::numeric(3,0), f_get_signgu_nm_sql(customer.sigungu_cd)


If you are fresh from Oracle, you must have thought this PostgreSQL behavior is a bug. While writing this article I read the PostgreSQL manual carefully. Below is a passage from the manual:


A VOLATILE function can do anything, including modifying the database.

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement.

https://www.postgresql.org/docs/current/xfunc-volatility.html


When I read the above passage, I realized that I missed something important. Since I did not mention volatility when I created the f_get_signgu_nm_sql function, the function was VOLATILE. PostgreSQL does not know the function the way we know it. From the point of view of the PostgreSQL optimizer, the function could modify the database. So the optimizer had to execute the function even though there is no function result requested in the SELECT list. So I decided to inform the database engine that there is no possibility of modifying the database by marking the function STABLE. I rewrote the function like this:


create or replace function f_get_signgu_nm_sql(p_cd varchar)
returns varchar
language sql parallel safe stable
as $$
    SELECT CD_NM
      FROM COM_CODE
    WHERE GROUP_CD = 'G1'
         AND CD = p_cd;   --시군구 코드값 입력
$$


I re-ran the SQL statement and got this execution plan:


SELECT CUST_NO, CUST_NM, CUST_STATUS_CD
  FROM (SELECT CUST_NO, CUST_NM, CUST_STATUS_CD, CUST_AGE

                , f_get_signgu_nm_sql(SIGUNGU_CD) AS SIGUNGU_NAME
          FROM CUSTOMER) x
LIMIT 2;


Limit  (cost=0.00..0.04 rows=2 width=22)
  Output: customer.cust_no, customer.cust_nm, customer.cust_status_cd
  ->  Seq Scan on scott.customer  (cost=0.00..21364.00 rows=1000000 width=22)
        Output: customer.cust_no, customer.cust_nm, customer.cust_status_cd


Note that PostgreSQL collapsed the in-line view x (view merging in Oracle terminology) and removed the user defined function f_get_signgu_nm_sql and the column CUST_AGE because they are not requested in the SELECT list.


Conclusion

In general you can mark functions as VOLATILE, STABLE, or IMMUTABLE. STABLE and IMMUTABLE functions can not modify the database. They are not allowed to. When you put in a function in an inline view and it is marked VOLATILE, then Postgres thinks you wanted to run the function and the function may change the database, so PostgreSQL has to run the function. PostgreSQL can not optimize the function call away. This is an important thing to note with function volatility and in-line views (or subqueries). Watch your function volatility when you are making functions. If a function is marked VOLATILE and you put the function in a view, it will get run in an in-line view or subquery even if the function result is not requested.


Addendum

The following is another excerpt from the manual.


Because of this snapshotting behavior, a function containing only SELECT commands can safely be marked STABLE, even if it selects from tables that might be undergoing modifications by concurrent queries. PostgreSQL will execute all commands of a STABLE function using the snapshot established for the calling query, and so it will see a fixed view of the database throughout that query.


My understanding of the manual is that if there are only SELECT commands in the function, you'd better mark the function STABLE. Do not be lazy.



   

postgresdba.com