설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 178건, 최근 0 건
   

NVL in EDB and Oracle

글쓴이 : 모델광 날짜 : 2022-09-09 (금) 22:02 조회 : 3678

This note is only for those who use Enterprise DB or Oracle.

If you are concerned with vanilla PostgreSQL, you are not advised to read this note. If you use Oracle or Enterprised DB, this article is worth reading.


Now I am working as a DBA in a project where we use EDB and I am frequently confronted with SQL statements which use the NVL function. NVL is just an old Oracle syntax for COALESCE and Oracle still retains NVL for backward compatibility. EDB seems to implement the NVL function to make migration from Oracle smooth. But the NVL function has two serious drawbacks.  

In this post I will give you the reason why you should avoid using the NVL function in Oracle or EDB.


Here is some code to create the test data set:


--tested on PostgreSQL 14.4 (EnterpriseDB Advanced Server 14.4.0)

create table ORDERS (ord_no bigint,cust_id varchar(20),comment varchar(100),ord_date varchar(8));

ALTER TABLE ORDERS ADD CONSTRAINT PK_ORDERS PRIMARY KEY(ORD_NO);

CREATE TABLE ORDERS_DETAIL(ORD_LINE_NO BIGINT NOT NULL,ORD_NO BIGINT NOT NULL,PROD_ID VARCHAR(10) NOT NULL,COMMENT VARCHAR(100),ORD_AMT BIGINT);

ALTER TABLE ORDERS_DETAIL ADD CONSTRAINT PK_ORDERS_DETAIL PRIMARY KEY(ORD_LINE_NO);

CREATE INDEX ORDERS_DETAIL_X01 ON ORDERS_DETAIL(ORD_NO, PROD_ID);

CREATE INDEX ORDERS_DETAIL_X02 ON ORDERS_DETAIL(ORD_NO, ORD_AMT);


insert into ORDERS
select i as ord_no    
        , 'C'||mod(i,10) as cust_id    
        , lpad('X',10,'Y') as comment     
        , to_char(to_date('20191001','YYYYMMDD')+mod(i,60),'yyyymmdd') as ord_date
  from generate_series(1,1000000) a(i);


INSERT INTO ORDERS_DETAIL
SELECT i as ORD_LINE_NO   
          , mod(i,1000000) AS ORD_NO   
          , 'PP'||MOD(i,5) AS PROD_ID   
          , lpad('X',10,'Y') as comment
      , case when i < 1000 then i*100 else i end as prod_amt
  FROM generate_series(1,10000000) a(i);


I have created the ORDERS and ORDERS_DETAIL table with the size of 65Mbyters and 650Mbytes respectively. For testing purposes I have run the following SQL statement.


SELECT ORD_NO
            ,CUST_ID
            ,NVL(A.COMMENT,(SELECT MAX(PROD_ID)

                                               FROM ORDERS_DETAIL B WHERE A.ORD_NO = B.ORD_NO))
  FROM ORDERS A
 WHERE ORD_NO <= 100000;


Note that I have used the NVL function in the SELECT list. And we know that the COMMENT column has no NULL values.

Think for a moment; Will EDB run the scalar subquery even though there are no NULL values in the COMMENT column? For your reference, if you run the above query in Oracle, Oracle executes the scalar subquery, which is inefficient.


Here is the plan I got when I ran the query under EDB 14.4:


Index Scan using pk_orders on scott.orders a  (cost=0.42..84258.58 rows=98893 width=43) (actual time=0.048..199.752 rows=100000 loops=1)
  Output: a.ord_no, a.cust_id, nvl(a.comment, ((SubPlan 2))::character varying)
  Index Cond: (a.ord_no <= 100000)
  Buffers: shared hit=302546
  SubPlan 2
    ->  Result  (cost=0.80..0.81 rows=1 width=32) (actual time=0.001..0.002 rows=1 loops=100000)
          Output: $1
          Buffers: shared hit=301534
          InitPlan 1 (returns $1)
            ->  Limit  (cost=0.43..0.80 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=100000)
                  Output: ((b.prod_id)::text)
                  Buffers: shared hit=301534
                  ->  Index Only Scan Backward using orders_detail_x01 on scott.orders_detail b  (cost=0.43..4.13 rows=10 width=32) (actual time=0.001..0.001 rows=1 loops=100000)
                        Output: b.prod_id
                        Index Cond: ((b.ord_no = a.ord_no) AND (b.prod_id IS NOT NULL))
                        Heap Fetches: 0
                        Buffers: shared hit=301534
Query Identifier: 8902623697392898361
Planning Time: 0.222 ms
Execution Time: 203.987 ms


The number of block I/Os ( shared hit=301534) in the  Index Only Scan node tells us that the scalar subquery was executed 100000 times ( loops=100000), which is bad.


Now let's replace the NVL fuction with the COALESCE function and run the query.


SELECT ORD_NO
            ,CUST_ID
            ,COALESCE(A.COMMENT,(SELECT MAX(PROD_ID)

                                                          FROM ORDERS_DETAIL B WHERE A.ORD_NO = B.ORD_NO))
  FROM ORDERS A
 WHERE ORD_NO <= 100000;


And here is the query plan it generates:


Index Scan using pk_orders on scott.orders a  (cost=0.42..84011.35 rows=98893 width=43) (actual time=0.017..12.163 rows=100000 loops=1)
  Output: a.ord_no, a.cust_id, COALESCE(a.comment, ((SubPlan 2))::character varying)
  Index Cond: (a.ord_no <= 100000)
  Buffers: shared hit=1012
  SubPlan 2
    ->  Result  (cost=0.80..0.81 rows=1 width=32) (never executed)
          Output: $1
          InitPlan 1 (returns $1)
            ->  Limit  (cost=0.43..0.80 rows=1 width=32) (never executed)
                  Output: ((b.prod_id)::text)
                  ->  Index Only Scan Backward using orders_detail_x01 on scott.orders_detail b  (cost=0.43..4.13 rows=10 width=32) (never executed)
                        Output: b.prod_id
                        Index Cond: ((b.ord_no = a.ord_no) AND (b.prod_id IS NOT NULL))
                        Heap Fetches: 0
Query Identifier: -7418638523580744970
Planning Time: 0.182 ms
Execution Time: 14.226 ms


Note that the orders_detail_x01 index access was not executed. Look at the elapsed time and block IO differences. The elapsed time dropped from 203 to 13 ms and the block IO fell from 302546 to 1012. In Oracle you can see the same behavior. When I see this pattern of queries in Oracle, I can improve performance simply by substituting NVL with COALESCE.


There is another reason why you must not use NVL in EDB. The reason is smple:

NVL is less performant than COALESCE.


We can check the performance difference by running the following query:


SELECT ORD_NO
      ,CUST_ID
      ,COALESCE(A.COMMENT,'Z') as COMMENT
  FROM ORDERS A
 WHERE ORD_NO <= 1000000
UNION ALL
SELECT ORD_NO
      ,CUST_ID
      ,NVL(A.COMMENT,'Z') as COMMENT
  FROM ORDERS A
 WHERE ORD_NO <= 1000000;


 Append  (cost=0.00..72206.00 rows=2000000 width=43) (actual time=0.011..310.744 rows=2000000 loops=1)
  Buffers: shared hit=14706
  ->  Seq Scan on scott.orders a  (cost=0.00..19853.00 rows=1000000 width=43) (actual time=0.010..101.993 rows=1000000 loops=1)
        Output: a.ord_no, a.cust_id, COALESCE(a.comment, 'Z'::character varying)
        Filter: (a.ord_no <= 1000000)
        Buffers: shared hit=7353
  ->  Seq Scan on scott.orders a_1  (cost=0.00..22353.00 rows=1000000 width=43) (actual time=0.018..131.158 rows=1000000 loops=1)
        Output: a_1.ord_no, a_1.cust_id, nvl(a_1.comment, 'Z'::character varying)
        Filter: (a_1.ord_no <= 1000000)
        Buffers: shared hit=7353
Query Identifier: 7906406971189728346
Planning:
  Buffers: shared hit=8
Planning Time: 0.155 ms
Execution Time: 353.892 ms


Note that when we use COALESCE the planner's estimated cost for the upper branch of the query is 19853, whereas the cost for the lower branch of the query is 22353.  PostgreSQL query planner is predicting that it will take longer to execute the lower branch of the query using NVL. We can see that the planner's estimate was correct by checking the actual execution times of each branch of the query, which are 101.993 ms and 131.158 ms respectively.


This inefficiency is just  the tip of the iceberg. There are a lot of side effects you have to keep in mind when you use Oracle specific features supported by EDB. I will provide them in another note.


Conclusion

In Oracle or EDB you should not use NVL. Please get into the habit of using COALESCE regardless of how annoying it is to type.

I started my Oracle career on version 7.3 and NVL was the only function available to filter through null expressions. Then Oracle 9i came in with the COALESCE function. To this day when we use Oracle 21 or something, I still encounter developers who utilize NVL, which I think is the reason why EDB decided to implement the feature. The fact that you have the luxury of being able to use NVL in your non-Oracle database does not mean that the function is optimal. COALESCE is more performant. Why keep using the ancient and inefficient function?


Footnote

You can also replace NVL by the CASE WHEN clause like the following:


SELECT ORD_NO
            ,CUST_ID
            ,CASE WHEN A.COMMENT IS NOT NULL THEN A.comment  
               ELSE (SELECT MAX(PROD_ID) FROM ORDERS_DETAIL B WHERE A.ORD_NO = B.ORD_NO) END
  FROM ORDERS A
 WHERE ORD_NO <= 100000;


The thing is that the SQL statement with the COALESCE function is slightly (1~2 %) faster than the SQL statement with the CASE clause and I do not know the reason.


PostgresDBA 2022-09-15 (목) 23:08
ORACLE 의  NVL 은 COALESCE 로 100% 대체 가능합니다.

EDB 의 NVL  와 COALESCE 는 결과가 다를수 있으니 주의하세요.
NVL('','a') => 'a' 반환
COALESCE('','a') => '' 반환
댓글주소
     
     
모델광 2022-09-18 (일) 13:36
Thanks for the comment - excellent information. There are always special cases and boundary conditions that produce unexpected results and I have to admit that I made a terrible mistake of not addressing the difference of NVL and COALESCE in EDB when I put up this article.
In Oracle they have the same functionality, but in EDB they are slightly different. I have come across many cases where application developers are in trouble because they do not think about the way of NVL handling the empty string('') in EDB.
댓글주소
모델광 2023-10-27 (금) 22:23
About one year after publishing this note, I realized that I was incorrect when I stated that NVL was simply an old Oracle syntax for COALESCE and Oracle still retains NVL for backward compatibility.

In Oracle NVL and COALESCE are functioally identical. However, their internal implementations differ. When you use NVL in the WHERE clause, the optimizer can utilize statistics on the column. Still, when you use COALESCE in the WHERE clause, the optimizer can not leverage statistics on the column.
Recently I experienced dramas after replacing NVL with COALESCE in Oracle.
댓글주소
   

postgresdba.com