설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 162건, 최근 0 건
   

row-by-row solution vs. set-based solution

글쓴이 : 모델광 날짜 : 2022-02-20 (일) 09:23 조회 : 1513

Last year I worked on a project where I had to port Oracle functions to PostgreSQL functions. The following is one of the code patterns I refactored in order to improve its performance.

I have never worked as a programmer, though, it seems that programming languages do not even have the concept of relational operators and set-based techniques. So engineers are good at developing row-by-row algorithm (former Oacle consultant Tom Kyte called this slow-by-slow algorithm). However, relational databases go against row-by-row processing.

Manipulation of data as a set is essential for dealing with data in a relational database because row-by-row(slow-by-slow) processing does not exploit system resources efficiently. In order to convert serial programs into multi-process parallel programs we need to use good set-based mechanics.

Set-based techniques not only are faster in terms of performance, using less system resources, and more efficient, but it makes your whole business process more robust, and less frightening and take the risk out of manipulating and managing large data sets. It is a case of just learning the right tools to do the right job.


Below are the original Oracle procedure and the ported PostgreSQL function.

The SQL statement itself in the UDF makes no sense from an application point of view, but it should illustrate the performance difference of row-by-row processing and set-based processing. 


* Oracle Procedure in an AS-IS database

CREATE OR REPLACE PROCEDURE P_SUM_AMT(

         p_empno      in number

        ,v_amt_total out number)

IS

  v_amount  NUMBER;

  v_ord_no   NUMBER;

  v_amount  NUMBER;

 

BEGIN

  v_amt_total := 0   ;

  v_amount    := 0   ;

  CURSOR C1 IS

       SELECT ORD_NO

         FROM OFFLINE_ORDER

       WHERE EMPNO = p_empno;

      

  OPEN C1;

  LOOP

      FETCH C1 INTO v_ord_no                      --take a row

      EXIT WHEN c1%NOTFOUND;

      

      SELECT NVL(UNIT_PRICE*QUANTY,0)       --initiate another query

        INTO v_amount

        FROM ORD_ITEM

      WHERE ORD_NO = v_ord_no;

     

      v_amt_total := v_amt_total + v_amount;    --store a vaule summing up

 

  END LOOP;

 CLOSE C1;


END;


The inefficiency in the query above is: it is very often a PL/SQL loop. They take a row and then they bacically initiate another query using the contents of that row to get the result. Sometimes they have to store the result in a variable and do another operation, for example, summing up. Some DBAs call this slow-by-slow processing which incurs a lot of block IOs and latch(locking) related wait events.

On the other hand in a query exploiting set-based processing we can get the result in one query. The following is a function which uses a set-based algorithm.


* PostgreSQL Procedure (written on version 13) in a TO-BE database

In PostgreSQL 13, a procedure does not support OUT mode. So I created a function instead of a procedure. Here I created a SQL function, not a PL/pgSQL function because a SQL function is more efficient than a PL/pgSQL function. I will show the performance difference in another post.


CREATE OR REPLACE FUNCTION P_SUM_AMT

                                       (p_empno  numeric)

RETURNS NUMERIC

LANGUAGE SQL PARALLEL SAFE

AS

$BODY$

 

SELECT COALESCE(SUM(AMOUNT),0)

  FROM OFFLINE_ORDER A

     , LATERAL (SELECT UNIT_PRICE*QUANTITY AS AMOUNT

                          FROM ORD_ITEM

                       WHERE ORD_NO = A.ORD_NO) B

 WHERE 1=1

      AND A.EMPNO = p_empno;

 

$BODY$


It may be argued that the ported code is harder to write. However, there is huge performance gains. The programmers, once they have learned to exploit the set-based techniques, and they see the performance gains, they can actually start to innovate business processes to run semi-real time, or much faster and get rid of those ludicrous functions with row-by-row FOR ~ LOOP logic that consumes a lot of system resources.


Conclusion

Try to to get the required result in one go. Try to exploit the set-based techniques which are suitable for relational databases. When it comes to operating on data in bulk, row-by-row processing can be a disaster because it consumes a lot of system resources.


   

postgresdba.com