설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 81건, 최근 0 건
   

Converting Oracle function to PostgreSQL

글쓴이 : 모델광 날짜 : 2021-10-03 (일) 06:52 조회 : 53
Moving from Oracle to PorstgreSQL has become a common project, widely adopted by many who want to free themselves from license costs, expensive support costs and also technical limitations on the Oracle. However, if you don't have a thorough knowledge of how PostgreSQL works, you can't get a performance boost.
In this note I provide a method with which you can improve the performance of UDFs in PostgreSQL.

The following is a function code in Oracle.

CREATE OR REPLACE FUNCTION f_get_birthdate (
p_var varchar2(13))
RETURN VARCHAR2
IS
  p_birth_date   VARCHAR2(8)
BEGIN
  SELECT DECODE(SUBSTR(p_var,7,1),'1','19','2','19','3','20'
                        ,'4','20','5','19','6','19','7','20','8','20','') || SUBSTR(p_var,1,6)
     INTO p_birth_date
     FROM DUAL;
   RETURN (p_birth_date);
EXCEPTION
   WHEN others THEN null;
END;
/

What the above code does is to return a birth date using a resident registration number.
Before we dive into a PostgreSQL function, let's take a moment to think about how we will improve the performance of the function in Oracle.
The first optimization technique that I can come up with is not to use the f_get_birthdate function. We can get a birth date without using the function in SQL statements. But many developers may not accept the idea because they have to change application SQL statements.
The second optimization technique that crosses my mind is to set the function PARALLEL_ENABLE. If we have to use a PL/SQL function, we need to check whether the function is safe to be executed by PX servers. This is required to prevent serialization points in the execution plan. The PARALLEL_ENABLE keyword tells the Oracle optimizer that the PX server may execute the function.

The following is a function code a PostgreSQL novice wrote.

CREATE OR REPLACE FUNCTION  f_get_birthdate (
p_var varchar(13))
RETURNS VARCHAR
LANGUAGE PLPGSQL
AS
$$
DECLARE
p_birth_date   VARCHAR(8);
BEGIN
SELECT CASE WHEN substr(p_var,7,1) IN ('1','2','5','6') THEN '19'
                  WHEN substr(p_var,7,1) IN ('3','4','7','8') THEN '20'
                  ELSE '' END ||substr(p_var,1,6)
   INTO p_birth_date;
EXCEPTION
   WHEN others THEN null;
END;
$$;

Surely the best optimization strategy in every RDBMS is not to use the user defined function. But there are situations where you cannot remove a UDF. If you insist on the fundamental principles you may lose your job. So let's say that you cannot remove the above function. How will we improve the performance?
As I mentioned in the previous section, you may let the PostgreSQL optimizer know that the function is parallel-safe.

The following is the code I experimented with in order to check the performance.

drop table if exists t1;
create table t1 (c1 varchar(13), c2 char(200));

insert into t1
select '9004292042329', 'kkk'
  from generate_series(1,1000000) a(i);

select f_get_birthdate(c1)
    from t1;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Seq Scan on postgres_air.t1 (actual time=14.783..19269.096 rows=1000000 loops=1)
   Output: f_get_birthdate(c1)
   Buffers: shared hit=16260 read=15048
 Planning:
   Buffers: shared hit=18 read=2
 Planning Time: 12.048 ms
 Execution Time: 19552.021 ms

Now let's inform PostgreSQL that the function is parallel safe.

모델광 2021-10-03 (일) 07:50
CREATE OR REPLACE FUNCTION  f_get_birthdate (
p_var varchar(13))
RETURNS VARCHAR
LANGUAGE PLPGSQL PARALLEL SAFE
AS
$$
DECLARE
p_birth_date  VARCHAR(8);
BEGIN
SELECT CASE WHEN substr(p_var,7,1) IN ('1','2','5','6') THEN '19'
                  WHEN substr(p_var,7,1) IN ('3','4','7','8') THEN '20'
                  ELSE '' END ||substr(p_var,1,6)
  INTO p_birth_date;
--In order to make parallel operation kick in, you have to remove an exception block.
--EXCEPTION
--  WHEN others THEN null;
END;
$$;

select f_get_birthdate(c1)
  from t1;
                                          QUERY PLAN
-----------------------------------------------------------------------------------
 Gather (actual time=2.946..9992.977 rows=1000000 loops=1)
  Output: (f_get_birthdate(c1))
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=16545 read=15119
  ->  Parallel Seq Scan on postgres_air.t1 (actual time=2.907..9420.217 rows=333333 loops=3)
        Output: f_get_birthdate(c1)
        Buffers: shared hit=16545 read=15119
        Worker 0:  actual time=3.585..9393.261 rows=331936 loops=1
          Buffers: shared hit=5715 read=4862
        Worker 1:  actual time=3.736..9490.059 rows=322816 loops=1
          Buffers: shared hit=5209 read=5083
 Planning Time: 0.162 ms
 Execution Time: 10144.747 ms

As you can see the elapsed time decreased from 19552 to 10144 msec. Is this all we can do? Definitely not. Here we can take advantage of PostgreSQL's advanced feature Other DBMSs don't have.

Below is the function code I rewrote where I chose to use an sql function.

CREATE OR REPLACE FUNCTION  f_get_birthdate (
p_var varchar(13))
RETURNS varchar
LANGUAGE SQL PARALLEL SAFE
AS
$$
SELECT CASE WHEN substr(p_var,7,1) IN ('1','2','5','6') THEN '19'
                  WHEN substr(p_var,7,1) IN ('3','4','7','8') THEN '20'
                  ELSE '' END ||substr(p_var,1,6)
$$;

select f_get_birthdate(c1)
  from t1;
                                QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on postgres_air.t1 (actual time=0.089..1293.139 rows=1000000 loops=1)
  Output: (CASE WHEN (substr((c1)::text, 7, 1) = ANY ('{1,2,5,6}'::text[])) THEN '19'::text WHEN (substr((c1)::text, 7,
 1) = ANY ('{3,4,7,8}'::text[])) THEN '20'::text ELSE ''::text END || substr((c1)::text, 1, 6))
  Buffers: shared hit=16206 read=15044
 Planning:
  Buffers: shared hit=6
 Planning Time: 0.595 ms
 Execution Time: 1364.441 ms

The elapsed time dropped from  10144 to 1364 msec. Note that there are no function calls here. The PostgreSQL optimizer removed the function and rewrote it  into a pure SQL statement.

Conclusion
In an Oracle to PostgreSQL migration project it is critical to take advantage of PostgreSQL features. When you don't need any procedural elements or variables in a UDF, it is better to adopt an sql function. Removing function calls in an SQL statement can save a lot of load.
댓글주소
   

postgresdba.com