PostgreSQL 은 특별합니다.
단순한 데이터베이스가 아니라, 하나의 어플리케이션 플랫폼이기 때문입니다.

공지사항

뉴스 및 기사

	총 게시물 184건, 최근 1 건

Converting Oracle function to PostgreSQL

글쓴이 : 모델광 날짜 : 2021-10-03 (일) 06:52 조회 : 4558

Moving from Oracle to PorstgreSQL has become a common project, widely adopted by many who want to free themselves from license costs, expensive support costs and also technical limitations on the Oracle. However, if you don't have a thorough knowledge of how PostgreSQL works, you can't get a performance boost.

In this note I provide a method with which you can improve the performance of UDFs in PostgreSQL.

The following is a function code in Oracle.

CREATE OR REPLACE FUNCTION f_get_birthdate (

p_var varchar2(13))

RETURN VARCHAR2

p_birth_date VARCHAR2(8)

BEGIN

SELECT DECODE(SUBSTR(p_var,7,1),'1','19','2','19','3','20'

,'4','20','5','19','6','19','7','20','8','20','') || SUBSTR(p_var,1,6)

INTO p_birth_date

FROM DUAL;

RETURN (p_birth_date);

EXCEPTION

WHEN others THEN null;

END;

What the above code does is to return a birth date using a resident registration number.

Before we dive into a PostgreSQL function, let's take a moment to think about how we will improve the performance of the function in Oracle.

The first optimization technique that I can come up with is not to use the f_get_birthdate function. We can get a birth date without using the function in SQL statements. But many developers may not accept the idea because they have to change application SQL statements.

The second optimization technique that crosses my mind is to set the function PARALLEL_ENABLE. If we have to use a PL/SQL function, we need to check whether the function is safe to be executed by PX servers. This is required to prevent serialization points in the execution plan. The PARALLEL_ENABLE keyword tells the Oracle optimizer that the PX server may execute the function.

The following is a function code a PostgreSQL novice wrote.

CREATE OR REPLACE FUNCTION f_get_birthdate (

p_var varchar(13))

RETURNS VARCHAR

LANGUAGE PLPGSQL

DECLARE

p_birth_date VARCHAR(8);

BEGIN

SELECT CASE WHEN substr(p_var,7,1) IN ('1','2','5','6') THEN '19'

WHEN substr(p_var,7,1) IN ('3','4','7','8') THEN '20'

ELSE '' END ||substr(p_var,1,6)

INTO p_birth_date;

EXCEPTION

WHEN others THEN null;

END;

$$;

Surely the best optimization strategy in every RDBMS is not to use the user defined function. But there are situations where you cannot remove a UDF. If you insist on the fundamental principles you may lose your job. So let's say that you cannot remove the above function. How will we improve the performance?

As I mentioned in the previous section, you may let the PostgreSQL optimizer know that the function is parallel-safe.

The following is the code I experimented with in order to check the performance.

drop table if exists t1;

create table t1 (c1 varchar(13), c2 char(200));

insert into t1

select '9004292042329', 'kkk'

from generate_series(1,1000000) a(i);

select f_get_birthdate(c1)

from t1;

QUERY PLAN

----------------------------------------------------------------------------------

Seq Scan on postgres_air.t1 (actual time=14.783..19269.096 rows=1000000 loops=1)

Output: f_get_birthdate(c1)

Buffers: shared hit=16260 read=15048

Planning:

Buffers: shared hit=18 read=2

Planning Time: 12.048 ms

Execution Time: 19552.021 ms

Now let's inform PostgreSQL that the function is parallel safe.

모델광

2021-10-03 (일) 07:50

CREATE OR REPLACE FUNCTION f_get_birthdate (
p_var varchar(13))
RETURNS VARCHAR
LANGUAGE PLPGSQL PARALLEL SAFE
AS
$$
DECLARE
p_birth_date VARCHAR(8);
BEGIN
SELECT CASE WHEN substr(p_var,7,1) IN ('1','2','5','6') THEN '19'
WHEN substr(p_var,7,1) IN ('3','4','7','8') THEN '20'
ELSE '' END ||substr(p_var,1,6)
INTO p_birth_date;
--In order to make parallel operation kick in, you have to remove an exception block.
--EXCEPTION
-- WHEN others THEN null;
END;
$$;

select f_get_birthdate(c1)
from t1;
QUERY PLAN
-----------------------------------------------------------------------------------
Gather (actual time=2.946..9992.977 rows=1000000 loops=1)
Output: (f_get_birthdate(c1))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=16545 read=15119
-> Parallel Seq Scan on postgres_air.t1 (actual time=2.907..9420.217 rows=333333 loops=3)
Output: f_get_birthdate(c1)
Buffers: shared hit=16545 read=15119
Worker 0: actual time=3.585..9393.261 rows=331936 loops=1
Buffers: shared hit=5715 read=4862
Worker 1: actual time=3.736..9490.059 rows=322816 loops=1
Buffers: shared hit=5209 read=5083
Planning Time: 0.162 ms
Execution Time: 10144.747 ms

As you can see the elapsed time decreased from 19552 to 10144 msec. Is this all we can do? Definitely not. Here we can take advantage of PostgreSQL's advanced feature Other DBMSs don't have.

Below is the function code I rewrote where I chose to use an sql function.

CREATE OR REPLACE FUNCTION f_get_birthdate (
p_var varchar(13))
RETURNS varchar
LANGUAGE SQL PARALLEL SAFE
AS
$$
SELECT CASE WHEN substr(p_var,7,1) IN ('1','2','5','6') THEN '19'
WHEN substr(p_var,7,1) IN ('3','4','7','8') THEN '20'
ELSE '' END ||substr(p_var,1,6)
$$;

select f_get_birthdate(c1)
from t1;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on postgres_air.t1 (actual time=0.089..1293.139 rows=1000000 loops=1)
Output: (CASE WHEN (substr((c1)::text, 7, 1) = ANY ('{1,2,5,6}'::text[])) THEN '19'::text WHEN (substr((c1)::text, 7,
1) = ANY ('{3,4,7,8}'::text[])) THEN '20'::text ELSE ''::text END || substr((c1)::text, 1, 6))
Buffers: shared hit=16206 read=15044
Planning:
Buffers: shared hit=6
Planning Time: 0.595 ms
Execution Time: 1364.441 ms

The elapsed time dropped from 10144 to 1364 msec. Note that there are no function calls here. The PostgreSQL optimizer removed the function and rewrote it into a pure SQL statement.

Conclusion
In an Oracle to PostgreSQL migration project it is critical to take advantage of PostgreSQL features. When you don't need any procedural elements or variables in a UDF, it is better to adopt an sql function. Removing function calls in an SQL statement can save a lot of load.

댓글주소

select f_get_birthdate(c1)
  from t1;
                                          QUERY PLAN
-----------------------------------------------------------------------------------
 Gather (actual time=2.946..9992.977 rows=1000000 loops=1)
   Output: (f_get_birthdate(c1))
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=16545 read=15119
   ->  Parallel Seq Scan on postgres_air.t1 (actual time=2.907..9420.217 rows=333333 loops=3)
         Output: f_get_birthdate(c1)
         Buffers: shared hit=16545 read=15119
         Worker 0:  actual time=3.585..9393.261 rows=331936 loops=1
           Buffers: shared hit=5715 read=4862
         Worker 1:  actual time=3.736..9490.059 rows=322816 loops=1
           Buffers: shared hit=5209 read=5083
 Planning Time: 0.162 ms
 Execution Time: 10144.747 ms

As you can see the elapsed time decreased from 19552 to 10144 msec. Is this all we can do? Definitely not. Here we can take advantage of PostgreSQL's advanced feature Other DBMSs don't have.

Below is the function code I rewrote where I chose to use an sql function.

CREATE OR REPLACE FUNCTION  f_get_birthdate (
p_var varchar(13))
RETURNS varchar
LANGUAGE SQL PARALLEL SAFE
AS
$$
SELECT CASE WHEN substr(p_var,7,1) IN ('1','2','5','6') THEN '19'
                  WHEN substr(p_var,7,1) IN ('3','4','7','8') THEN '20'
                  ELSE '' END ||substr(p_var,1,6)
$$;

select f_get_birthdate(c1)
  from t1;
                                 QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on postgres_air.t1 (actual time=0.089..1293.139 rows=1000000 loops=1)
   Output: (CASE WHEN (substr((c1)::text, 7, 1) = ANY ('{1,2,5,6}'::text[])) THEN '19'::text WHEN (substr((c1)::text, 7,
 1) = ANY ('{3,4,7,8}'::text[])) THEN '20'::text ELSE ''::text END || substr((c1)::text, 1, 6))
   Buffers: shared hit=16206 read=15044
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.595 ms
 Execution Time: 1364.441 ms

The elapsed time dropped from  10144 to 1364 msec. Note that there are no function calls here. The PostgreSQL optimizer removed the function and rewrote it  into a pure SQL statement.

Conclusion
In an Oracle to PostgreSQL migration project it is critical to take advantage of PostgreSQL features. When you don't need any procedural elements or variables in a UDF, it is better to adopt an sql function. Removing function calls in an SQL statement can save a lot of load.

E-MAIL: root@postgresdba.com