Moving from Oracle to PorstgreSQL has become a common project, widely adopted by many who want to free themselves from license costs, expensive support costs and also technical limitations on the Oracle. However, if you don't have a thorough knowledge of how PostgreSQL works, you can't get a performance boost.
In this note I provide a method with which you can improve the performance of UDFs in PostgreSQL.
The following is a function code in Oracle.
CREATE OR REPLACE FUNCTION f_get_birthdate (
p_var varchar2(13))
RETURN VARCHAR2
IS
p_birth_date VARCHAR2(8)
BEGIN
SELECT DECODE(SUBSTR(p_var,7,1),'1','19','2','19','3','20'
,'4','20','5','19','6','19','7','20','8','20','') || SUBSTR(p_var,1,6)
INTO p_birth_date
FROM DUAL;
RETURN (p_birth_date);
EXCEPTION
WHEN others THEN null;
END;
/
What the above code does is to return a birth date using a resident registration number.
Before we dive into a PostgreSQL function, let's take a moment to think about how we will improve the performance of the function in Oracle.
The first optimization technique that I can come up with is not to use the f_get_birthdate function. We can get a birth date without using the function in SQL statements. But many developers may not accept the idea because they have to change application SQL statements.
The second optimization technique that crosses my mind is to set the function PARALLEL_ENABLE. If we have to use a PL/SQL function, we need to check whether the function is safe to be executed by PX servers. This is required to prevent serialization points in the execution plan. The PARALLEL_ENABLE keyword tells the Oracle optimizer that the PX server may execute the function.
The following is a function code a PostgreSQL novice wrote.
CREATE OR REPLACE FUNCTION f_get_birthdate (
p_var varchar(13))
RETURNS VARCHAR
LANGUAGE PLPGSQL
AS
$$
DECLARE
p_birth_date VARCHAR(8);
BEGIN
SELECT CASE WHEN substr(p_var,7,1) IN ('1','2','5','6') THEN '19'
WHEN substr(p_var,7,1) IN ('3','4','7','8') THEN '20'
ELSE '' END ||substr(p_var,1,6)
INTO p_birth_date;
EXCEPTION
WHEN others THEN null;
END;
$$;
Surely the best optimization strategy in every RDBMS is not to use the user defined function. But there are situations where you cannot remove a UDF. If you insist on the fundamental principles you may lose your job. So let's say that you cannot remove the above function. How will we improve the performance?
As I mentioned in the previous section, you may let the PostgreSQL optimizer know that the function is parallel-safe.
The following is the code I experimented with in order to check the performance.
drop table if exists t1;
create table t1 (c1 varchar(13), c2 char(200));
insert into t1
select '9004292042329', 'kkk'
from generate_series(1,1000000) a(i);
select f_get_birthdate(c1)
from t1;
QUERY PLAN
----------------------------------------------------------------------------------
Seq Scan on postgres_air.t1 (actual time=14.783..19269.096 rows=1000000 loops=1)
Output: f_get_birthdate(c1)
Buffers: shared hit=16260 read=15048
Planning:
Buffers: shared hit=18 read=2
Planning Time: 12.048 ms
Execution Time: 19552.021 ms
Now let's inform PostgreSQL that the function is parallel safe.