PostgreSQL 은 특별합니다.
단순한 데이터베이스가 아니라, 하나의 어플리케이션 플랫폼이기 때문입니다.

공지사항

뉴스 및 기사

	총 게시물 175건, 최근 0 건

Removing exception blocks

글쓴이 : 모델광 날짜 : 2022-12-09 (금) 22:29 조회 : 1912

In the last article titled 'UDF Issue 1', I have shown you why it is important to reduce the number of function calls. In this note I wiill show you how much impact an exception block in PL/pgSQL has on query performance and I will also give you an idea of how to eliminate exception blocks.

I had drafted this note when I had worked on a project where I had to port Oracle functions to PostgreSQL. I was prompted to complete this note recently when I noticed a PL/pgSQL function which, I think, contained unnecessary exception blocks.

Let's assume that we have to port the following Oracle function:

CREATE OR REPLACE FUNCTION F_SIDO_NM (

p_empno IN EMPLOYEE.EMPNO%TYPE )

RETURN VARCHAR2

v_sido_nm EMPLOYEE.SIDO_NM%TYPE;

BEGIN

SELECT SIDO_NM

INTO v_sido_nm

FROM EMPLOYEE

WHERE EMPNO = p_empno;

RETURN v_sido_nm;

EXCEPTION

WHEN NO_DATA_FOUND THEN

v_sido_nm := 'Not Assigned';

RETURN v_sido_nm;

END;

Here is the code to create the table EMPLOYEE.

CREATE TABLE employee (

empno numeric(5,0) NOT NULL,

ename character varying(10),

job character varying(9),

mgr numeric(5,0),

hiredate timestamp(0),

sal numeric(7,2),

comm numeric(7,2),

deptno numeric(2,0),

sido_nm character varying(100)

);

insert into employee

select i, chr(65+mod(i,26))||i::text||'NM'

,case when mod(i,10000)=0 then 'PRESIDENT'

when mod(i,1000) = 0 then 'MANAGER'

when mod(i,3)=0 then 'SALESMAN'

when mod(i,3)=1 then 'ANALYST'

when mod(i,3)=2 then 'CLERK'

end as job

,case when mod(i,10000)= 0 then null when mod(i,1000)= 1 then 10000 when i >= 9000 then 1000

else ceiling((i+1000)/1000)*1000 end as mgr

, current_date - i , trunc(random() * 10000) as sal

, trunc(random() * 10000) as com

, mod(i,12)+1 as deptno

, case when mod(i,3) = 0 then 'Jeonbuk' when mod(i,3) = 1 then 'Kangwon'

else 'Chungnam' end as sido_nm

from generate_series(1,10000) a(i);

ALTER TABLE employee ADD CONSTRAINT employee_pk PRIMARY KEY (empno);

The Oracle function in question accepts an employee number as an input variable and returns the sido name. When no rows are retrieved, that is to say, there are no employees, we get the message 'Not Assigned'.

Let's go through the Oracle function and see the differences compared to PL/pgSQL.

- The Return clause in the function header (not the function body) becomes RETURNS in PL/pgSQL.

- The datatype of the returned value, VARCHAR2, which is after the Return keyword has to be changed to VARCHAR. In this note, we'll use VARCHAR, but TEXT is often a better choice if you do not need specific string length limits.

- the IS keword becomes AS, and we need to add a LANGUAGE clause because PL/pgSQL is not the only possible function language.

- An assignment of a value to a variable is done by the colon sign combined with an equality sign(:=).

- In PostgreSQL, the function body is considered to be a string literal, so you need to use quote marks(") or dollar quotes($$) around it. This substitutes for the terminating slash(/) in the Oracle function.

Here is how this function would look when ported to PostgreSQL:

CREATE OR REPLACE FUNCTION F_SIDO_NM_PL(

p_empno IN NUMERIC)

RETURNS VARCHAR

LANGUAGE PLPGSQL STABLE PARALLEL UNSAFE

DECLARE

v_sido_nm EMPLOYEE.SIDO_NM%TYPE;

BEGIN

SELECT SIDO_NM

INTO STRICT v_sido_nm

FROM EMPLOYEE

WHERE EMPNO = p_empno;

RETURN v_sido_nm;

EXCEPTION

WHEN NO_DATA_FOUND THEN

v_sido_nm := 'Not Assigned';

RETURN v_sido_nm;

END;

There are a few points to note here:

Firstly, the STABLE keyword was specified instead of VOLATILE(default value). When you are certain that the function does not change any data in the database you'better declare the function STABLE. Secondly, the function was marked PARALLEL UNSAFE because there is an exception block in the function body. Thirdly, the STRICT option in the INTO clause was specified to check whether the query returns exactly one row. If the query does not retrieve one orw, a run-time error will be reported, either NO_DATA_FOUND(no rows) or TOO_MANY_ROWS(more than one row). Finally there still exists an exception block to catch the NO_DATA_FOUND error.

You may think that the code conversion was successful and we did a good job in porting the Oracle function to PostgreSQL. But there are always peripheral considerations that might cause problems.

- The ported function was declared paralled unsafe, which prevents queries with this function from being processed in parallel. (This may not be a serious problem because the Oracle function itself was not enabled for parallel executions.)

- There exists an exception block which degrades performance.

Regarding this specific Oracle function, if the SIDO_NM column does not allow null values, we can use an sql function when porting to PostgreSQL. If the SIDO_NM column has a null value, we can't port the Oracle function to a PostgreSQL sql function because we can't have the sql function to return the same result.

In the project I had worked on, there were some lazy data modeling. In an ERD there was a column marked nullable but when I checked the vaules on the column I couldn't find any null values. When this happens, all developers think that the column is nullable, and they start using unneccesary COALESCE on the column and then some.

Here is how the sql function will look if the SISO_NM column has a NOT NULL constraint:

CREATE OR REPLACE FUNCTION F_SIDO_NM(

p_empno IN NUMERIC)

RETURNS VARCHAR

LANGUAGE SQL STABLE PARALLEL SAFE

SELECT COALESCE(max(sido_nm),'Not Assigned')

FROM EMPLOYEE

WHERE EMPNO = p_empno;

If there are no null values in the SIDO_NM column, the above function can replace the PL/pgSQL function. And sql functions are more performant than PL/pgSQL functions.

Regarding exception blocks in PostgreSQL, the exception block of PostgreSQL is not identical with that of Oracle. When an error occurs in a block, PostgreSQL will abort the execution of the block and also the surrounding transaction. To recover from the error, the EXCEPTION clause is used. However, the exception block of PostgreSQL incurs subtransactions. In Oracle, the exception block is just another branch of the begin .... end block.

How can we remove an exception block in the above PL/pgSQL function?

Instead of using STRICT in the INTO clause and an exception block, we can use the special FOUND variable to determine whether a row was returned. The FOUND variable is set to TRUE if more than one row is returned.

Here is the rewritten code of the PL/pgSQL function.

Note that we could safely declare the function PARALLEL SAFE because there is no exeption blocks in the rewritten code.

CREATE OR REPLACE FUNCTION F_SIDO_NM_NE(

p_empno IN NUMERIC)

RETURNS VARCHAR

LANGUAGE PLPGSQL STABLE PARALLEL SAFE

DECLARE

v_sido_nm EMPLOYEE.SIDO_NM%TYPE;

BEGIN

SELECT SIDO_NM

INTO v_sido_nm

FROM EMPLOYEE

WHERE EMPNO = p_empno;

IF NOT FOUND THEN

RETURN ‘Not Assigned’;

END IF;

RETURN v_sido_nm;

END;

This technique of using the special FOUND variable to remove an exception block is not always possible. If the Oracle function had an exception handler code to catch TOO_MANY_ROWS in the exception block, we had to use the STRICT option and an exception block in the PL/pgSQL function.

Now let's check the performance of PostgreSQL functions we have created so far.

I have run an PL/pgSQL function with an exception block.

SELECT F_SIDO_NM_PL(EMPNO)

FROM EMPLOYEE;

Here is the output of the EXPLAIN command.

Seq Scan on employee (actual time=0.055..56.311 rows=10000 loops=1)

Buffers: shared hit=30110

Planning Time: 0.036 ms

Execution Time: 56.951 ms

I have run an PL/pgSQL function without an exception block.

SELECT F_SIDO_NM_NE(EMPNO)

FROM EMPLOYEE;

Seq Scan on employee (actual time=0.040..44.250 rows=10000 loops=1)

Buffers: shared hit=30110

Planning Time: 0.036 ms

Execution Time: 44.752 ms

Note that even though the numbers of block IO are identical, the elapsed time dropped when we called a PL/pgSQL function without an exception block.

The EMPLOYEE table is small in size, so PostgreSQL did not apply parallel processing. If the EMPLOYEE table had been big enough for prallel processing to kick in, the elapsed time gap would have been bigger.

I have run an sqlL function.

SELECT F_SIDO_NM(EMPNO)

FROM EMPLOYEE;

Seq Scan on employee (actual time=0.108..59.657 rows=10000 loops=1)

Buffers: shared hit=30110

Planning Time: 0.098 ms

Execution Time: 60.181 ms

Unlike what I had expected, the sql function did not outperform PL/pgSQL functions. Repeated execution of COALESCE and MAX is the culprit of poor performance.

This article didn't go into why an exception block is a cause of performance degradation and why you must not label a funcion as PARALLEL SAFE if there in an exeption block in PL/pgSQL. I will explain it in detail in the next article.

Conclusion

When you run into an exception block in a PL/pgSQL function, try to find a way to remove it. Sometimes you can increase the chances of removing an exception block when you look into the business requirements carefully.

Added on the next day of this publication

In the middle of this article I stated:

"However, the exception block of PostgreSQL incurs subtransactions."

The above statement is only true when there is an INSERT or UPDATE or DELETE operation in the main block. Since there are no DML queries in this sample, the above statement is not correct.

E-MAIL: root@postgresdba.com