Last year while migrating from Oracle to PostgreSQL, I came across a UDF using an Oracle Autonomous Transaction.
Autonomous Transaction provides a functionality to the developer in
which it allows to do changes in a separate transaction and to save or
discard that particular transaction without affecting the main session
transaction. I know you may think what the hell I am talking about.
Please be patient. At the end of this article you will get the hang of
it.
Let's start off with the ORACLE function code we have to convert:
CREATE OR REPLACE FUNCTION F_SINGO_NEXT_VAL
(v_ymd
IN VARCHAR2)
RETURNVARCHAR2
IS
PRAGMA AUTONOMOUS_TRANSACTION;
v_last_singo_no VARCHAR2(17)
BEGIN
UPDATEsingo_no
SETlast_singo_no
= 'S' || v_ymd ||
LPAD(TO_CHAR(TO_NUMBER(
NVL(SUBSTR(last_singo_no,10,17),'0')
)
+1),8,'0')
;
--업데이트 데이터가 없으면,
최초 채번이므로 INSERT
수행
IF SQL%ROWCOUNT
= 0 THEN
INSERT INTO singo_no VALUES ('S'||v_ymd||'00000001');
END
IF;
--채번값 GET
SELECT last_singo_no
INTOv_last_singo_no
FROM singo_no;
COMMIT; --transaction commit
RETURNv_last_singo_no;
END;
Here is the DDL of the table SINGO_NO:
CREATE TABLE singo_no (LAST_SINGO_NO VARCHAR2(17) NOT NULL);
ALTER TABLE singo_no ADD CONSTRAINT singo_no_PK PRIMARY KEY(LAST_SINGO_NO);
The structure of the LAST_SINGO_NO value is as follows:
'S' + 'yyyymmdd' + 8 digit number
For example, the column can have a value, 'S2022082900002345'.
Before dealing with Autonomous Transaction I want to address the primary key format. When we design the structure of an artificial identifier (or surrogate key), there are three strategies we can use:
1. Use a sequence object offered by DBMS.
2. Use a max(primary key value) + 1 value.
3. Create an intermediate table to manage a maximum primary key value and use the value in the table as a primary key.
Using a sequence object is the best approach to obtaining a unique value.
But I frequently run across developers who want to have a primary key
value containing some business rules. Business rules include, for
example, when the row was inserted, where the row was created and then
some. The idea of a primary key value preserving a business rule puts
the system at risk. If any of the components of the key changes, then
all applications that access the key are affected. The moment the
business rule changes, you have to modify all the primary key values and
foreign key values in the child table as well. I had been confronted
with a primary key value consisting of a region code and a sequence
number and one day when the region code had to be divided into 3 region
codes, the admistrators of the system panicked.
The max(pk)+1 approach seriously compromises concurrency, which means that we have to programmatically force only one session to get the next primary key value at a time.
The
Oracle UDF under investigation is used to extract a unique value
revealing some business rules which say that the primary key value must
start with the character 'S' and 'yyyymmdd' followed by eight digit
numbers. Looking at the code closely, we can notice that the function is
defined as an autonomous transaction (PRAGMA AUTONOMOUS TRANSACTION).
In
subsequent paragraphs I will explain the mechanism of autonomous
transactions and I will give you a technique you might find useful in PostgreSQL.
Since
PostgreSQL does not explicitly support autonomous transactions, I
simply wrote following code ignoring the autonomous transaction
functionality.
create
table
singo_no (
last_singo_no
varchar(40) not null
);
alter table singo_no add primary key (last_singo_no);
I have set the storage parameter as follows:
alter
table singo_no set (autovacuum_vacuum_scale_factor= 0.0);
alter
table singo_no set (autovacuum_vacuum_threshold= 150);
alter
table singo_no set (autovacuum_vacuum_insert_threshold= 150);
I
intended to trigger a VACUUM whenever the table is updated 150 times.
If I don't change the default settings, the table can get bloated.
INSERT INTO singo_no VALUES('2021013100000001');
I
have inserted one row into the SINGO_NO table to make a sql fuction. If
there had been no rows in the table, I would have had to create a
plpgsql function.
Here is the PostgreSQL function code - converted from the Oracle UDF.
create or replace function singo_next_val(v_ymd
in varchar)
returnsvarchar
language sql volatile parallel
unsafe
as
$$
UPDATEsingo_no
SETlast_singo_no
=
'S'||v_ymd||LPAD((
COALESCE(SUBSTR(last_singo_no,10,17),'0')::numeric
+1)::text,8,'0')
RETURNINGlast_singo_no ;
$$
I have created a tiny table T1 and an anonymous code block to give an explanation of what an autonomous transaction is:
drop table t1;
CREATE TABLE T1 (C1 varchar(17), C2 varchar(100) not null);
ALTER TABLE T1
ADD CONSTRAINT T1_PK PRIMARY KEY (C1);
do
$main$
BEGIN
insert into t1 values(singo_next_val('20221009'),'dummy');
-- I have simulated some long-running query.
PERFORM pg_sleep(10);
END;
$main$
In session 1, I have executed an anonymous code block.
--session 1
do
$main$
BEGIN
insert into t1 values(singo_next_val('20221009'),'dummy');
-- I have simulated some long-running query.
PERFORM pg_sleep(10);
END;
$main$
As soon as I execute the anonymous block in session 1, I have opened a new session and run the following code in session 2:
--session 2
select singo_next_val('20221009');
We can observe that session 2 goes on waiting until session 1 completes.
This is because session 1 holds the exclusive lock on the record in the
SINGO_NO table and session 2 also tries to hold the lock on the record.
In Oracle by defining the singo_next_val
function as autonomous transaction, we would able to release the
lock inside the anonymous block as soon as the INSERT statement is run.
To implement this functionality we can use the PostgreSQL dblink.
The dblink actually opens a new connection and runs a query using this
new connection. Any SQL operation executed via a dblink is committed
immediately regardless of the main transaction in the anonymous block.
I have created the dblink extension.
create extension dblink;
The
modified function below updates a row via a dblink. This update will
get committed even if the calling transaction is rolled back:
create or replace function singo_next_val_at(v_ymd
in
varchar)
returnsvarchar
language sql volatile parallel safe
as
$body$
SELECT*
FROM
dblink('host=127.0.0.1 port=5444 user=scott dbname=analdb',
format($$UPDATEsingo_no
SETlast_singo_no
=
'S'||%L||LPAD((
COALESCE(SUBSTR(last_singo_no,10,17),'0')::numeric
+1)::text,8,'0')
RETURNINGlast_singo_no
$$,
v_ymd)) as t1(last_singo_no varchar(17))
$body$
I have opened a session and executed an anonymous code block.
--session 1
do
$main$
BEGIN
insert into t1 values(singo_next_val('20221009'),'dummy');
-- I have simulated some long-running query.
PERFORM pg_sleep(10);
END;
$main$
The function singo_next_val_at()
above calls UPDATE via a dblink and the operation is committed
immediately whether or not subsequent queries (PERFORM pg_sleep(10) in this block) fail.
As soon as I execute the anonymous block in session 1, I have opened a new session and run the following code in session 2:
--session 2
select singo_next_val_at('20210121');
Now
note that we can get the result in session 2 even though the anonymous
code block is still being executed in session 1. By implementing
Autonomous Transaction in a PostgreSQL function we increased concurrency. In some circumstances the performance benefit is tremendous.
The
PostgreSQL dblink is session-specific, which mans that each session
will have to open a new DB connection and this increases response time.
The dblink connection closes automatically at the end of the session.
Wrap Up
PostgreSQL doesn't have a direct equivalent for autonomous transaction but we can use dblink to get a similar functionality.
Footnote
When you migrate database from Oracle to PostgreSQL, you should be proactive in the project. The first thing you have to do is to communicate with
stakeholders in the project. If there are some functionalites that
PostgreSQL does not support,you have to convince stakeholders and
customize the business requirement not to use the functionality.
When
I first encountered the Oracle function under investigation, I tried to
convince application architects to change the business rule of making up
the primary key value saying the business rule in the Oracle database
was too complicated, risky, and inefficient. We would be able to improve
performance by using the sequence object. And there is not much benefit
for encoding more than one fact (a combination of 'yyyymmdd' and
identification numbers) in the primary key. If we change the format of
the primary key value, we can eliminate the UDF, which will result in less contention and less wait events. The performance benefit we could get from not generating an intelligent key (a key which contains some business rules) is huge.
Addendum
The
problem with the above function is that you are specifying the database
credential in the function. The workaround for this is to use dblink_fdw to create the remote server and user mapping as follows:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER loopback_dblink FOREIGN DATA WRAPPER dblink_fdw
OPTIONS (hostaddr '127.0.0.1', dbname 'analdb');
CREATE USER MAPPING FOR scott SERVER loopback_dblink
OPTIONS (user 'scott', password 'tiger');
create or replace function singo_next_val_at2(v_ymd in varchar)
returns varchar
language sql volatile parallel safe
as
$body$
SELECT *
FROM
dblink('loopback_dblink',
format ($$ UPDATE singo_no
SET last_singo_no
= 'S'||%L||LPAD((
COALESCE(SUBSTR(last_singo_no,10,17),'0')::numeric
+1)::text,8,'0')
RETURNING last_singo_no
$$, v_ymd)) as t1(last_singo_no varchar(17))
$body$;
select singo_next_val_at2('20210121');