설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 168건, 최근 1 건
   

Partitioned Outer Join

글쓴이 : 모델광 날짜 : 2024-05-26 (일) 16:45 조회 : 487
A few years ago while working on a project that required converting SQL statements from Oracle 12c to PostgreSQL 11, I encountered an SQL statement that puzzled me.
By sharing the experience and the solutioins I have discovered, I hope to help others facing similar challenges when transitioning from Oracle to PostgreSQL. Even seasoned professionals can always learn something new.

Below is the SQL statement in Oracle followed by its retured data:

SELECT X.SIGNGU_NAME, Y.REGIST_DT, COUNT(Y.SIGNGU_CODE)
  FROM SIGNGU_CODE X
LEFT JOIN SINGO Y PARTITION BY (Y.REGIST_DT)
    ON X.SIGNGU_CODE = Y.SIGNGU_CODE
GROUP BY X.SIGNGU_NAME, Y.REGIST_DT
ORDER BY Y.REGIST_DT, X.SIGNGU_NAME;

signgu_name|regist_dt              |cnt|
JOENBUK    |2024-05-22 00:00:00.000|  4|
JOENNAM    |2024-05-22 00:00:00.000|  0|
KYEONGGI   |2024-05-22 00:00:00.000|  0|
SEOUL        |2024-05-22 00:00:00.000|  0|
JOENBUK    |2024-05-23 00:00:00.000|  3|
JOENNAM    |2024-05-23 00:00:00.000|  0|
KYEONGGI   |2024-05-23 00:00:00.000|  1|
SEOUL        |2024-05-23 00:00:00.000|  1|
JOENBUK    |2024-05-24 00:00:00.000|  2|
JOENNAM    |2024-05-24 00:00:00.000|  1|
KYEONGGI   |2024-05-24 00:00:00.000|  1|
SEOUL        |2024-05-24 00:00:00.000|  1|
JOENBUK    |2024-05-25 00:00:00.000|  1|
JOENNAM    |2024-05-25 00:00:00.000|  1|
KYEONGGI   |2024-05-25 00:00:00.000|  1|
SEOUL        |2024-05-25 00:00:00.000|  1|
JOENBUK    |2024-05-26 00:00:00.000|  2|
JOENNAM    |2024-05-26 00:00:00.000|  1|
KYEONGGI   |2024-05-26 00:00:00.000|  1|
SEOUL        |2024-05-26 00:00:00.000|  0|
JOENBUK    |2024-05-27 00:00:00.000|  3|
JOENNAM    |2024-05-27 00:00:00.000|  1|
KYEONGGI   |2024-05-27 00:00:00.000|  0|
SEOUL        |2024-05-27 00:00:00.000|  0|
JOENBUK    |2024-05-28 00:00:00.000|  4|
JOENNAM    |2024-05-28 00:00:00.000|  0|
KYEONGGI   |2024-05-28 00:00:00.000|  0|
SEOUL        |2024-05-28 00:00:00.000|  0|

You will notice that the 'PARTITION BY' clause apears right after the joined table name. My knowledgw was that the 'PARTITION BY' clause should be used with analytic functions. Seeing this query, I initially thought that it was syntactically incorrect and wondered how it could exist in a production system without generating an error. But after some research, I discovered that the Partitioned Outer Join is a long-standing feature in Oracle.

What is a Partitioned Outer Join?
A partitioned outer join in Oracle can be used to fill gaps in sparse data. In the query above, even if no row exists for a given combination of (SIGNGU_NAME, REGIST_DT), the combination still appears in the results. If you omit the 'PARTITION BY' clause, these combinations will not be displayed when there are no matching rows.

The Challenges with PostgreSQL
Unfortunately, PostgreSQL does not support the partitioned outer join syntax found in Oracle. However, we can implement similar functionality using different techniques in PostgreSQL. Below, I'll walk through some sample code to illustrate how to implement a partitioned outer join-like functionality in PostgreSQL.

First, let's set up the tables:

CREATE TABLE signgu_code (
    signgu_code VARCHAR(2),
    signgu_name VARCHAR(100)
);

INSERT INTO signgu_code VALUES('02', 'SEOUL');
INSERT INTO signgu_code VALUES('03', 'KYEONGGI');
INSERT INTO signgu_code VALUES('04', 'JOENNAM');
INSERT INTO signgu_code VALUES('05', 'JOENBUK');

DROP TABLE IF EXISTS singo;

CREATE TABLE singo (
    sn INT,
    regist_dt TIMESTAMP(0),
    signgu_code VARCHAR(2)
);

INSERT INTO singo
SELECT i, CURRENT_DATE + (MOD(i, 7)),
       CASE MOD(i, 8)
            WHEN 0 THEN '02'
            WHEN 1 THEN '03'
            WHEN 2 THEN '04'
            ELSE '05'
       END
FROM generate_series(1, 30) a(i);


Attempting the Conversion
If you run the query using the Partitioned Outer Join syntax in PostgreSQL, you get the following error:

SQL Error [42601]: ERROR: syntax error at or near "PARTITION"

You might convert the Oracle query as follows:

SELECT a.signgu_name, b.regist_dt, b.cnt
  FROM signgu_code a
LEFT JOIN (
      SELECT regist_dt, signgu_code, COUNT(*) AS cnt
        FROM singo
       GROUP BY signgu_code, regist_dt
       ORDER BY regist_dt, signgu_code
       ) b
    ON a.signgu_code = b.signgu_code
ORDER BY b.regist_dt, a.signgu_name;

Output:

signgu_name|regist_dt              |cnt|
-----------+-----------------------+---+
JOENBUK    |2024-05-22 00:00:00.000|  4|
JOENBUK    |2024-05-23 00:00:00.000|  3|
KYEONGGI   |2024-05-23 00:00:00.000|  1|
SEOUL      |2024-05-23 00:00:00.000|  1|
JOENBUK    |2024-05-24 00:00:00.000|  2|
JOENNAM    |2024-05-24 00:00:00.000|  1|
KYEONGGI   |2024-05-24 00:00:00.000|  1|
SEOUL      |2024-05-24 00:00:00.000|  1|
JOENBUK    |2024-05-25 00:00:00.000|  1|
JOENNAM    |2024-05-25 00:00:00.000|  1|
KYEONGGI   |2024-05-25 00:00:00.000|  1|
SEOUL      |2024-05-25 00:00:00.000|  1|
JOENBUK    |2024-05-26 00:00:00.000|  2|
JOENNAM    |2024-05-26 00:00:00.000|  1|
KYEONGGI   |2024-05-26 00:00:00.000|  1|
JOENBUK    |2024-05-27 00:00:00.000|  3|
JOENNAM    |2024-05-27 00:00:00.000|  1|
JOENBUK    |2024-05-28 00:00:00.000|  4|

Note that the retrieved data is different from Oracle 12c's. Many rows were omitted.

Filling Gaps with Cartesian Product
To get 'SINGO' data for all 'SIGNGU_CODE' on every 'REGIST_DT', we use a Cartesian product provided by a cross join.

SELECT B.SIGNGU_NAME, A.REGIST_DT, COUNT(C.SIGNGU_CODE) AS cnt
  FROM (SELECT DISTINCT regist_dt FROM singo) A
CROSS JOIN signgu_code B
LEFT JOIN singo C
    ON A.regist_dt = C.regist_dt
   AND B.signgu_code = C.signgu_code
GROUP BY A.regist_dt, B.signgu_name
ORDER BY A.regist_dt, B.signgu_name;

Output:

signgu_name|regist_dt              |cnt|
JOENBUK    |2024-05-22 00:00:00.000|  4|
JOENNAM    |2024-05-22 00:00:00.000|  0|
KYEONGGI   |2024-05-22 00:00:00.000|  0|
SEOUL        |2024-05-22 00:00:00.000|  0|
JOENBUK    |2024-05-23 00:00:00.000|  3|
JOENNAM    |2024-05-23 00:00:00.000|  0|
KYEONGGI   |2024-05-23 00:00:00.000|  1|
SEOUL        |2024-05-23 00:00:00.000|  1|
JOENBUK    |2024-05-24 00:00:00.000|  2|
JOENNAM    |2024-05-24 00:00:00.000|  1|
KYEONGGI   |2024-05-24 00:00:00.000|  1|
SEOUL        |2024-05-24 00:00:00.000|  1|
JOENBUK    |2024-05-25 00:00:00.000|  1|
JOENNAM    |2024-05-25 00:00:00.000|  1|
KYEONGGI   |2024-05-25 00:00:00.000|  1|
SEOUL        |2024-05-25 00:00:00.000|  1|
JOENBUK    |2024-05-26 00:00:00.000|  2|
JOENNAM    |2024-05-26 00:00:00.000|  1|
KYEONGGI   |2024-05-26 00:00:00.000|  1|
SEOUL        |2024-05-26 00:00:00.000|  0|
JOENBUK    |2024-05-27 00:00:00.000|  3|
JOENNAM    |2024-05-27 00:00:00.000|  1|
KYEONGGI   |2024-05-27 00:00:00.000|  0|
SEOUL        |2024-05-27 00:00:00.000|  0|
JOENBUK    |2024-05-28 00:00:00.000|  4|
JOENNAM    |2024-05-28 00:00:00.000|  0|
KYEONGGI   |2024-05-28 00:00:00.000|  0|
SEOUL        |2024-05-28 00:00:00.000|  0|

Note that the result is identical to Oracle 12c's.

Performance Considerations
As you can see, we need to access the 'SINGO' table twice, which can result in a performance drop. Unfortunately, I haven't come up with a more efficient solution.

FOOTNOTE
- If there is an index on the column 'REGIST_DT', you can quickly retrieve distinct values using the technique of "retrieving distinct values from an indexed column", which I have frequently discussed in prevous notes.
- Before performing the left join on the 'SINGO' table, you might want to aggregate the data to improve performance.


   

postgresdba.com