설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 126건, 최근 0 건
   

EDB's Oracle compatibility with regard to the outer join operater

글쓴이 : 모델광 날짜 : 2022-11-25 (금) 20:30 조회 : 221

A few years ago I was working on a project where I had to migrate an Oracle database to an open source database. Specifically the Oracle version was 12c. When I applied for the project, the target database was CUBRID. But at the project kickoff meeting the customer requested the project team to report why the project team chose CUBRID among many open source databases. So we had to consult open source database experts. We called in experts with backgrounds in Maria, PostgreSQL, ALTIBASE, and CUBRID. The consultants we called in gave us explanations of strong points of each open source database. Like many engineering decisions, choosing the right target database involved tradeoffs. Some databases were totally free, and some databases such as EDB and MySQL were not free. Coincidently the PostgreSQL consultant was one of the co-authors of the book "PostgreSQL 9.6 Performance Story" and he was working with EDB. Since I was a big fan of the book and I had worked with PostgreSQL before, I recommended the project manager to adopt EDB as the target databae.


Here are som advantages the EDB consultant presented to the project team  when migrating from Oracle to EDB:


- EDB has fewer rewrite problems compared with other databases.

- Developers can continue to use Oracle skills, which saves a lot of human resources.

- EDB supports a PL/SQL language.

- EDB supports Oracle specific features such as NVL, DECODE, SYSDATE, rownum, CONNECT BY and then some.


According to the EDB consultant, EDB supports the Oracle syntax, "(+)" for an outer join. The Oracle syntax uses the outer join operator, "(+)" in the join condition in the WHERE clause. The outer join operator is placed after the column name of the table for which you substitute null values for unmatched rows. 

The project manager was fascinated by the advantage that we would be able to use the existing queries and applicatioins and he decided to adopt EDB as the target database. However, it did not take long to realize that the EDB consultant was pulling the wool over our eyes. There were some outer join queries which did not work in EDB.


In this note I will supply an Oracle outer join query pattern that EDB does not support and how to work around it in EDB.

Here is the code to create the test tables, and insert some rows. I set up this demo using EDB 14.4.


create table customer (
cust_no        numeric not null,
cust_nm        character varying(100),
register_date  timestamp(0),
register_dt    varchar(8),
cust_status_cd varchar(1),
register_channel_cd varchar(1),
cust_age       numeric(3),
active_yn      varchar(1),
sigungu_cd     varchar(5),
sido_cd        varchar(2)
);

insert into customer
select i, chr(65+mod(i,26))||i::text||'CUST_NM'
     , current_date - mod(i,10000)
     , to_char((current_date - mod(i,10000)),'yyyymmdd') as register_dt
     , mod(i,5)+1 as cust_status_cd
     , mod(i,2)+1 as register_channel_cd
     , trunc(random() * 100) +1 as age
     , case when mod(i,22) = 0 then 'N' else 'Y' end as active_yn
     , case when mod(i,1000) = 0 then '11007'
            when mod(i,1000) = 1 then '11006'
            when mod(i,1000) = 2 then '11005'
            when mod(i,1000) = 3 then '11004'
            when mod(i,1000) = 4 then '11003'
            when mod(i,1000) = 5 then '11002'
            else '11001' end                  as sigungu_cd
      , case when mod(i,4) = 0 then '01'
             when mod(i,4) = 1 then '02'
             else                   '03' end as sido_cd
  from generate_series(1,1000000) a(i);
ALTER TABLE customer ADD CONSTRAINT customer_pk
  PRIMARY KEY (cust_no);

create table channel (
channel_cd varchar(1) not null primary key,
channel_nm varchar(100) not null);


insert into channel values (1, 'ONLINE'),(2,'OFFLINE');


create table sido (
sido_cd varchar(2) not null primary key,
sido_nm varchar(100) not null);


insert into sido values ('01','Jeonbuk'),('02','Kangwon'),('03','Chungnam');


Here is the query in question. I am using the CHANNEL, SIDO, and CUSTOMER tables to demonstrate a point about the Oracle syntax for an outer join.


SELECT A.CHANNEL_CD
     , MAX(A.CHANNEL_NM) AS R_CHANNEL_NM
     , B.SIDO_CD
     , MAX(B.SIDO_NM) AS SIDO_NAME
     , COUNT(C.CUST_NO) AS CUST_NO_CNT
  FROM CHANNEL A, CUSTOMER C, SIDO B
 WHERE A.CHANNEL_CD = C.REGISTER_CHANNEL_CD(+)
    AND B.SIDO_CD       = C.SIDO_CD(+)

GROUP BY A.CHANNEL_CD, B.SIDO_CD
ORDER BY 
A.CHANNEL_CD, B.SIDO_CD;


Here is the output I get when I run the query under Oracle 19c:

(When I was working on a project migrating Oracle to EDB, the Oracle version was 12c. I am doing this test on 19c because I do not have Oracle 12c on my computer.)

+------------+--------------+---------+-----------+-------------+

| channel_cd | r_channel_nm | sido_cd | sido_name | cust_no_cnt |
+------------+--------------+---------+-----------+-------------+
| 1          | ONLINE       | 01      | Jeonbuk   | 250000      |
| 1          | ONLINE       | 02      | Kangwon   | 0           |
| 1          | ONLINE       | 03      | Chungnam  | 250000      |
| 2          | OFFLINE      | 01      | Jeonbuk   | 0           |
| 2          | OFFLINE      | 02      | Kangwon   | 250000      |
| 2          | OFFLINE      | 03      | Chungnam  | 250000      |
+------------+--------------+---------+-----------+-------------+


And here's the execution plan:

-----------------------------------------------------                                      

| Id  | Operation                      | Name      | Starts                            
-----------------------------------------------------                                             
|   0 | SELECT STATEMENT        |              |      1                 
|   1 |  SORT GROUP BY            |              |       1                             
|*  2 |   HASH JOIN OUTER         |             |      1                        
|   3 |    MERGE JOIN CARTESIAN|             |      1                                   
|   4 |     TABLE ACCESS FULL   | CHANNEL   |      1                               
|   5 |     BUFFER SORT          |             |      4                              
|   6 |      TABLE ACCESS FULL | SIDO        |      1                             
|*  7 |    TABLE ACCESS FULL   | CUSTOMER |       1   
-----------------------------------------------------  


When I run the same query in EDB, I get the following error message:


a table may be outer joined to at most one other table
LINE 8:    AND B.SIDO_CD    = C.SIDO_CD(+)


That was the moment when I realized that I got deceived by EDB's marketing words. The error message tells us that the CUSTOMER table can not be outer joined to the CHANNEL table and the SIDO table at the same time.


To work aound this EDB inability, we would use the standard SQL syntax for outer joins.


SELECT A.CHANNEL_CD
     , MAX(A.CHANNEL_NM) AS R_CHANNEL_NM
     , B.SIDO_CD
     , MAX(B.SIDO_NM) AS SIDO_NAME
     , COUNT(C.CUST_NO) AS CUST_NO_CNT
  FROM CHANNEL A CROSS JOIN SIDO B
  LEFT JOIN CUSTOMER C
    ON A.CHANNEL_CD = C.REGISTER_CHANNEL_CD
   AND B.SIDO_CD    = C.SIDO_CD
GROUP BY A.CHANNEL_CD, B.SIDO_CD
ORDER BY A.CHANNEL_CD, B.SIDO_CD;

If you are not familiar with ANSI SQL syntax, you can rewrite the query as follows:


SELECT D.CHANNEL_CD
      , MAX(CHANNEL_NM) AS R_CHANNEL_NM
     , D.SIDO_CD
     , MAX(SIDO_NM) AS SIDO_NAME
     , COUNT(C.CUST_NO) AS CUST_NO_CNT
  FROM (SELECT A.CHANNEL_CD, A.CHANNEL_NM, B.SIDO_CD, B.SIDO_NM

            FROM CHANNEL A, SIDO B

         ) D, CUSTOMER C

WHERE D.CHANNEL_CD = C.REGISTER_CHANNEL_CD(+)

   AND D.SIDO_CD      = C.SIDO_CD(+)
GROUP BY D.CHANNEL_CD, D.SIDO_CD
ORDER BY D.CHANNEL_CD, D.SIDO_CD;


I have created an inline view called D, and then outer joined the CUSTOMER table to the inline view. The view is itself a join of two tables. We can infer that it is not possible to callapes the inline view in this query because it would result in the table CUSTOMER being outer joined to two tables, which is not legal in EDB outer join.

Here's the execution plan obtained using the EXPLAIN(ANALYZE, BUFFERS, COSTS OFF) command.


Sort (actual time=571.177..571.180 rows=6 loops=1)
  Sort Key: a.channel_cd, b.sido_cd
  Sort Method: quicksort  Memory: 25kB
  Buffers: shared hit=11366
  ->  HashAggregate (actual time=571.164..571.167 rows=6 loops=1)
        Group Key: a.channel_cd, b.sido_cd
        Batches: 1  Memory Usage: 24kB
        Buffers: shared hit=11366
        ->  Hash Right Join (actual time=0.038..316.776 rows=1000002 loops=1)
              Hash Cond: (((c.register_channel_cd)::text = (a.channel_cd)::text) AND ((c.sido_cd)::text = (b.sido_cd)::text))
              Buffers: shared hit=11366
              ->  Seq Scan on customer c (actual time=0.006..49.930 rows=1000000 loops=1)
                    Buffers: shared hit=11364
              ->  Hash (actual time=0.024..0.026 rows=6 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    Buffers: shared hit=2
                    ->  Nested Loop (actual time=0.018..0.021 rows=6 loops=1)
                          Buffers: shared hit=2
                          ->  Seq Scan on sido b (actual time=0.007..0.007 rows=3 loops=1)
                                Buffers: shared hit=1
                          ->  Materialize (actual time=0.003..0.003 rows=2 loops=3)
                                Buffers: shared hit=1
                                ->  Seq Scan on channel a (actual time=0.005..0.005 rows=2 loops=1)
                                      Buffers: shared hit=1
Planning:
  Buffers: shared hit=12
Planning Time: 0.328 ms
Execution Time: 571.231 ms


Note that EDB executed the query by joining SIDO and CHANNEL to produce the rows of the inline view, and then joining that result to CUSTOMER. The inline view alias D was gone in the execution plan. Presumably EDB transformed the Oracle syntax query into the standard ANSI query before the query planner produced the "best" plan.


Conclusion

EDB is compatible with Oracle's outer join syntax using the plus (+) sign. However, a table can not be outer joined to two tables using the plus (+) sign in EDB. We can hack our way around this incompatibility by using ANSI SQL syntax or an inline view.


Footnote

If you take a close look at the SQL statements with the ANSI SQL syntax or an inline view, the CUSTOMER table is being outer joined to two tables (SIDO and CHANNEL). So the error message "a table may be outer joined to at most one other table" EDB prints out is inappropriate. I think the error message should go like this:

a table may be outer joined to at most one other table when you use the (+) sign


   

postgresdba.com