A few years ago I was working on a
project where I had to migrate an Oracle database to an open source
database. Specifically the Oracle version was 12c. When I applied for
the project, the target database was CUBRID. But at
the project kickoff meeting the customer requested the project team to
report why the project team chose CUBRID among many open source
databases. So we had to consult open source database experts. We called
in experts with backgrounds in Maria, PostgreSQL, ALTIBASE,
and CUBRID. The consultants we called in gave us explanations of strong
points of each open source database. Like many engineering decisions,
choosing the right target database involved tradeoffs. Some databases
were totally free, and some databases such as EDB and MySQL were not
free. Coincidently the PostgreSQL
consultant was one of the co-authors of the book "PostgreSQL 9.6
Performance Story" and he was working with EDB. Since
I was a big fan of the book and I had worked with PostgreSQL before, I
recommended the project manager to adopt EDB as the target databae.
Here are som advantages the EDB consultant presented to the project team when migrating from Oracle to EDB:
- EDB has fewer rewrite problems compared with other databases.
- Developers can continue to use Oracle skills, which saves a lot of human resources.
- EDB supports a PL/SQL language.
- EDB supports Oracle specific features such as NVL, DECODE, SYSDATE, rownum, CONNECT BY and then some.
According to the EDB consultant, EDB supports the
Oracle syntax, "(+)" for an outer join.
The Oracle syntax uses the outer join operator, "(+)" in the join
condition in the WHERE clause. The outer join operator is placed after
the column name of the table for which you substitute
null values for unmatched rows.
The project manager was fascinated by the advantage that we would be able to use the existing queries and applicatioins and he decided to adopt EDB as the target database. However, it did not take long to realize that the EDB consultant was
pulling the wool over our eyes. There were some outer join queries which did not work in EDB.
In this note I will supply an Oracle outer join query pattern that EDB does not support and how to work around it in EDB.
Here is the code to create the test tables, and insert some rows. I set up this demo using EDB 14.4.
create table customer (
cust_no numeric not null,
cust_nm character varying(100),
register_date timestamp(0),
register_dt varchar(8),
cust_status_cd varchar(1),
register_channel_cd varchar(1),
cust_age numeric(3),
active_yn varchar(1),
sigungu_cd varchar(5),
sido_cd varchar(2)
);
insert into customer
select i, chr(65+mod(i,26))||i::text||'CUST_NM'
, current_date - mod(i,10000)
, to_char((current_date - mod(i,10000)),'yyyymmdd') as register_dt
, mod(i,5)+1 as cust_status_cd
, mod(i,2)+1 as register_channel_cd
, trunc(random() * 100) +1 as age
, case when mod(i,22) = 0 then 'N' else 'Y' end as active_yn
, case when mod(i,1000) = 0 then '11007'
when mod(i,1000) = 1 then '11006'
when mod(i,1000) = 2 then '11005'
when mod(i,1000) = 3 then '11004'
when mod(i,1000) = 4 then '11003'
when mod(i,1000) = 5 then '11002'
else '11001' end as sigungu_cd
, case when mod(i,4) = 0 then '01'
when mod(i,4) = 1 then '02'
else '03' end as sido_cd
from generate_series(1,1000000) a(i);
ALTER TABLE customer ADD CONSTRAINT customer_pk
PRIMARY KEY (cust_no);
create table channel (
channel_cd varchar(1) not null primary key,
channel_nm varchar(100) not null);
insert into channel values (1, 'ONLINE'),(2,'OFFLINE');
create table sido (
sido_cd varchar(2) not null primary key,
sido_nm varchar(100) not null);
insert into sido values ('01','Jeonbuk'),('02','Kangwon'),('03','Chungnam');
Here is the query in question. I am
using the CHANNEL, SIDO, and CUSTOMER tables to demonstrate a point
about the Oracle syntax for an outer join.
SELECT A.CHANNEL_CD
, MAX(A.CHANNEL_NM) AS R_CHANNEL_NM
, B.SIDO_CD
, MAX(B.SIDO_NM) AS SIDO_NAME
, COUNT(C.CUST_NO) AS CUST_NO_CNT
FROM CHANNEL A, CUSTOMER C, SIDO B
WHERE A.CHANNEL_CD = C.REGISTER_CHANNEL_CD(+)
AND B.SIDO_CD = C.SIDO_CD(+)
GROUP BY A.CHANNEL_CD, B.SIDO_CD
ORDER BY A.CHANNEL_CD, B.SIDO_CD;
Here is the output I get when I run the query under Oracle 19c:
(When I was working on a project migrating Oracle to EDB, the Oracle version was 12c. I
am doing this test on 19c because I do not have Oracle 12c on my computer.)
+------------+--------------+---------+-----------+-------------+
| channel_cd | r_channel_nm | sido_cd | sido_name | cust_no_cnt |
+------------+--------------+---------+-----------+-------------+
| 1 | ONLINE | 01 | Jeonbuk | 250000 |
| 1 | ONLINE | 02 | Kangwon | 0 |
| 1 | ONLINE | 03 | Chungnam | 250000 |
| 2 | OFFLINE | 01 | Jeonbuk | 0 |
| 2 | OFFLINE | 02 | Kangwon | 250000 |
| 2 | OFFLINE | 03 | Chungnam | 250000 |
+------------+--------------+---------+-----------+-------------+
And here's the execution plan:
-----------------------------------------------------
| Id | Operation | Name | Starts
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 1
| 1 | SORT GROUP BY | | 1
|* 2 | HASH JOIN OUTER | | 1
| 3 | MERGE JOIN CARTESIAN| | 1
| 4 | TABLE ACCESS FULL | CHANNEL | 1
| 5 | BUFFER SORT | | 4
| 6 | TABLE ACCESS FULL | SIDO | 1
|* 7 | TABLE ACCESS FULL | CUSTOMER | 1
-----------------------------------------------------
When I run the same query in EDB, I get the following error message:
a table may be outer joined to at most one other table
LINE 8: AND B.SIDO_CD = C.SIDO_CD(+)
That was the moment when I realized that I got deceived by EDB's marketing words.
The error message tells us that the CUSTOMER table can not be outer
joined to the CHANNEL table and the SIDO table at the same time.
To work aound this EDB inability, we would use the standard SQL syntax for outer joins.
SELECT A.CHANNEL_CD
, MAX(A.CHANNEL_NM) AS R_CHANNEL_NM
, B.SIDO_CD
, MAX(B.SIDO_NM) AS SIDO_NAME
, COUNT(C.CUST_NO) AS CUST_NO_CNT
FROM CHANNEL A CROSS JOIN SIDO B
LEFT JOIN CUSTOMER C
ON A.CHANNEL_CD = C.REGISTER_CHANNEL_CD
AND B.SIDO_CD = C.SIDO_CD
GROUP BY A.CHANNEL_CD, B.SIDO_CD
ORDER BY A.CHANNEL_CD, B.SIDO_CD;
If you are not familiar with ANSI SQL syntax, you can rewrite the query as follows:
SELECT D.CHANNEL_CD
, MAX(CHANNEL_NM) AS R_CHANNEL_NM
, D.SIDO_CD
, MAX(SIDO_NM) AS SIDO_NAME
, COUNT(C.CUST_NO) AS CUST_NO_CNT
FROM (SELECT A.CHANNEL_CD, A.CHANNEL_NM, B.SIDO_CD, B.SIDO_NM
FROM CHANNEL A, SIDO B
) D, CUSTOMER C
WHERE D.CHANNEL_CD = C.REGISTER_CHANNEL_CD(+)
AND D.SIDO_CD = C.SIDO_CD(+)
GROUP BY D.CHANNEL_CD, D.SIDO_CD
ORDER BY D.CHANNEL_CD, D.SIDO_CD;
I have created an inline view called D, and then outer joined the
CUSTOMER table to the inline view. The view is itself a join of two
tables. We can infer that it is not possible to callapes the inline view in this query because it would result in the table
CUSTOMER being outer joined to two tables, which is not legal in EDB outer join.
Here's the execution plan obtained using the EXPLAIN(ANALYZE, BUFFERS, COSTS OFF) command.
Sort (actual time=571.177..571.180 rows=6 loops=1)
Sort Key: a.channel_cd, b.sido_cd
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=11366
-> HashAggregate (actual time=571.164..571.167 rows=6 loops=1)
Group Key: a.channel_cd, b.sido_cd
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=11366
-> Hash Right Join (actual time=0.038..316.776 rows=1000002 loops=1)
Hash Cond: (((c.register_channel_cd)::text = (a.channel_cd)::text) AND ((c.sido_cd)::text = (b.sido_cd)::text))
Buffers: shared hit=11366
-> Seq Scan on customer c (actual time=0.006..49.930 rows=1000000 loops=1)
Buffers: shared hit=11364
-> Hash (actual time=0.024..0.026 rows=6 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=2
-> Nested Loop (actual time=0.018..0.021 rows=6 loops=1)
Buffers: shared hit=2
-> Seq Scan on sido b (actual time=0.007..0.007 rows=3 loops=1)
Buffers: shared hit=1
-> Materialize (actual time=0.003..0.003 rows=2 loops=3)
Buffers: shared hit=1
-> Seq Scan on channel a (actual time=0.005..0.005 rows=2 loops=1)
Buffers: shared hit=1
Planning:
Buffers: shared hit=12
Planning Time: 0.328 ms
Execution Time: 571.231 ms
Note that EDB executed the query by joining SIDO and CHANNEL to produce
the rows of the inline view, and then joining that result to CUSTOMER.
The inline view alias D was gone in the execution plan. Presumably EDB transformed the Oracle syntax query into the standard ANSI query before the query planner produced the "best" plan.
Conclusion
EDB is compatible with Oracle's outer join syntax using the plus (+) sign. However, a table can not be outer joined to two tables using the plus (+) sign in EDB. We can hack our way around this incompatibility by using ANSI SQL syntax or an inline view.
Footnote
If you take a close look at the SQL statements with the ANSI SQL syntax
or an inline view, the CUSTOMER table is being outer joined to two
tables (SIDO and CHANNEL). So the error message "a table may be outer joined to at
most one other table" EDB prints out is inappropriate. I think the error message should go like this:
a table may be outer joined to at most one other table when you use the (+) sign