설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 162건, 최근 0 건
   

Left Join vs. Scalar Subquery

글쓴이 : 모델광 날짜 : 2022-07-16 (토) 03:17 조회 : 1249

A recent post on a naver cafe prompted me to write this article:

https://cafe.naver.com/dbian/5442 

The poster gave a quiz about how we can reduce the number of block IOs of a specific query. And I wondered how query performance will be in PostgreSQL. So I have copied the test scripts (with minor cosmetic changes) and done the test in PostgreSQL.

We start with two tables:


CREATE TABLE 주문
(
  주문ID        VARCHAR(40) NOT NULL
  ,주문일시     DATE           NULL
  ,주문요청자ID VARCHAR(40)  NULL
);
ALTER TABLE 주문 ADD CONSTRAINT PK_주문 PRIMARY KEY(주문ID) ;
TRUNCATE TABLE 주문;

CREATE TABLE 주문처리내역
(
   주문ID VARCHAR(40) NOT NULL
   ,주문순번 NUMERIC(9) NOT NULL
   ,주문처리일시 DATE NULL
   ,주문처리자ID VARCHAR(40) NULL
);
ALTER TABLE 주문처리내역 ADD CONSTRAINT PK_주문처리내역 PRIMARY KEY(주문ID,주문순번);
TRUNCATE TABLE 주문처리내역;


주문 is a parent table and 주문처리내역 is a child table.

Let's load some data into these tables.


INSERT INTO 주문(주문ID,주문일시,주문요청자ID)
WITH R_DT AS (
      SELECT generate_series('20220101'::timestamp, '20220131'::timestamp, interval '1 day') AS DT
      ),
  R_ORD AS (
      SELECT i ONO FROM generate_series(1,100) a(i)   
      ),
  M_CUS AS (
      SELECT 'CUS_'||i::varchar AS CUS_ID FROM generate_series(1, 100) a(i)
           )
SELECT  TO_CHAR(T1.DT,'YYYYMMDD')||'_'||(ROW_NUMBER() OVER(PARTITION BY T1.DT ORDER BY T0.ONO ASC,T2.CUS_ID ASC),5,'0')::varchar 주문ID
        ,T1.DT 주문일시
        ,T2.CUS_ID 주문요청자ID
FROM    R_ORD T0
        ,R_DT T1
        ,M_CUS T2;
SELECT pg_relation_size('주문');

INSERT INTO 주문처리내역(주문ID,주문순번,주문처리일시,주문처리자ID)
SELECT  T1.주문ID
        ,T2.RNO 주문순번
        ,T1.주문일시 주문처리일시
        ,T1.주문요청자ID 주문처리자ID
FROM    주문 T1
        ,(SELECT i RNO FROM generate_series(1,30) a(i)) T2;

SELECT pg_relation_size('주문처리내역');


I have updated some values of a column in the 주문 table.


--3100 rows updated
UPDATE  주문
SET     주문요청자ID = NULL
WHERE   주문요청자ID = 'CUS_1';


I have disabled parallelism for the purpose of simplicity.


SET MAX_PARALLEL_WORKERS_PER_GATHER = 0;


The following is the query we have to investigate.


SELECT  T1.주문ID, T1.주문일시, COALESCE(T1.주문요청자ID,T2.주문처리자ID) 주문요청자ID
  FROM    주문  T1 LEFT JOIN 주문처리내역  T2
      ON t1.주문ID = t2.주문ID
    AND T2.주문순번 = 1
 WHERE T1.주문일시 >= TO_DATE('20220104','YYYYMMDD')
      AND T1.주문일시 < TO_DATE('20220105','YYYYMMDD')
ORDER BY T1.주문일시 ASC, T1.주문ID ASC;


Here, the poster asked readers about how we can improve above query performance. To be honest, I couldn't find any inefficiency in the above query even after reading the execution plan. The following is the output I got when I ran the EXPLAIN command with the (analyze, buffers, costs off) option.


Sort (actual time=149.187..149.509 rows=10000 loops=1)
  Sort Key: t1."주문일시", t1."주문id"
  Sort Method: quicksort  Memory: 1166kB
  Buffers: shared hit=52300
  ->  Nested Loop Left Join (actual time=5.921..144.637 rows=10000 loops=1)
        Buffers: shared hit=52300
        ->  Seq Scan on "주문" t1 (actual time=5.900..110.233 rows=10000 loops=1)
              Filter: (("주문일시" >= to_date('20220104'::text, 'YYYYMMDD'::text)) AND ("주문일시" < to_date('20220105'::text, 'YYYYMMDD'::text)))
              Rows Removed by Filter: 300000
              Buffers: shared hit=2300
        ->  Index Scan using "pk_주문처리내역" on "주문처리내역" t2 (actual time=0.003..0.003 rows=1 loops=10000)
              Index Cond: ((("주문id")::text = (t1."주문id")::text) AND ("주문순번" = '1'::numeric))
              Buffers: shared hit=50000
Planning:
  Buffers: shared hit=20
Planning Time: 0.268 ms
Execution Time: 149.770 ms


The following is how to read the plan.

1. The first thing done is a Seq Scan on "주문". (actual time=5.900..110.233 rows=10000 loops=1) means that the Seq Scan was executed 1 time (the loops value), that it returned 10,000 rows, and that the actual time was 110.233 ms.  Buffers: shared hit=2300 means that while doing a Seq Scan, 2300 data blocks were found in the shared buffer. You can find the number of blocks the table 주문 has with a query against pg_class like so:

  SELECT relpages FROM pg_class WHERE relname = '주문';

  relpages

 ------------

  2280

I can not explain why it accessed 2300 buffers, not 2280.

The results of the Seq Scan are passed up to a Nested Loop Left Join node.

2. For each row from Seq Scan on "주문", Nested Loop Left Join node calls Index Scan.

3. Index Scan probes the pk_주문처리내역 index to find a match with the Index Cond. When Index Scan finds a match, it passes it up to the Nested Loop Left Join node.

4. The Nested Loop Left Join node calls Index Scan again with its second row.

5. It iterates the process of the operation 2 to 4 until the Nested Loop Left Join exhausts 10,000 rows, which means that Index Scan is executed 10,000 times. We can infer that it took about 30 ms doing the Index Scan (0.003 * 10,000 = 30).

6. While doing the Index Scan 10,000 times, PostgreSQL had to access 50,000 blocks, which is a relatively high number compared to 2300 blocks it had to access while doing a Seq Scan.


So after reading the execution plan, can you spot any inefficiency? Instinctually I thought that the join method might not be optimal and I tried to force the planner to do a hash join. But it resulted in slower elapsed time. So I had to see the explanation of the poster. 


The drawback of the above execution plan is that it is accessing the index 10,000 times. When we take a close look at the COALESCE(T1.주문요청자ID,T2.주문처리자ID) function, we can infer that we don't have to access the T2 table when T1.주문요청자ID is not null. However, the execution plan shows that PostgreSQL is accessing the index of the T2 table regardless of the T1.주문요청자ID value.


So the trick here is to rewrite the query using a scalar subquery. The scalar subquery is executed after the main query is executed.


SELECT  T1.주문ID ,T1.주문일시
        ,COALESCE(T1.주문요청자ID
               ,(SELECT X.주문처리자ID
                 FROM 주문처리내역 X
                 WHERE X.주문ID = T1.주문ID AND X.주문순번 = 1)
             ) 주문요청자ID
FROM    주문 T1
WHERE   T1.주문일시 >= TO_DATE('20220104','YYYYMMDD')
AND     T1.주문일시 < TO_DATE('20220105','YYYYMMDD')
ORDER BY T1.주문일시 ASC, T1.주문ID ASC;


Sort (actual time=106.273..106.577 rows=10000 loops=1)
  Sort Key: t1."주문일시", t1."주문id"
  Sort Method: quicksort  Memory: 1166kB
  Buffers: shared hit=2800
  ->  Seq Scan on "주문" t1 (actual time=6.658..102.758 rows=10000 loops=1)
        Filter: (("주문일시" >= to_date('20220104'::text, 'YYYYMMDD'::text)) AND ("주문일시" < to_date('20220105'::text, 'YYYYMMDD'::text)))
        Rows Removed by Filter: 300000
        Buffers: shared hit=2800
        SubPlan 1
          ->  Index Scan using "pk_주문처리내역" on "주문처리내역" x (actual time=0.004..0.004 rows=1 loops=100)
                Index Cond: ((("주문id")::text = (t1."주문id")::text) AND ("주문순번" = '1'::numeric))
                Buffers: shared hit=500
Planning Time: 0.213 ms
Execution Time: 106.833 ms 


From the execution plan, we can deduce the following:

- The scalar subquery was executed 100 times (loops=100).

- While accessing the primary key index, it had to access 5 blocks.


By transforming the original SQL statement into using a scalar subquery we could decrease the block I/O from 52300 to 2800. And the elapsed time dropped from 149 ms to 106 ms.


Conclusion

When we join two tables using the LEFT JOIN clause,  consider transforming the query into using a scalar subquery or vice versa.



   

postgresdba.com