Incremental Sort 2

Author: 모델광    Date: 2023-05-02 (Tue) 21:43    Views: 645
Many applications need to paginate rows fetched from the database, or at least retrieve the first N rows. In most cases the data needs to be returned in a specific order too. In the previous note, I introduced a lovely PostgreSQL mechanism that can minimise sorting costs: the "incremental sort".

In this note, I will provide you with another example of how we can achieve significant performance improvements in PostgreSQL using incremental sort.
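As a quick recap of the mechanism, here is a minimal Python sketch (illustrative only, not PostgreSQL's actual implementation): when the input already arrives sorted by the leading key, we only have to sort each run of rows sharing that key by the remaining key, and we can stop as soon as N rows have been produced.

```python
from itertools import groupby, islice
from operator import itemgetter

def incremental_sort(rows, n):
    """rows must already be sorted by the first key (the 'presorted' key).
    Sort each group of equal first-key values by the second key, and stop
    after emitting n rows -- later groups are never sorted at all."""
    def gen():
        for _, group in groupby(rows, key=itemgetter(0)):
            # Only this small batch is sorted, never the whole input.
            yield from sorted(group, key=itemgetter(1))
    return list(islice(gen(), n))

# Rows presorted by ord_date DESC; the second element plays the role of ord_amt.
rows = [('20191212', 5), ('20191212', 1), ('20191211', 9), ('20191211', 2)]
print(incremental_sort(rows, 3))  # [('20191212', 1), ('20191212', 5), ('20191211', 2)]
```

The group for '20191211' is only partially consumed; with a small N the later groups may never be sorted at all, which is exactly where the savings come from.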

Recently I came across a complex pagination query during a migration from Oracle to PostgreSQL. The Oracle query had the following shape:


SELECT  ORD_NO, ORD_LINE_NO, CUST_ID, PROD_ID
  FROM (SELECT
               A.ORD_NO, ORD_LINE_NO, CUST_ID, PROD_ID
          FROM (SELECT ORD_DATE
                    FROM (SELECT ORD_DATE
                              FROM ORDERS A, ORDERS_DETAIL B
                             WHERE A.ORD_NO = B.ORD_NO
                                AND A.ORD_DATE < '20191213'
                             ORDER BY A.ORD_DATE DESC
                           )
                   WHERE ROWNUM <= 20
                   GROUP BY ORD_DATE
                  ) V, ORDERS A, ORDERS_DETAIL B
          WHERE V.ORD_DATE = A.ORD_DATE
            AND A.ORD_NO   = B.ORD_NO
            AND A.ORD_DATE < '20191213'
          ORDER BY A.ORD_DATE DESC, B.ORD_AMT
          )
 WHERE ROWNUM <= 20;

If you want to reproduce the experiment, here is the script to create the test data.

create table ORDERS (ord_no NUMBER,cust_id varchar2(20),commnt varchar2(100),ord_date varchar2(8));

ALTER TABLE ORDERS ADD CONSTRAINT PK_ORDERS PRIMARY KEY(ORD_NO);

DROP TABLE ORDERS_DETAIL;

CREATE TABLE ORDERS_DETAIL(ORD_LINE_NO NUMBER NOT NULL,ORD_NO NUMBER NOT NULL,PROD_ID VARCHAR2(10) NOT NULL,COMMNT VARCHAR2(100),ORD_AMT NUMBER);

insert into ORDERS
select rownum as ord_no
     , 'C'||mod(rownum,10) as cust_id
     , lpad('X',10,'Y') as commnt
     , to_char(to_date('20191001','YYYYMMDD')+mod(rownum,60),'yyyymmdd') as ord_date
 from XMLTABLE('1 to 1000000');

CREATE INDEX ORDERS_X02 ON ORDERS(ORD_DATE);

INSERT INTO ORDERS_DETAIL
SELECT rownum as ORD_LINE_NO
     , mod(rownum,1000000) AS ORD_NO
     , 'PP'||MOD(rownum,5) AS PROD_ID
     , lpad('X',10,'Y') as commnt
     , case when rownum < 1000 then rownum*100 else rownum end as ord_amt
 FROM XMLTABLE('1 to 10000000');

ALTER TABLE ORDERS_DETAIL ADD CONSTRAINT PK_ORDERS_DETAIL PRIMARY KEY(ORD_LINE_NO);

CREATE INDEX ORDERS_DETAIL_X01 ON ORDERS_DETAIL(ORD_NO, PROD_ID);
CREATE INDEX ORDERS_DETAIL_X02 ON ORDERS_DETAIL(ORD_NO, ORD_AMT);

EXEC DBMS_STATS.GATHER_TABLE_STATS('','ORDERS');
EXEC DBMS_STATS.GATHER_TABLE_STATS('','ORDERS_DETAIL');


My initial thought on the query was that it broke the golden rule of query tuning:
Try to access a table just one time.

The above query accessed ORDERS and ORDERS_DETAIL twice, which I thought was a bad thing. So I rewrote the query to access each table only once:


SELECT  *
  FROM (SELECT A.ORD_NO, ORD_LINE_NO, A.CUST_ID, PROD_ID
            FROM ORDERS A, ORDERS_DETAIL B
           WHERE A.ORD_NO = B.ORD_NO
              AND A.ORD_DATE < '20191213'
           ORDER BY A.ORD_DATE DESC, B.ORD_AMT
          )
WHERE ROWNUM <= 20;

However, contrary to my expectations, the revised query performed dramatically worse.
Here, in order, are the two execution plans of the above SQL statements, pulled from memory with rowsource execution stats enabled:


------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name              | Starts | E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                   |      1 |          |     20 |00:00:00.72 |     205K|       |       |          |
|*  1 |  COUNT STOPKEY                            |                   |      1 |          |     20 |00:00:00.72 |     205K|       |       |          |
|   2 |   VIEW                                    |                   |      1 | 00:00:07 |     20 |00:00:00.72 |     205K|       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                  |                   |      1 | 00:00:07 |     20 |00:00:00.72 |     205K|  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | ORDERS_DETAIL     |      1 | 00:00:01 |    166K|00:00:00.62 |     205K|       |       |          |
|   5 |      NESTED LOOPS                         |                   |      1 | 00:00:04 |    166K|00:00:00.22 |   38401 |       |       |          |
|   6 |       NESTED LOOPS                        |                   |      1 | 00:00:01 |  16666 |00:00:00.04 |    4603 |       |       |          |
|*  7 |        VIEW                               |                   |      1 | 00:00:01 |      1 |00:00:00.01 |      10 |       |       |          |
|   8 |         HASH GROUP BY                     |                   |      1 | 00:00:01 |      1 |00:00:00.01 |      10 |  1818K|  1818K|  536K (0)|
|*  9 |          COUNT STOPKEY                    |                   |      1 |          |     20 |00:00:00.01 |      10 |       |       |          |
|  10 |           VIEW                            |                   |      1 | 00:00:01 |     20 |00:00:00.01 |      10 |       |       |          |
|  11 |            NESTED LOOPS                   |                   |      1 | 00:00:01 |     20 |00:00:00.01 |      10 |       |       |          |
|  12 |             TABLE ACCESS BY INDEX ROWID   | ORDERS            |      1 | 00:00:01 |      2 |00:00:00.01 |       4 |       |       |          |
|* 13 |              INDEX RANGE SCAN DESCENDING  | ORDERS_X02        |      1 | 00:00:01 |      2 |00:00:00.01 |       3 |       |       |          |
|* 14 |             INDEX RANGE SCAN              | ORDERS_DETAIL_X01 |      2 | 00:00:01 |     20 |00:00:00.01 |       6 |       |       |          |
|  15 |        TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS            |      1 | 00:00:01 |  16666 |00:00:00.03 |    4593 |       |       |          |
|* 16 |         INDEX RANGE SCAN                  | ORDERS_X02        |      1 | 00:00:01 |  16666 |00:00:00.01 |      49 |       |       |          |
|* 17 |       INDEX RANGE SCAN                    | ORDERS_DETAIL_X01 |  16666 | 00:00:01 |    166K|00:00:00.16 |   33798 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name          | Starts | E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |               |      1 |          |     20 |00:00:09.26 |   55515 |       |       |          |
|*  1 |  COUNT STOPKEY          |               |      1 |          |     20 |00:00:09.26 |   55515 |       |       |          |
|   2 |   VIEW                  |               |      1 | 00:00:05 |     20 |00:00:09.26 |   55515 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY|               |      1 | 00:00:05 |     20 |00:00:09.26 |   55515 |  4096 |  4096 | 4096  (0)|
|*  4 |     HASH JOIN           |               |      1 | 00:00:02 |   9999K|00:00:06.60 |   55515 |    63M|  8026K|   63M (0)|
|*  5 |      TABLE ACCESS FULL  | ORDERS        |      1 | 00:00:01 |   1000K|00:00:00.13 |    4604 |       |       |          |
|   6 |      TABLE ACCESS FULL  | ORDERS_DETAIL |      1 | 00:00:01 |     10M|00:00:01.33 |   50911 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

As you can see, even though the number of block IOs decreased from 205,000 to 55,515, the elapsed time skyrocketed from 0.72 seconds to 9.26 seconds. So I had to admit that rewriting the query was not a good idea. In fact, even though the first query was complex and it was difficult to understand what it was trying to return, it must have been written by an excellent developer who knew the Oracle optimizer well.

The developer tried to let the optimizer walk the ORDERS_X02 index, retrieving just 20 rows. At operations 9 to 14, we can observe that the ORDERS table is accessed via an INDEX RANGE SCAN DESCENDING on the ORDERS_X02 index, which returns the data in exactly the right order and enables the COUNT STOPKEY at operation 9. Oracle fetched 20 rows, reporting 10 block IOs at operation 11. The developer then grouped the intermediate result set (20 rows) by ORD_DATE to make the ORD_DATE values unique, so that the rows would not be multiplied when the intermediate result set was joined back to the ORDERS table. Using the resulting distinct dates, Oracle accessed the ORDERS table and the ORDERS_DETAIL table again.

This time, Oracle used an index range scan on ORDERS_X02. From operations 16 and 15 we can deduce that there were 16,666 orders on a single day. Oracle then accessed the ORDERS_DETAIL_X01 index 16,666 times at operation 17, incurring 33,798 block IOs. Compared to the second execution plan, where the numbers of rows involved in the join are 1000K and 10M respectively, the number of rows participating in the join in the first plan is substantially smaller, resulting in faster execution of the query.
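The developer's trick can be paraphrased in plain Python over in-memory lists (a rough sketch with hypothetical data, not real database access): phase one walks the dates in descending order just far enough to cover the first 20 joined rows and keeps only the distinct dates; phase two joins and sorts only the rows for those dates.

```python
def paginate_two_phase(orders, details, cutoff, n=20):
    """orders: list of (ord_no, ord_date); details: {ord_no: [(line_no, amt), ...]}.
    Phase 1: scan ord_date descending (the index scan) until n joined rows
    are covered, remembering only the distinct dates (the GROUP BY ORD_DATE).
    Phase 2: join just the orders on those dates, then sort that small set."""
    dates, covered = set(), 0
    for ord_no, ord_date in sorted(orders, key=lambda o: o[1], reverse=True):
        if ord_date >= cutoff:
            continue
        dates.add(ord_date)
        covered += len(details.get(ord_no, ()))
        if covered >= n:
            break
    rows = [(ord_date, amt, ord_no, line_no)
            for ord_no, ord_date in orders if ord_date in dates
            for line_no, amt in details.get(ord_no, ())]
    rows.sort(key=lambda r: r[1])                 # ord_amt ascending
    rows.sort(key=lambda r: r[0], reverse=True)   # then ord_date descending (stable)
    return rows[:n]
```

The point is that the expensive sort in phase two only ever sees the rows of a handful of dates, never the whole join.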

After this experiment in Oracle, I concluded that I had to keep the first query's ROWNUM-based structure when migrating it from Oracle to PostgreSQL.

Here is a little script to build some demo data. This demo was tested on PostgreSQL 15.


DROP TABLE ORDERS;

create table ORDERS (ord_no bigint,cust_id varchar(20),comment varchar(100),ord_date varchar(8));

ALTER TABLE ORDERS ADD CONSTRAINT PK_ORDERS PRIMARY KEY(ORD_NO);

DROP TABLE ORDERS_DETAIL;

CREATE TABLE ORDERS_DETAIL(ORD_LINE_NO BIGINT NOT NULL,ORD_NO BIGINT NOT NULL,PROD_ID VARCHAR(10) NOT NULL,COMMENT VARCHAR(100),ORD_AMT BIGINT);

insert into ORDERS
select i as ord_no
       , 'C'||mod(i,10) as cust_id
       , lpad('X',10,'Y') as comment
       , to_char(to_date('20191001','YYYYMMDD')+mod(i,60),'yyyymmdd') as ord_date
  from generate_series(1,1000000) a(i);

CREATE INDEX ORDERS_X02 ON ORDERS(ORD_DATE);

INSERT INTO ORDERS_DETAIL
SELECT i as ORD_LINE_NO
         , mod(i,1000000) AS ORD_NO
         , 'PP'||MOD(i,5) AS PROD_ID
         , lpad('X',10,'Y') as comment
         , case when i < 1000 then i*100 else i end as ord_amt
  FROM generate_series(1,10000000) a(i);

ALTER TABLE ORDERS_DETAIL ADD CONSTRAINT PK_ORDERS_DETAIL PRIMARY KEY(ORD_LINE_NO);

CREATE INDEX ORDERS_DETAIL_X01 ON ORDERS_DETAIL(ORD_NO, PROD_ID);
CREATE INDEX ORDERS_DETAIL_X02 ON ORDERS_DETAIL(ORD_NO, ORD_AMT);


I have disabled parallel processing for the purpose of simplicity and clarity.

set max_parallel_workers_per_gather = 0;

Here is the query I rewrote to make it compatible with PostgreSQL, followed by its execution plan.


SELECT A.ORD_NO, ORD_LINE_NO, A.CUST_ID, PROD_ID
   FROM (SELECT ORD_DATE         
            FROM (SELECT ORD_DATE
                         FROM ORDERS A, ORDERS_DETAIL B
                       WHERE A.ORD_NO = B.ORD_NO
                           AND A.ORD_DATE < '20191213'
                       ORDER BY A.ORD_DATE DESC
                        FETCH NEXT 20 ROWS ONLY
                    ) K        
             GROUP BY ORD_DATE
          ) V, ORDERS A, ORDERS_DETAIL B
  WHERE V.ORD_DATE = A.ORD_DATE
     AND A.ORD_NO = B.ORD_NO
     AND A.ORD_DATE < '20191213'
ORDER BY A.ORD_DATE DESC, B.ORD_AMT
FETCH NEXT 20 ROWS ONLY;


Limit (actual time=372.279..372.285 rows=20 loops=1)
  Buffers: shared hit=118334 read=105711
  ->  Sort (actual time=372.278..372.282 rows=20 loops=1)
        Sort Key: a.ord_date DESC, b.ord_amt
        Sort Method: top-N heapsort  Memory: 27kB
        Buffers: shared hit=118334 read=105711
        ->  Nested Loop (actual time=2.068..336.262 rows=166660 loops=1)
              Buffers: shared hit=118334 read=105711
              ->  Nested Loop (actual time=2.053..30.576 rows=16666 loops=1)
                    Buffers: shared hit=17 read=7370
                    ->  HashAggregate (actual time=0.040..0.042 rows=1 loops=1)
                          Group Key: a_1.ord_date
                          Batches: 1  Memory Usage: 24kB
                          Buffers: shared hit=13 read=4
                          ->  Limit (actual time=0.023..0.035 rows=20 loops=1)
                                Buffers: shared hit=13 read=4
                                ->  Nested Loop (actual time=0.023..0.034 rows=20 loops=1)
                                      Buffers: shared hit=13 read=4
                                      ->  Index Scan Backward using orders_x02 on orders a_1 (actual time=0.015..0.017 rows=2 loops=1)
                                            Index Cond: ((ord_date)::text < '20191213'::text)
                                            Buffers: shared hit=2 read=3
                                      ->  Index Only Scan using orders_detail_x01 on orders_detail b_1 (actual time=0.004..0.006 rows=10 loops=2)
                                            Index Cond: (ord_no = a_1.ord_no)
                                            Heap Fetches: 0
                                            Buffers: shared hit=11 read=1
                    ->  Bitmap Heap Scan on orders a (actual time=2.011..28.082 rows=16666 loops=1)
                          Recheck Cond: (((ord_date)::text = (a_1.ord_date)::text) AND ((ord_date)::text < '20191213'::text))
                          Heap Blocks: exact=7353
                          Buffers: shared hit=4 read=7366
                          ->  Bitmap Index Scan on orders_x02 (actual time=1.232..1.232 rows=16666 loops=1)
                                Index Cond: (((ord_date)::text = (a_1.ord_date)::text) AND ((ord_date)::text < '20191213'::text))
                                Buffers: shared hit=3 read=14
              ->  Index Scan using orders_detail_x01 on orders_detail b (actual time=0.005..0.017 rows=10 loops=16666)
                    Index Cond: (ord_no = a.ord_no)
                    Buffers: shared hit=118317 read=98341
Planning:
  Buffers: shared hit=4 read=12
Planning Time: 0.606 ms
Execution Time: 372.437 ms


Note that the structure of the plan is almost identical to Oracle's. To force this plan, I installed pg_hint_plan and added hints to the query. When the optimizer makes a mistake in its choice of execution plan, you may have to attach hints to your code.

In passing, the elapsed time is 372 milliseconds, about half of Oracle's 720 milliseconds. It seems we did a good job migrating the query from Oracle. However, the snag is that the query is complex and hard to read. Instead of calling it a day, I decided to repeat the experiment with a simpler query.

Here is the simplified version of the query, which is easy to read and understand.


SELECT A.ORD_NO, ORD_LINE_NO, CUST_ID, PROD_ID
   FROM ORDERS A, ORDERS_DETAIL B
  WHERE A.ORD_NO = B.ORD_NO
     AND A.ORD_DATE < '20191213'
  ORDER BY A.ORD_DATE DESC, B.ORD_AMT
  FETCH NEXT 20 ROWS ONLY;

Let's see what the execution plan for the simple query looks like now:

Limit (actual time=359.754..359.757 rows=20 loops=1)
  Buffers: shared hit=118404 read=105712
  ->  Incremental Sort (actual time=359.752..359.754 rows=20 loops=1)
        Sort Key: a.ord_date DESC, b.ord_amt
        Presorted Key: a.ord_date
        Full-sort Groups: 1  Sort Method: top-N heapsort  Average Memory: 27kB  Peak Memory: 27kB
        Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average Memory: 26kB  Peak Memory: 26kB
        Buffers: shared hit=118404 read=105712
        ->  Nested Loop (actual time=0.025..328.943 rows=166661 loops=1)
              Buffers: shared hit=118404 read=105712
              ->  Index Scan Backward using orders_x02 on orders a (actual time=0.017..20.763 rows=16667 loops=1)
                    Index Cond: ((ord_date)::text < '20191213'::text)
                    Buffers: shared hit=84 read=7370
              ->  Index Scan using orders_detail_x01 on orders_detail b (actual time=0.005..0.017 rows=10 loops=16667)
                    Index Cond: (ord_no = a.ord_no)
                    Buffers: shared hit=118320 read=98342
Planning:
  Buffers: shared hit=3 read=13
Planning Time: 0.236 ms
Execution Time: 359.782 ms


Note that the wonderful Incremental Sort operation kicked in.
You will also notice that PostgreSQL performs a nested loop join: it fetches rows from the ORDERS table in the right order via an Index Scan Backward, fetches the matching rows from the ORDERS_DETAIL table, and then incrementally sorts those rows. This process is repeated until a total of 20 rows is obtained. After running the query several times, we get a stable timing of around 360 ms. Compared to the complex query, which took 372 ms, that is a consistent saving of about 10 ms. Furthermore, this query is much easier for developers to write and maintain.
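The "top-N heapsort" reported in both plans is also worth a note. Here is a tiny Python illustration using heapq on hypothetical data (for simplicity both sort keys ascend, unlike the DESC/ASC mix in the query): rather than sorting all of the joined rows, the executor keeps only the best N candidates in a bounded heap, which is why the sort memory stays at about 27 kB no matter how large the input is.

```python
import heapq

# 100,000 synthetic (key1, key2) rows in arbitrary order.
rows = [((i * 7919) % 365, (i * 104729) % 100000) for i in range(100000)]

# Top-N heapsort: every row is examined once, but at most 20 candidate
# rows are retained in memory at any point in time.
top20 = heapq.nsmallest(20, rows)

# Same answer as sorting the whole input and taking the first 20 rows,
# without ever materialising the full sorted result.
assert top20 == sorted(rows)[:20]
```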


The query below took 9.26 seconds to complete in Oracle, while the structurally identical query took only 0.35 seconds in PostgreSQL.

SELECT  *
  FROM (SELECT A.ORD_NO, ORD_LINE_NO, A.CUST_ID, PROD_ID
            FROM ORDERS A, ORDERS_DETAIL B
           WHERE A.ORD_NO = B.ORD_NO
              AND A.ORD_DATE < '20191213'
           ORDER BY A.ORD_DATE DESC, B.ORD_AMT
          )
WHERE ROWNUM <= 20;

The performance difference comes down to the fact that Oracle performs a hash join that produces a 9999K-row result set and then sorts it to get the 20 rows. PostgreSQL, on the other hand, performs a nested loop join that produces a 166K-row result set already presorted on the first column, then sorts it incrementally to get the final 20 rows.

Conclusion

When you migrate paginating queries from Oracle to PostgreSQL, keep in mind that the incremental sort can make a huge difference to performance in PostgreSQL.

If we have an "order by table1.colA, table2.colB" and the optimizer uses an indexed access path on table1.colA with a nested loop into table2, the optimizer will recognize that the generated data is already partially sorted, and it only needs to sort each batch of rows sharing the same table1.colA value by table2.colB.

Combined with a "fetch next N rows only" clause, this approach lets PostgreSQL's incremental sort mechanism avoid a very large sort that might otherwise spill to disk.

Footnote
We can check the plan without the incremental sort with a set command to disable the feature:

set enable_incremental_sort = off;

Checking the execution plan with the incremental sort disabled is left as an exercise to the reader.

   

postgresdba.com