설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 175건, 최근 0 건
   

row_number() optimization in PostgreSQL 15.3

글쓴이 : 모델광 날짜 : 2025-03-16 (일) 19:17 조회 : 222
When migrating queries from Oracle to PostgreSQL, subtle differences in query execution can sometimes lead to unexpected performance issues. One such case involves the ROW_NUMBER() function combined with ORDER BY and FETCH NEXT n ROWS ONLY.

Unlike Oracle, PostgreSQL does not always optimize index usage efficiently in this scenario, potentially scanning far more rows than necessary. Let’s explore this issue and how to fix it.

To demonstrate my point I set up a little test, connected to a local instance of PostgreSQL 15.3.

create table ORDERS (ord_no bigint,cust_id varchar(20),comment varchar(100),ord_date varchar(8));

ALTER TABLE ORDERS ADD CONSTRAINT PK_ORDERS PRIMARY KEY(ORD_NO);
create index orders_x01 on orders(ord_date);

--I have populated the test table with 1,000,000 rows.
insert into ORDERS
select i as ord_no
       , 'C'||mod(i,10) as cust_id
       , lpad('X',10,'Y') as comment
       , to_char(to_date('20191001','YYYYMMDD')+mod(i,60),'yyyymmdd') as ord_date
from generate_series(1,1000000) a(i);

Now let's run a query to fetch the first 5 rows ordered by ORD_DATE, using row_number().

explain (analyze, buffers, costs off)
SELECT ORD_NO
     , ROW_NUMBER() OVER (ORDER BY ORD_DATE) RN
     , ORD_DATE
 FROM ORDERS
FETCH NEXT 5 ROWS ONLY;


Limit (actual time=8.944..8.948 rows=5 loops=1)
  Buffers: shared hit=8351
  ->  WindowAgg (actual time=8.942..8.945 rows=5 loops=1)
        Buffers: shared hit=8351
        ->  Index Scan using orders_x01 on orders (actual time=0.014..6.453 rows=16667 loops=1)
              Buffers: shared hit=8351
Planning Time: 0.060 ms
Execution Time: 9.068 ms

Why Is this Execution Plan Inefficient?
At first glance it might be difficult to find any inefficiency in this plan. To be honest, I couldn't find anything wrong in the plan before I checked the plan in Oracle 12c. You'll notice the ineffiency of this plan with ease when you see the Oracle execution plan. Here is the execution plan generated on Oracle 12c:
-------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |      1 |           |      5 |00:00:00.01 |      7 |
|*  1 |  COUNT STOPKEY                |            |      1 |        |        5 |00:00:00.01 |      7 |
|   2 |   VIEW                                |            |      1 |    812K|      5 |00:00:00.01 |      7 |
|   3 |    WINDOW NOSORT             |            |      1 |    812K|      5 |00:00:00.01 |     7 |
|   4 |     TABLE ACCESS BY INDEX ROWID| ORDERS|   1 |  812K|  5 |00:00:00.01 |    7 |
|*  5 |      INDEX RANGE SCAN        | ORDERS_X01 |   1 |         |   5 |00:00:00.01 |    4 |
-------------------------------------------------------------------------------------------------------
Now we can observe that Oracle accessed only 7 blocks, whereas PostgreSQL accessed 8351 blocks.

This is a very important point: PostgreSQL anticipates it needs to order an appreciable amount of rows of the ORDERS table by ORD_DATE to accurately calculate the row numbers(RN) before it can apply the FETCH NEXT 5 ROWS ONLY. This requires scanning a significant portion of the index ORDERS_X01 to sort and number each row, hence accessing 16667 rows.

To improve PostgreSQL's execution plan, we modify the query to specify a ROWS mode:

EXPLAIN(ANALYZE, BUFFERS, COSTS OFF)
SELECT ORD_NO
     , ROW_NUMBER() OVER (ORDER BY ORD_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) RN
     , ORD_DATE
 FROM ORDERS
FETCH NEXT 5 ROWS ONLY;


Here is the change in the execution plan - spot the number of shared hit and the actual rows the optimizer accessed.

Limit (actual time=0.017..0.022 rows=5 loops=1)
  Buffers: shared hit=6
  ->  WindowAgg (actual time=0.016..0.020 rows=5 loops=1)
        Buffers: shared hit=6
        ->  Index Scan using orders_x01 on orders (actual time=0.013..0.015 rows=5 loops=1)
              Buffers: shared hit=6
Planning Time: 0.052 ms
Execution Time: 0.035 ms


The specification ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING does not change the logical outcome of ROW_NUMBER() in this context, since ROW_NUMBER() inherently considers all rows within the partition. However, the presence of this window frame might have influenced PostgreSQL's planner to optimize the query differently. In this case only 5 rows were read as the optimizer recognized an opportunity to apply FETCH NEXT 5 ROWS ONLY earlier.

Conclusion
The execution plan without the ROWS mode is very inefficient. We have to specify the window frame to help the planner make an efficient execution plan, thereby reducing the work required.

   

postgresdba.com