Partial Range Processing

Author: 모델광   Date: 2022-03-26 (Sat) 18:34   Views: 182

In the last article titled Scalar Subquery Optimization, I mentioned that Oracle sends the intermediate result set to the client as soon as it retrieves 15 rows by default, which makes the end user "feel super fast". SQL developers in Oracle take advantage of this feature to speed up their queries. Sometimes they change a hash join to a nested loop join to use this architecture. If my memory serves me right, Lee HwaSik called this feature (or architecture) "Partial Range Processing".

If you are interested in what partial range processing is, refer to this blog.

http://www.gurubee.net/lecture/2468


On the other hand, PostgreSQL does not offer such functionality. PostgreSQL has to retrieve all rows before sending the output to the client. Whenever a query returns a tremendous number of rows, the number of rows transferred at a time becomes paramount for the overall data transmission speed.


In this note I will supply a tuning technique in PostgreSQL that makes the application feel super fast to the end user.

Here is some simple, short code to demonstrate my point. I created the table ORDERS_DETAIL with 10,000,000 rows and the table PROD with 5 rows.


CREATE TABLE ORDERS_DETAIL(
ORD_LINE_NO BIGINT NOT NULL
,ORD_NO BIGINT NOT NULL
,PROD_ID VARCHAR(10) NOT NULL
,COMMENT VARCHAR(100)
,ORD_AMT BIGINT);

ALTER TABLE ORDERS_DETAIL ADD CONSTRAINT PK_ORDERS_DETAIL PRIMARY KEY(ORD_LINE_NO);
CREATE INDEX ORDERS_DETAIL_X01 ON ORDERS_DETAIL(ORD_NO, PROD_ID);

INSERT INTO ORDERS_DETAIL
SELECT i as ORD_LINE_NO  
          , mod(i,1000000) AS ORD_NO  
          , 'PP'||MOD(i,5) AS PROD_ID  
          , lpad('X',10,'Y') as comment
      , case when i < 1000 then i*100 else i end as ord_amt
  FROM generate_series(1,10000000) a(i);

CREATE TABLE PROD (PROD_ID VARCHAR(10) NOT NULL,PROD_NM VARCHAR(100) NOT NULL);
ALTER TABLE PROD ADD CONSTRAINT PK_PROD PRIMARY KEY(PROD_ID);

INSERT INTO PROD    
 SELECT PROD_ID, MAX(ORD_NO)||'TEST_NAME'        
  FROM ORDERS_DETAIL
 GROUP BY PROD_ID;

The following is the SQL statement that was run in Oracle.

SELECT A.ORD_NO
      ,(SELECT PROD_NM
          FROM PROD
         WHERE PROD_ID = A.PROD_ID) AS PROD_NM
  FROM ORDERS_DETAIL A
 WHERE A.ORD_NO BETWEEN 1 AND 100000;

With the same amount of data, in Oracle the resulting rows showed up on the user's screen in 10 ms. But in PostgreSQL the resulting rows showed up on the user's screen after 1 or 2 seconds.

Below is the execution plan in PostgreSQL. I used PostgreSQL's explain (analyze, buffers, costs off) command to return the query plan and execution details.

Seq Scan on orders_detail a (actual time=0.076..1479.479 rows=1000000 loops=1)
  Filter: ((ord_no >= 1) AND (ord_no <= 100000))
  Rows Removed by Filter: 9000000
  Buffers: shared hit=1000032 read=83302
  SubPlan 1
    ->  Seq Scan on prod (actual time=0.000..0.001 rows=1 loops=1000000)
          Filter: ((prod_id)::text = (a.prod_id)::text)
          Rows Removed by Filter: 4
          Buffers: shared hit=1000000
Planning Time: 0.148 ms
Execution Time: 1520.440 ms

The output of the command explain (analyze, buffers, costs off) reveals some important results:

1. We spent 0.148 ms planning and about 1.5 seconds executing.

2. We executed the scalar subquery 1,000,000 times, and the time spent executing the subquery once was 0.001 ms. The number 0.001 is rounded, which means it is inappropriate to use it to compute the exact total time of executing the subquery 1,000,000 times. However, the total time of executing the subquery can be approximated as follows:

0.001 * 1,000,000 = 1,000 ms

3. The elapsed time to perform Seq Scan on the table ORDERS_DETAIL can be calculated as follows:

1479.479 - 0.001 * 1,000,000 = 479.479 ms

4. While executing the scalar subquery we hit 1,000,000 buffers, which means that the block was in the shared buffer cache and that the table PROD occupies just one block.

5. While doing the Seq Scan on orders_detail we hit 32 (= 1,000,032 - 1,000,000) buffers and read 83,302 blocks. A read means we went to the file system because those 83,302 blocks could not be found in the buffer cache. Unlike Oracle's execution plan, shared hit in PostgreSQL's execution plan does not include the number of disk IOs.

When I checked the number of pages in the table orders_detail, the block IO figures made sense.


select relpages from pg_class where relname='orders_detail';

 relpages
----------
    83334
(1 row)
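
Relatedly, you can check how many orders_detail blocks are currently sitting in the shared buffer cache. This is a sketch that assumes the contrib extension pg_buffercache is installed; querying it requires superuser or pg_monitor privileges.

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Count shared buffers holding blocks of orders_detail
-- in the current database.
SELECT count(*) AS cached_blocks
  FROM pg_buffercache b
 WHERE b.relfilenode = pg_relation_filenode('orders_detail')
   AND b.reldatabase = (SELECT oid FROM pg_database
                         WHERE datname = current_database());
```

Right after the Seq Scan above, this count should be close to the shared hit figure for the table plus whatever portion of the 83,302 read blocks the cache could keep.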


We can infer that it is going to take more than 1.52 seconds to receive all 1,000,000 rows. The Oracle server sends the first 15 rows to the client, which is why the resulting data show up on the screen in 10 ms. The first node of the execution plan above tells us that it took just 0.076 ms to retrieve the first row, so we can safely predict that it would take less than 1 ms to extract 15 rows. If we could send those 15 rows to the end user's screen as Oracle does, the end user would feel that the application is very responsive.

In PostgreSQL this issue can be overcome with the use of SQL commands:
DECLARE a cursor and FETCH the needed rows.

Though it is somewhat cumbersome, you can have the client fetch the output in batches, several rows at a time. I rewrote the original SQL statement in the "DECLARE a cursor and FETCH the needed rows" form.

analdb=# BEGIN;
BEGIN
Time: 0.615 ms
analdb=*#   declare c1 cursor FOR
analdb-*#    SELECT A.ORD_NO
analdb-*#       ,(SELECT PROD_NM
analdb(*#           FROM PROD
analdb(*#          WHERE PROD_ID = A.PROD_ID) AS PROD_NM
analdb-*#   FROM ORDERS_DETAIL A
analdb-*#  WHERE A.ORD_NO BETWEEN 1 AND 100000;  
Time: 2.392 ms
analdb=*# FETCH 10 FROM c1;   --Let's assume that the user screen can list ten rows at a time.

 ord_no |     prod_nm
--------+-----------------
      2 | 999997TEST_NAME
      3 | 999998TEST_NAME
      4 | 999999TEST_NAME
      5 | 999995TEST_NAME
      6 | 999996TEST_NAME
      7 | 999997TEST_NAME
      8 | 999998TEST_NAME
      9 | 999999TEST_NAME
     10 | 999995TEST_NAME
     11 | 999996TEST_NAME
(10 rows)

Time: 0.309 ms

Note that the total elapsed time to send the first 10 rows to the client can be calculated as follows:
0.615 + 2.392 + 0.309 = 3.3 ms


Now the end user will be able to see the output in about 3.3 ms. From the end user's point of view this technique yields a significant speedup: 1520 ms -> 3.3 ms. If the end user wants to see the next screen, the application has to issue "FETCH 10 FROM c1" again, and the client will then receive the next 10 rows.
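
Continuing the session above, the subsequent screens and the cleanup look like this (same cursor name c1; the cursor remembers its position between fetches):

```sql
FETCH 10 FROM c1;   -- second screen: rows 11-20
FETCH 10 FROM c1;   -- third screen: rows 21-30
CLOSE c1;           -- release the cursor's resources when the user leaves
COMMIT;             -- without WITH HOLD, COMMIT would discard c1 anyway
```

Each FETCH should return in well under a millisecond, since the server only has to produce the next 10 rows of the already-started scan.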


If you want the C1 cursor to remain usable after the transaction that created it commits, you must specify WITH HOLD when declaring the cursor.

BEGIN;
  declare c1 cursor WITH HOLD FOR
   SELECT A.ORD_NO
      ,(SELECT PROD_NM
          FROM PROD
         WHERE PROD_ID = A.PROD_ID) AS PROD_NM
  FROM ORDERS_DETAIL A
 WHERE A.ORD_NO BETWEEN 1 AND 100000;
FETCH 10 FROM c1;
COMMIT;

Note that declaring the C1 cursor took just 2.392 ms: DECLARE only plans the query, and no rows are retrieved until the first FETCH.

When a cursor is used, the planner selects a plan that is optimal for retrieving the fraction of the total matching rows given by cursor_tuple_fraction (0.1 by default). So the execution plan of the query in a cursor may differ from that of the original SQL statement.
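
If your application usually ends up fetching the entire result set through the cursor, you can steer the planner back toward a full-result-set plan. A session-level sketch:

```sql
SET cursor_tuple_fraction = 1.0;  -- plan cursors as if all rows will be fetched
SHOW cursor_tuple_fraction;       -- verify the session setting
RESET cursor_tuple_fraction;      -- back to the default of 0.1
```

A lower value favors fast-startup plans (good for showing the first screen quickly); 1.0 favors the lowest total cost, as for a plain SELECT.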


As always, I hope this article will save someone a bit of time learning, debugging, and optimizing.


Conclusion

You can implement the Oracle feature "Partial Range Processing" by issuing your query in a "DECLARE a cursor and FETCH the needed rows" format.



   

postgresdba.com