PostgreSQL 은 특별합니다.
단순한 데이터베이스가 아니라, 하나의 어플리케이션 플랫폼이기 때문입니다.

공지사항

뉴스 및 기사

	총 게시물 169건, 최근 0 건

MIN/MAX with GROUP BY

글쓴이 : 모델광 날짜 : 2022-06-19 (일) 23:02 조회 : 1655

A question on stackoverflow.com prompted me to write this note.

https://stackoverflow.com/questions/54856702/postgresql-11-goes-for-parallel-seq-scan-on-partitioned-table-where-index-should

The most popular answer in the post concluded that "PostgreSQL punts when it sees a GROUP BY clause." And it also supplied the source code in src/backend/optimizer/plan/planagg.c to convince that there is not much point in optimizing MIN/MAX with GROUP BY.

I thought that I might be able to work out a solution to improve the performance of the query with ease. But it took me about a week to find a solution.

I'll explain how we can make the problematic query faster in this note.

One of the wonderful features I think PostgreSQL has is its "min/max" optimization. Rather than talking through it, here is a little demonstration.

CREATE TABLE PART (

C1 INT,

C2 VARCHAR(100),

C3 VARCHAR(10),

STATUS VARCHAR(1)

)

PARTITION BY LIST(C1)

;

CREATE TABLE PART1 PARTITION OF PART FOR VALUES IN(1);

CREATE TABLE PART2 PARTITION OF PART FOR VALUES IN(2);

CREATE TABLE PART3 PARTITION OF PART FOR VALUES IN(3);

CREATE TABLE PART4 PARTITION OF PART FOR VALUES IN(4);

INSERT INTO PART

SELECT i%4+1

, i::text||'WhoeverDoesNotLoveDoesNotKnowGodBecauseGodIsLove'

, substr(md5(random()::text),1,10)

, chr(65+i%3)

FROM GENERATE_SERIES(1,1000000) a(i);

CREATE INDEX PART_X01 ON PART(STATUS, C3);

CREATE INDEX PART_X02 ON PART(C1, STATUS);

CREATE TABLE NONPART AS SELECT * FROM PART;

CREATE INDEX NONPART_X01 ON NONPART(STATUS, C3);

I have created a non-partitioned table "NONPART" and a partitioned table "PART". Let's say that I want to get a minimum STATUS value and maximum STATUS value. Then my query would be:

SELECT MIN(STATUS), MAX(STATUS) FROM PART;

If you run the query in ORACLE, it is inevitable to do an index fast full scan which has to read all the index blocks. And it also has to do a sort aggregate by STATUS. The ORACLE execution plan would be as follows:

OPERATION NAME

-----------------------------------------------

SELECT STATEMENT

SORT AGGREGATE

INDEX FAST FULL SCAN PART_X01 --> this results in a lot of block I/Os

In ORACLE if you want to improve the response time, you have to rewrite it as follows.

SELECT MIN(MIN_VAL), MAX(MAX_VAL)

FROM (

SELECT MIN(STATUS) as MIN_VAL, NULL as MAX_VAL

FROM PART

UNION ALL

SELECT NULL AS MIN_VAL, MAX(STATUS) as MAX_VAL

FROM PART

)

;

Then how would PostgreSQL get the minimum and maximum values of STATUS? Below is the execution plan the optimizer produces with some cosmetic editing.

Result (actual time=0.137..0.138 rows=1 loops=1)
Buffers: shared hit=32
InitPlan 1 (returns $0)
    -> Limit (actual time=0.096..0.096 rows=1 loops=1)
          Buffers: shared hit=16
          -> Result (actual time=0.095..0.095 rows=1 loops=1)
                Buffers: shared hit=16
                -> Merge Append (actual time=0.094..0.094 rows=1 loops=1)
                      Sort Key: part.status
                      Buffers: shared hit=16
                      -> Index Only Scan using part1_status_c3_idx on part1 part_1 (actual time=0.016..0.016 rows=1 loops=1)
                            Index Cond: (status IS NOT NULL)
                            Heap Fetches: 0
                            Buffers: shared hit=4
                      -> Index Only Scan using part2_status_c3_idx on part2 part_2 (actual time=0.008..0.008 rows=1 loops=1)
                            Index Cond: (status IS NOT NULL)
                            Heap Fetches: 0
                            Buffers: shared hit=4
                      -> Index Only Scan using part3_status_c3_idx on part3 part_3 (actual time=0.007..0.007 rows=1 loops=1)
                            Index Cond: (status IS NOT NULL)
                            Heap Fetches: 0
                            Buffers: shared hit=4
                      -> Index Only Scan using part4_status_c3_idx on part4 part_4 (actual time=0.061..0.061 rows=1 loops=1)
                            Index Cond: (status IS NOT NULL)
                            Heap Fetches: 0
                            Buffers: shared hit=4
InitPlan 2 (returns $1)
    -> Limit (actual time=0.038..0.038 rows=1 loops=1)
          Buffers: shared hit=16
          -> Result (actual time=0.037..0.038 rows=1 loops=1)
                Buffers: shared hit=16
                -> Merge Append (actual time=0.037..0.037 rows=1 loops=1)
                      Sort Key: part_5.status DESC
                      Buffers: shared hit=16
                      -> Index Only Scan Backward using part1_status_c3_idx on part1 part_6 (actual time=0.009..0.009 rows=1 loops=1)
                            Index Cond: (status IS NOT NULL)
                            Heap Fetches: 0
                            Buffers: shared hit=4
                      -> Index Only Scan Backward using part2_status_c3_idx on part2 part_7 (actual time=0.007..0.007 rows=1 loops=1)
                            Index Cond: (status IS NOT NULL)
                            Heap Fetches: 0
                            Buffers: shared hit=4
                      -> Index Only Scan Backward using part3_status_c3_idx on part3 part_8 (actual time=0.016..0.016 rows=1 loops=1)
                            Index Cond: (status IS NOT NULL)
                            Heap Fetches: 0
                            Buffers: shared hit=4
                      -> Index Only Scan Backward using part4_status_c3_idx on part4 part_9 (actual time=0.005..0.005 rows=1 loops=1)
                            Index Cond: (status IS NOT NULL)
                            Heap Fetches: 0
                            Buffers: shared hit=4
Planning Time: 0.249 ms
Execution Time: 0.189 ms

In the above plan we can not see any aggregate operations. PostgreSQL doesn't need to acquire a lot of data and sort it when it knows that the "left hand" end of an index is the low values and the "right hand" end is the high values. Surely with the same amount of data the block I/O and the execution time are much smaller than those of ORACLE. The bigger the data, the more huge the performance gap is.

In the InitPlan 1 of the execution plan the optimizer could simply walk down the index branches to the left hand leaf and look at the single lowest entry in the leaf block to determine the lowest value for STATUS. (In a partitioned table, PostgreSQL supports only local indexes. So it had to access four local indexes.) The Limit is specifically designed to stop reading the object early when the desired row, the max/min value in this example, was extracted.

In the InitPlan 2 of the execution plan the optimizer could simply walk down the index branches to the right hand leaf and look at the single highest entry in the leaf block to determine the highest value for STATUS. It hit three index blocks and one heap block in each partition.

In the future if PostgreSQL implements global indexes in a partitioned table, it will be able to get the result without visiting every partition.

Now let's assume that we should extract min(c3) and max(c3) per STATUS. Then the query would be like this:

--From now on I use a non-partitioned table for simplicity of explanation.

SELECT STATUS, MIN(C3), MAX(C3)

FROM NONPART

GROUP BY STATUS;

Since we have an index on the columns (STATUS, C3), we would expect the optimizer produces an execution plan using the index. It would access the left-hand end of the index and the right-hand end of the index.

Now let's take a look at the execution plan.

Finalize GroupAggregate (actual time=208.101..210.502 rows=3 loops=1)
Group Key: status
Buffers: shared hit=12362
-> Gather Merge (actual time=208.093..210.491 rows=9 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=12362
        -> Sort (actual time=199.000..199.000 rows=3 loops=3)
              Sort Key: status
              Sort Method: quicksort Memory: 25kB
              Buffers: shared hit=12362
              Worker 0: Sort Method: quicksort Memory: 25kB
              Worker 1: Sort Method: quicksort Memory: 25kB
              -> Partial HashAggregate (actual time=198.967..198.968 rows=3 loops=3)
                    Group Key: status
                    Batches: 1 Memory Usage: 24kB
                    Buffers: shared hit=12346
                    Worker 0: Batches: 1 Memory Usage: 24kB
                    Worker 1: Batches: 1 Memory Usage: 24kB
                    -> Parallel Seq Scan on nonpart (actual time=0.006..37.536 rows=333333 loops=3)
                          Buffers: shared hit=12346
Planning Time: 0.082 ms
Execution Time: 210.534 ms

Disappointingly the optimizer didn't use the index. PostgreSQL is not smart enough to figure out it can use the index to speed up min()/max() with GROUP BY. So how can we work around this problem?

STATUS has small number of distinct values. PostgreSQL can extract distinct values super efficiently when the column is indexed. So I re-engineered the query as follows:

WITH RECURSIVE W AS (

(SELECT STATUS FROM NONPART ORDER BY STATUS LIMIT 1)

UNION ALL

SELECT (SELECT STATUS

FROM NONPART A

WHERE A.STATUS > W.STATUS

ORDER BY A.STATUS LIMIT 1)

FROM W

WHERE STATUS IS NOT NULL

)

SELECT W.STATUS, B.*

FROM W

JOIN LATERAL

(SELECT MIN(C3) AS MIN_C3, MAX(C3) AS MAX_C3

FROM NONPART B

WHERE W.STATUS = B.STATUS

) B ON TRUE

WHERE W.STATUS IS NOT NULL;

The key thing in the above query is that I got rid of the GROUP BY clause. According to the PostgreSQL source code "src/backend/optimizer/plan/planagg.c", you cannot get a performance boost with GROUP BY.

The SQL statement looks a little convoluted with the inline select statement after JOIN LATERAL. But it does very small amounts of work and it is only reading the leftmost leaf and the rightmost leaf of the index. We know from the above example that PostgreSQL can execute the MIN/MAX in the inline view after JOIN LATERAL very efficiently; we can hope (and check) that the inline view driven by the single values from the recursive row in W will operate just as efficiently - and here is the plan:

Nested Loop (actual time=0.134..0.411 rows=3 loops=1)
Buffers: shared hit=25 read=9
CTE w
    -> Recursive Union (actual time=0.082..0.289 rows=4 loops=1)
          Buffers: shared hit=11 read=3
          -> Limit (actual time=0.081..0.081 rows=1 loops=1)
                Buffers: shared hit=1 read=3
                -> Index Only Scan using nonpart_x01 on nonpart (actual time=0.080..0.080 rows=1 loops=1)
                      Heap Fetches: 0
                      Buffers: shared hit=1 read=3
          -> WorkTable Scan on w w_1 (actual time=0.051..0.051 rows=1 loops=4)
                Filter: (status IS NOT NULL)
                Rows Removed by Filter: 0
                Buffers: shared hit=10
                SubPlan 1
                  -> Limit (actual time=0.005..0.006 rows=1 loops=3)
                        Buffers: shared hit=10
                        -> Index Only Scan using nonpart_x01 on nonpart a (actual time=0.005..0.005 rows=1 loops=3)
                              Index Cond: (status > (w_1.status)::text)
                              Heap Fetches: 0
                              Buffers: shared hit=10
-> CTE Scan on w (actual time=0.084..0.292 rows=3 loops=1)
        Filter: (status IS NOT NULL)
        Rows Removed by Filter: 1
        Buffers: shared hit=11 read=3
-> Result (actual time=0.038..0.038 rows=1 loops=3)
        Buffers: shared hit=14 read=6
        InitPlan 3 (returns $4)
          -> Limit (actual time=0.009..0.009 rows=1 loops=3) --read the leftmost leaf block of the index
                Buffers: shared hit=10
                -> Index Only Scan using nonpart_x01 on nonpart b (actual time=0.009..0.009 rows=1 loops=3)
                      Index Cond: ((status = ($3)::text) AND (c3 IS NOT NULL))
                      Heap Fetches: 0
                      Buffers: shared hit=10
        InitPlan 4 (returns $5)
          -> Limit (actual time=0.028..0.028 rows=1 loops=3) --read the rightmost leaf block of the index
                Buffers: shared hit=4 read=6
                -> Index Only Scan Backward using nonpart_x01 on nonpart b_1 (actual time=0.027..0.027 rows=1 loops=3)
                      Index Cond: ((status = ($3)::text) AND (c3 IS NOT NULL))
                      Heap Fetches: 0
                      Buffers: shared hit=4 read=6
Planning Time: 0.164 ms
Execution Time: 0.479 ms

To understand the revised query, you need to have knowledge of the following:

1. PostgreSQL can get distinct values efficiently using the Common Table Expression.

2. Using a lateral join we can run everything after the lateral for each row before the lateral.

3. PostgreSQL is super excellent in MIN/MAX optimization on an indexed column. (This was explained in the above example.)

If we run the above query against a partitioned table(PART) instead of a non-partitioned table(NONPART), we will do lots of individual probes into the local indexes. Still they will stay very efficient using a MIN/MAX access.

Wrap Up

PostgreSQL looks at all the rows when it implements GROUP BY even when there is an index on columns.

When we want to improve MIN/MAX with GROUP BY, we should find a method to remove GROUP BY in the SQL statement.

E-MAIL: root@postgresdba.com