설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 177건, 최근 0 건
 

Improving PostgreSQL Query Performance Using ANY_VALUE()

글쓴이 : 모델광 날짜 : 2025-07-05 (토) 19:35 조회 : 45
In the past, when we wanted to include non-aggregated columns in the SELECT clause without impacting grouping behavior, we often relied on aggregate functions like MIN() or MAX(). Although functional, these aggregates could be somewhat expensive computationally.
To address this performance issue, Oracle introduced the ANY_VALUE() aggregate function in version 21c, and PostgreSQL added support for this function in version 16.

This note demonstrates how using ANY_VALUE() instead of MIN() or MAX() can significantly enhance query performance.

What is ANY_VALUE()?

When using ANY_VALUE() on a non-aggregated column within a GROUP BY query, PostgreSQL selects an arbitrary non-null value from each group. 


Here is the script to illustrate the concept with an example. Below, we first create and populate two test tables, interface_test and equipment_test, in PostgreSQL 17.2:

--Create and populate INTERFACE_TEST
DROP TABLE INTERFACE_TEST;
create table interface_test(
equ_num smallint
, if_dt timestamp WITHOUT time zone
, if_dt_seq int, 
if_sts_cd varchar(10)
, comt varchar(50) );

DO $$
DECLARE
  z integer;
  i integer;
  j integer;
BEGIN
  FOR z IN 1 .. 2000 LOOP
    FOR i IN 1 .. 10 LOOP
      FOR j IN 1 .. 20 LOOP
        INSERT INTO interface_test (equ_num, if_dt, if_dt_seq, if_sts_cd, comt)
        VALUES (z::int, date_trunc('second',now() - INTERVAL '1 day' * i)::timestamp, j::int, chr(65 + (j % 5))::varchar(10), md5(random()::text));
      END LOOP;
    END LOOP;
  END LOOP;
END $$;

create unique index interface_test_pk on interface_test(equ_num,if_dt,if_dt_seq);
alter table interface_test add constraint interface_test_pk primary key using index interface_test_pk;

--Create and populate EQUIPMENT_TEST
drop table equipments_test;
CREATE TABLE equipments_test (
  equ_num smallint,
  equ_nm VARCHAR(30),
  equ_tp_nm VARCHAR(10),
  equ_sts_nm VARCHAR(10),
  last_upd_dt timestamp
);

DO $$
DECLARE
  i integer;
BEGIN
  FOR i IN 1 .. 2000 LOOP
    INSERT INTO equipments_test (equ_num, equ_nm, equ_tp_nm, equ_sts_nm, last_upd_dt)
    VALUES (i::int, md5(random()::text)::varchar(30), 'SENSOR', 'ON', current_timestamp);
  END LOOP;
END $$;

create unique index equipments_test_pk on equipments_test(equ_num);
ALTER TABLE equipments_test ADD CONSTRAINT equipments_test_pk PRIMARY KEY USING INDEX equipments_test_pk;
create index equipments_test_x01 on equipments_test(equ_tp_nm);

Let's take a look at the following query using MAX() to display the equipment anme (EQU_NM):

SELECT A.EQU_NUM, MAX(A.EQU_NM), COUNT(*)
  FROM EQUIPMENTS_TEST A
  LEFT JOIN interface_test B
    ON A.EQU_NUM = B.EQU_NUM
 GROUP BY A.EQU_NUM;

(If you have read other articles I posted in this bulletin board, you might be tempted to use the optimization technique of placing GROUP BY before a join I introduce in the following note:
https://www.postgresdba.com/bbs/board.php?bo_table=C05&wr_id=236&page=3
To demonstrate my point, let's suppose that we can not do an aggregation before joining two tables here.)
Here's the execution plan obtained by running the EXPLAN(ANALYZE) command.

HashAggregate  (cost=12242.23..12262.23 rows=2000 width=42) (actual time=128.448..128.654 rows=2000 loops=1)
  Group Key: a.equ_num
  Batches: 1  Memory Usage: 369kB
  Buffers: shared hit=4145
  ->  Hash Right Join  (cost=66.00..9242.23 rows=400000 width=33) (actual time=0.511..78.295 rows=400000 loops=1)
        Hash Cond: (b.equ_num = a.equ_num)
        Buffers: shared hit=4145
        ->  Seq Scan on interface_test b  (cost=0.00..8124.00 rows=400000 width=2) (actual time=0.004..19.572 rows=400000 loops=1)
              Buffers: shared hit=4124
        ->  Hash  (cost=41.00..41.00 rows=2000 width=33) (actual time=0.501..0.502 rows=2000 loops=1)
              Buckets: 2048  Batches: 1  Memory Usage: 143kB
              Buffers: shared hit=21
              ->  Seq Scan on equipments_test a  (cost=0.00..41.00 rows=2000 width=33) (actual time=0.010..0.220 rows=2000 loops=1)
                    Buffers: shared hit=21
Planning:
  Buffers: shared hit=14
Planning Time: 0.214 ms
Execution Time: 128.735 ms

So what will the plan look like when we replace MAX(A.EQU_NM) with the newer ANY_VALUE(A.EQU_NM) function available in PostgreSQL 17?
Here's the execution plan for the query using ANY_VALUE(A.EQU_NM):

HashAggregate  (cost=12242.23..12262.23 rows=2000 width=42) (actual time=108.461..108.796 rows=2000 loops=1)
  Group Key: a.equ_num
  Batches: 1  Memory Usage: 369kB
  Buffers: shared hit=4145
  ->  Hash Right Join  (cost=66.00..9242.23 rows=400000 width=33) (actual time=0.265..70.659 rows=400000 loops=1)
        Hash Cond: (b.equ_num = a.equ_num)
        Buffers: shared hit=4145
        ->  Seq Scan on interface_test b  (cost=0.00..8124.00 rows=400000 width=2) (actual time=0.002..18.140 rows=400000 loops=1)
              Buffers: shared hit=4124
        ->  Hash  (cost=41.00..41.00 rows=2000 width=33) (actual time=0.259..0.260 rows=2000 loops=1)
              Buckets: 2048  Batches: 1  Memory Usage: 143kB
              Buffers: shared hit=21
              ->  Seq Scan on equipments_test a  (cost=0.00..41.00 rows=2000 width=33) (actual time=0.007..0.117 rows=2000 loops=1)
                    Buffers: shared hit=21
Planning:
  Buffers: shared hit=14
Planning Time: 0.144 ms
Execution Time: 108.889 ms

Can you notice anything different from the previous plan? The execution plans appear identical and the cost estimates are the same. The only difference is the Execution Time. The query using ANY_VALUE() consistently outperforms the one using MAX(). This difference arises because ANY_VALUE() does not require comparisons across all values within a group, thus reducing computational overhead.

Conclusion
The ANY_VALUE() aggregate function introduced in PostgreSQL 16 offers an efficient alternative to traditional aggregates like MIN() and MAX() when handling non-aggregated columns in GROUP BY queries.

Addendum
ANY_VALUE() is only suitable when the values within each group are known to be identical. In the above query, we replaced MAX() with ANY_VALUE() because we were certain that one equipment number only had one equipment name.

 

postgresdba.com