Extracting historical data

Author: 모델광   Date: 2023-02-25 (Sat) 20:10   Views: 810
Recently, I received a request to optimize a query for performance. During the tuning process, I discovered a surprising behavior of the PostgreSQL optimizer.
I believe it is worth sharing my findings and providing tips on how to work around this unexpected behavior.
The statement I was asked to tune was as follows:


SELECT CUST_NO, CUST_NM, ORD_NO, B_ARR[1] AS CUST_CHG_DT
        , B_ARR[2] AS CUST_GRDE_CD, C_ARR[1] AS ORD_CHG_DT
        , C_ARR[2] AS ORD_STS_CD
 FROM (SELECT A.CUST_NO, A.CUST_NM, Y.ORD_NO
                  , (SELECT ARRAY[CHANGE_DT::TEXT, CUST_GRDE_CD::TEXT]
                        FROM CUST_HIST B
                       WHERE B.CUST_NO = A.CUST_NO
                       AND B.CHANGE_DT <= '20230220'::DATE
                    ORDER BY B.CHANGE_DT DESC
                        FETCH NEXT 1 ROWS ONLY)           AS B_ARR
                  , (SELECT ARRAY[CHANGE_DT::TEXT, ORD_STS_CD]
                        FROM ORDER_HIST C
                       WHERE Y.ORD_NO = C.ORD_NO
                           AND C.CHANGE_DT <= '20230220'::DATE
                       ORDER BY C.CHANGE_DT DESC
                        FETCH NEXT 1 ROWS ONLY)           AS C_ARR
          FROM CUST A, ORDERS Y
        WHERE A.CUST_NO = Y.CUST_NO
       ) AS FOO;
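(A note on syntax: FETCH NEXT 1 ROWS ONLY is the SQL-standard spelling of LIMIT 1, which is why the execution plans below contain Limit nodes.)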


For a clearer understanding of the query, I have added the ERD below:



If you want to experiment, here is the script I used to generate the data:

CREATE TABLE CUST AS
SELECT i AS CUST_NO
     , 'CUST_NAME_SAMPLE_'||i::text AS CUST_NM
     , MOD(i,4) + 1 AS CUST_GRDE_CD
  FROM generate_series(1, 2000) a(i);

ALTER TABLE CUST ADD CONSTRAINT PK_CUST PRIMARY KEY (CUST_NO);

CREATE TABLE CUST_HIST AS
SELECT A.CUST_NO, B.*
  FROM (SELECT CUST_NO FROM CUST) A
     , (SELECT CURRENT_DATE - i AS CHANGE_DT
             , MOD(i, 4) + 1 AS CUST_GRDE_CD
          FROM generate_series(1,10) a(i)
       ) B;

ALTER TABLE CUST_HIST ADD CONSTRAINT PK_CUST_HIST PRIMARY KEY (CUST_NO, CHANGE_DT);

CREATE TABLE ORDER_HIST AS
SELECT A.*, B.*
  FROM (SELECT i AS ORD_NO
          FROM generate_series(1,40000) a(i)
       ) A
     , (SELECT CURRENT_DATE - i AS CHANGE_DT
             , CHR(65+MOD(i,5)) AS ORD_STS_CD
          FROM generate_series(1,5) b(i)
       ) B;

ALTER TABLE ORDER_HIST ADD CONSTRAINT PK_ORDER_HIST PRIMARY KEY (ORD_NO, CHANGE_DT);

CREATE TABLE ORDERS AS
SELECT i AS ORD_NO
     , MOD(i, 2000) + 1 AS CUST_NO
     , CURRENT_DATE - MOD(i,5) + 1 AS ORD_DT
     , CHR(65+MOD(i,5)) AS ORD_STS_CD
  FROM generate_series(1,40000) a(i);

ALTER TABLE ORDERS ADD CONSTRAINT PK_ORDERS PRIMARY KEY (ORD_NO);
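
After loading the data, it is worth running ANALYZE so that the planner works with fresh statistics (otherwise autovacuum will get there eventually):

ANALYZE CUST;
ANALYZE CUST_HIST;
ANALYZE ORDER_HIST;
ANALYZE ORDERS;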

The query is designed to retrieve the customer's name, the customer's grade, and the status of each order as of a specific day. When I saw the execution plan produced by the EXPLAIN command, I was puzzled.
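For reference, plans in this format (actual times and buffer counts, no cost estimates) come from EXPLAIN options along these lines:

EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT ... ;    -- the statement under test goes here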

Here is the execution plan:


Hash Join (actual time=0.410..294.342 rows=40000 loops=1)
  Hash Cond: (y.cust_no = a.cust_no)
  Buffers: shared hit=560232
  ->  Seq Scan on orders y (actual time=0.004..3.333 rows=40000 loops=1)
        Buffers: shared hit=217
  ->  Hash (actual time=0.374..0.375 rows=2000 loops=1)
        Buckets: 2048  Batches: 1  Memory Usage: 129kB
        Buffers: shared hit=15
        ->  Seq Scan on cust a (actual time=0.003..0.156 rows=2000 loops=1)
              Buffers: shared hit=15
  SubPlan 1
    ->  Limit (actual time=0.001..0.002 rows=1 loops=40000)
          Buffers: shared hit=120000
          ->  Index Scan Backward using pk_cust_hist on cust_hist b
                                                                (actual time=0.001..0.001 rows=1 loops=40000)
                Index Cond: ((cust_no = a.cust_no) AND (change_dt <= '2023-02-20'::date))
                Buffers: shared hit=120000
  SubPlan 2
    ->  Limit (actual time=0.001..0.001 rows=1 loops=40000)
          Buffers: shared hit=120000
          ->  Index Scan Backward using pk_cust_hist on cust_hist b_1
                                                                (actual time=0.001..0.001 rows=1 loops=40000)
                Index Cond: ((cust_no = a.cust_no) AND (change_dt <= '2023-02-20'::date))
                Buffers: shared hit=120000
  SubPlan 3
    ->  Limit (actual time=0.001..0.001 rows=1 loops=40000)
          Buffers: shared hit=160000
          ->  Index Scan Backward using pk_order_hist on order_hist c
                                                                (actual time=0.001..0.001 rows=1 loops=40000)
                Index Cond: ((ord_no = y.ord_no) AND (change_dt <= '2023-02-20'::date))
                Buffers: shared hit=160000
  SubPlan 4
    ->  Limit (actual time=0.001..0.001 rows=1 loops=40000)
          Buffers: shared hit=160000
          ->  Index Scan Backward using pk_order_hist on order_hist c_1
                                                                (actual time=0.001..0.001 rows=1 loops=40000)
                Index Cond: ((ord_no = y.ord_no) AND (change_dt <= '2023-02-20'::date))
                Buffers: shared hit=160000
Planning:
  Buffers: shared hit=6
Planning Time: 0.218 ms
Execution Time: 297.194 ms


The following is a quick description of the plan.
1) Internally, the executor first checks whether the ORDERS table contains any rows at all.
   If it does not find a single row, it does nothing further.
2) It scans the CUST table and hashes the results into memory.
3) It scans the ORDERS table and probes the hash table built from CUST to locate the matching rows based on the CUST_NO column.
4) With the intermediate result set, it executes the four subplans sequentially (not in parallel).
5) Each subplan performs an index scan on the CUST_HIST or ORDER_HIST table with the condition that the CHANGE_DT column is less than or equal to '2023-02-20'::date. Each subplan is executed 40,000 times, which is equal to the number of rows produced by joining the ORDERS and CUST tables on the CUST_NO column. Each subplan uses a backward index scan to find the most recent row that matches the condition.
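
Adding up the buffer counts shows that the four subplans dominate the block I/O:

SubPlans 1 and 2: 2 x 120,000 = 240,000 hits (3 hits x 40,000 executions each)
SubPlans 3 and 4: 2 x 160,000 = 320,000 hits (4 hits x 40,000 executions each)
Join inputs     :     217 + 15 =    232 hits
Total                           560,232 hits (= "Buffers: shared hit=560232")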

What surprised me was that there were only two scalar subqueries in the FOO inline view, but the query planner produced four subplans. Also, we cannot find any trace of the inline view FOO in the plan, which means that the optimizer merged the inline view into the outer query. We can deduce that the optimizer transformed the query as follows:


SELECT A.CUST_NO, CUST_NM, ORD_NO
        , (SELECT ARRAY[CHANGE_DT::TEXT, CUST_GRDE_CD::TEXT]
              FROM CUST_HIST B
             WHERE B.CUST_NO = A.CUST_NO
                 AND B.CHANGE_DT <= '20230220'::DATE
              ORDER BY B.CHANGE_DT DESC
              FETCH NEXT 1 ROWS ONLY)[1]                    AS CUST_CHG_DT
        , (SELECT ARRAY[CHANGE_DT::TEXT, CUST_GRDE_CD::TEXT]
              FROM CUST_HIST B
             WHERE B.CUST_NO = A.CUST_NO
                 AND B.CHANGE_DT <= '20230220'::DATE
             ORDER BY B.CHANGE_DT DESC
             FETCH NEXT 1 ROWS ONLY)[2]                    AS CUST_GRDE_CD
        , (SELECT ARRAY[CHANGE_DT::TEXT, ORD_STS_CD]
              FROM ORDER_HIST C
             WHERE Y.ORD_NO = C.ORD_NO
                AND C.CHANGE_DT <= '20230220'::DATE
             ORDER BY C.CHANGE_DT DESC
             FETCH NEXT 1 ROWS ONLY)[1]                    AS ORD_CHG_DT
        , (SELECT ARRAY[CHANGE_DT::TEXT, ORD_STS_CD]
              FROM ORDER_HIST C
             WHERE Y.ORD_NO = C.ORD_NO
                AND C.CHANGE_DT <= '20230220'::DATE
           ORDER BY C.CHANGE_DT DESC
             FETCH NEXT 1 ROWS ONLY)[2]                    AS ORD_STS_CD
   FROM CUST A, ORDERS Y
  WHERE A.CUST_NO = Y.CUST_NO;

The optimizer has an instinct to merge inline views, which is a good idea in most situations. In this case, however, it was a bad idea. What should we do to prevent the optimizer from merging the inline view FOO? The answer is to use the OFFSET clause. I have added an OFFSET 0 clause to the inline view:

SELECT CUST_NO, CUST_NM, ORD_NO, B_ARR[1] AS CUST_CHG_DT
        , B_ARR[2] AS CUST_GRDE_CD, C_ARR[1] AS ORD_CHG_DT
        , C_ARR[2] AS ORD_STS_CD
 FROM (SELECT A.CUST_NO, A.CUST_NM, Y.ORD_NO
                  , (SELECT ARRAY[CHANGE_DT::TEXT, CUST_GRDE_CD::TEXT]
                        FROM CUST_HIST B
                       WHERE B.CUST_NO = A.CUST_NO
                       AND B.CHANGE_DT <= '20230220'::DATE
                    ORDER BY B.CHANGE_DT DESC
                        FETCH NEXT 1 ROWS ONLY)           AS B_ARR
                  , (SELECT ARRAY[CHANGE_DT::TEXT, ORD_STS_CD]
                        FROM ORDER_HIST C
                       WHERE Y.ORD_NO = C.ORD_NO
                           AND C.CHANGE_DT <= '20230220'::DATE
                       ORDER BY C.CHANGE_DT DESC
                        FETCH NEXT 1 ROWS ONLY)           AS C_ARR
          FROM CUST A, ORDERS Y
        WHERE A.CUST_NO = Y.CUST_NO
        OFFSET 0      --I have added OFFSET 0
       ) AS FOO;
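
The OFFSET 0 clause does not change the result set, but the planner will not merge a subquery that contains an OFFSET clause, so it acts as an optimization fence. Wrapping the inline view in WITH FOO AS MATERIALIZED ( ... ), available since PostgreSQL 12, would have the same fencing effect; the MATERIALIZED keyword appears again for a different purpose in Addendum 1 below.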

This is the execution plan after adding the OFFSET 0 clause.

Subquery Scan on foo (actual time=0.347..172.978 rows=40000 loops=1)
  Buffers: shared hit=280232
  ->  Hash Join (actual time=0.345..161.494 rows=40000 loops=1)
        Hash Cond: (y.cust_no = a.cust_no)
        Buffers: shared hit=280232
        ->  Seq Scan on orders y (actual time=0.004..3.406 rows=40000 loops=1)
              Buffers: shared hit=217
        ->  Hash (actual time=0.318..0.319 rows=2000 loops=1)
              Buckets: 2048  Batches: 1  Memory Usage: 129kB
              Buffers: shared hit=15
              ->  Seq Scan on cust a (actual time=0.003..0.151 rows=2000 loops=1)
                    Buffers: shared hit=15
        SubPlan 1
          ->  Limit (actual time=0.002..0.002 rows=1 loops=40000)
                Buffers: shared hit=120000
                ->  Index Scan Backward using pk_cust_hist on cust_hist b
                                                                        (actual time=0.001..0.001 rows=1 loops=40000)
                      Index Cond: ((cust_no = a.cust_no) AND (change_dt <= '2023-02-20'::date))
                      Buffers: shared hit=120000
        SubPlan 2
          ->  Limit (actual time=0.002..0.002 rows=1 loops=40000)
                Buffers: shared hit=160000
                ->  Index Scan Backward using pk_order_hist on order_hist c
                                                                       (actual time=0.001..0.001 rows=1 loops=40000)
                      Index Cond: ((ord_no = y.ord_no) AND (change_dt <= '2023-02-20'::date))
                      Buffers: shared hit=160000
Planning:
  Buffers: shared hit=6
Planning Time: 0.186 ms
Execution Time: 175.984 ms


I have highlighted the Subquery Scan node at the top of the plan, which indicates that the inline view FOO was not merged. Note also that there are only two subplans, as expected. By eliminating the two redundant subplans we reduced the block I/O from 560,232 to 280,232 buffer hits; the saving of 280,000 hits is exactly the cost of the eliminated subplans (120,000 + 160,000).

Conclusion
When you use an array in a correlated subquery in the SELECT list and that SELECT list sits inside an inline view, watch whether the inline view is merged. When the inline view is merged, the correlated subquery can be executed as many times as the number of array elements you reference in the outer query.


Addendum 1
When I first saw the original query, I thought that the developer had a strong background in Oracle, because Oracle has a mechanism called "scalar subquery caching".
Each time the correlated subquery is called, the session "remembers" the input value (cust_no) and the result (ARRAY[CHANGE_DT::TEXT, CUST_GRDE_CD::TEXT]) of the call in a local cache. If the subquery is called with an input that has previously been used, the session takes the result from this cache rather than actually running the subquery again; as an extra little boost, it does not even check the cache if the cust_no for the current call is the same as the one for the immediately preceding call. This means that some queries can execute much faster than the execution plan would suggest, and that is a good argument for using scalar subqueries if you know your data well enough to be certain that the caching feature will work in your favor.
So if we ran the rewritten query in Oracle and Oracle generated the same execution plan, SubPlan 1 would be executed just 2,000 times (once per distinct CUST_NO value), not 40,000 times.

Unfortunately, PostgreSQL does not have such an architecture for scalar subqueries.
However, PostgreSQL has a similar mechanism called Memoize, which applies when two tables are joined with a nested loop join. The Memoize plan node is chosen only in the particular case where the inner side of the join uses a parameterized index scan. Memoize lets the executor avoid rescanning the inner side for parameter values it has already seen: the result rows for each parameter value are kept in an in-memory hash table (the memoization cache), so when the same value comes up again, the executor simply fetches the cached rows instead of re-executing the inner scan.
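Memoize is available from PostgreSQL 14 onward and is governed by a planner setting, which you can inspect and set like this:

SHOW enable_memoize;        -- 'on' by default in PostgreSQL 14 and later
SET enable_memoize = on;    -- allow the planner to consider Memoize nodes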
In order to use the Memoize feature in PostgreSQL, I transformed the query as follows:


SELECT A.CUST_NO, A.CUST_NM, Y.ORD_NO, B.CHANGE_DT AS CUST_CHG_DT
        , B.CUST_GRDE_CD, C.CHANGE_DT AS ORD_CHG_DT, C.ORD_STS_CD
  FROM CUST A
   JOIN LATERAL
        (SELECT CHANGE_DT, CUST_GRDE_CD
           FROM CUST_HIST B
          WHERE A.CUST_NO = B.CUST_NO
             AND B.CHANGE_DT <= '20230220'::DATE
          ORDER BY B.CHANGE_DT DESC
          FETCH NEXT 1 ROWS ONLY
        ) B
    ON TRUE
  JOIN ORDERS Y
    ON A.CUST_NO = Y.CUST_NO
  JOIN LATERAL
       (SELECT CHANGE_DT, ORD_STS_CD
          FROM ORDER_HIST C
         WHERE Y.ORD_NO = C.ORD_NO
            AND C.CHANGE_DT <= '20230220'::DATE
         ORDER BY C.CHANGE_DT DESC
         FETCH NEXT 1 ROWS ONLY
      ) C
   ON TRUE
;
Here's the resulting execution plan:

Nested Loop (actual time=0.025..100.159 rows=40000 loops=1)
  Buffers: shared hit=172217
  ->  Nested Loop (actual time=0.019..80.327 rows=40000 loops=1)
        Buffers: shared hit=166217
        ->  Nested Loop (actual time=0.013..64.422 rows=40000 loops=1)
              Buffers: shared hit=160217
              ->  Seq Scan on orders y (actual time=0.004..3.857 rows=40000 loops=1)
                    Buffers: shared hit=217
              ->  Limit (actual time=0.001..0.001 rows=1 loops=40000)
                    Buffers: shared hit=160000
                    ->  Index Scan Backward using pk_order_hist on order_hist c
                                                                       (actual time=0.001..0.001 rows=1 loops=40000)
                          Index Cond: ((ord_no = y.ord_no) AND (change_dt <= '2023-02-20'::date))
                          Buffers: shared hit=160000
        ->  Memoize (actual time=0.000..0.000 rows=1 loops=40000)
              Cache Key: y.cust_no
              Cache Mode: logical
              Hits: 38000  Misses: 2000  Evictions: 0  Overflows: 0  Memory Usage: 246kB
              Buffers: shared hit=6000
              ->  Index Scan using pk_cust on cust a (actual time=0.001..0.001 rows=1 loops=2000)
                    Index Cond: (cust_no = y.cust_no)
                    Buffers: shared hit=6000
  ->  Memoize (actual time=0.000..0.000 rows=1 loops=40000)
        Cache Key: a.cust_no
        Cache Mode: binary
        Hits: 38000  Misses: 2000  Evictions: 0  Overflows: 0  Memory Usage: 211kB
        Buffers: shared hit=6000
        ->  Limit (actual time=0.001..0.001 rows=1 loops=2000)
              Buffers: shared hit=6000
              ->  Index Scan Backward using pk_cust_hist on cust_hist b
                                                                          (actual time=0.001..0.001 rows=1 loops=2000)
                    Index Cond: ((cust_no = a.cust_no) AND (change_dt <= '2023-02-20'::date))
                    Buffers: shared hit=6000
Planning:
  Buffers: shared hit=12
Planning Time: 0.339 ms
Execution Time: 101.990 ms


The query uses nested loops to join the four tables. The first nested loop joins ORDERS and ORDER_HIST on the ORD_NO column, and applies a limit to return only the most recent record from ORDER_HIST where CHANGE_DT is before or equal to '2023-02-20'::date.
The second nested loop uses memoization to cache the results of the index scan on the CUST table, keyed by the CUST_NO column. The memoization step reduces the number of times the index scan on the CUST table has to be performed, and therefore improves performance. We can observe that the index scan on the PK_CUST index was performed only 2,000 times, not 40,000 times.
The final nested loop also uses memoization to join the CUST_HIST table on the CUST_NO column, and applies a limit to return only the most recent record from CUST_HIST where CHANGE_DT is before or equal to '2023-02-20'::date.
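
Once again the buffer counts add up neatly:

ORDER_HIST lookups    : 160,000 hits (4 hits x 40,000 executions)
CUST via Memoize      :   6,000 hits (3 hits x 2,000 cache misses)
CUST_HIST via Memoize :   6,000 hits (3 hits x 2,000 cache misses)
Seq Scan on ORDERS    :     217 hits
Total                   172,217 hits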

Overall, the number of block I/Os dropped from 280,232 to 172,217. Consequently, given that PostgreSQL has the Memoize feature, it is safe to say that we should prefer nested loop (lateral) joins to scalar subqueries in terms of performance.

In general, it is recommended to join small tables first, so we need to force the planner to join the CUST and CUST_HIST tables first.
Here is the query I re-engineered to force the join order:


WITH W AS MATERIALIZED (
   SELECT A.CUST_NO, A.CUST_NM, B.CHANGE_DT AS CUST_CHG_DT, B.CUST_GRDE_CD
     FROM CUST A
      JOIN LATERAL
           (SELECT CHANGE_DT, CUST_GRDE_CD
              FROM CUST_HIST B
             WHERE A.CUST_NO = B.CUST_NO
                AND B.CHANGE_DT <= '20230220'::DATE
             ORDER BY B.CHANGE_DT DESC
             FETCH NEXT 1 ROWS ONLY
           ) B
       ON TRUE
)
SELECT W.CUST_NO, W.CUST_NM, Y.ORD_NO, W.CUST_CHG_DT, W.CUST_GRDE_CD
       , C.CHANGE_DT, C.ORD_STS_CD
  FROM W
   JOIN ORDERS Y
     ON W.CUST_NO = Y.CUST_NO
   JOIN LATERAL
        (SELECT CHANGE_DT, ORD_STS_CD
           FROM ORDER_HIST
          WHERE ORD_NO = Y.ORD_NO
             AND CHANGE_DT <= '20230220'::DATE
          ORDER BY CHANGE_DT DESC
           FETCH NEXT 1 ROWS ONLY
        ) C
    ON TRUE;

I have used the MATERIALIZED keyword in the WITH clause to force PostgreSQL to join the CUST and CUST_HIST tables first.
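For reference, since the query uses explicit JOIN syntax, another way to pin the join order (one I have not tested here) is to lower join_collapse_limit for the session, which makes the planner join the tables in the order they are written:

SET join_collapse_limit = 1;   -- planner follows the textual join order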
Here is the execution plan I obtained.

Nested Loop (actual time=7.834..84.531 rows=40000 loops=1)
  Buffers: shared hit=166232
  CTE w
    ->  Nested Loop (actual time=0.018..4.126 rows=2000 loops=1)
          Buffers: shared hit=6015
          ->  Seq Scan on cust a (actual time=0.006..0.273 rows=2000 loops=1)
                Buffers: shared hit=15
          ->  Limit (actual time=0.002..0.002 rows=1 loops=2000)
                Buffers: shared hit=6000
                ->  Index Scan Backward using pk_cust_hist on cust_hist b (actual time=0.001..0.001 rows=1 loops=2000)
                      Index Cond: ((cust_no = a.cust_no) AND (change_dt <= '2023-02-20'::date))
                      Buffers: shared hit=6000
  ->  Hash Join (actual time=7.816..17.753 rows=40000 loops=1)
        Hash Cond: (w.cust_no = y.cust_no)
        Buffers: shared hit=6232
        ->  CTE Scan on w (actual time=0.019..4.771 rows=2000 loops=1)
              Buffers: shared hit=6015
        ->  Hash (actual time=7.749..7.750 rows=40000 loops=1)
              Buckets: 65536  Batches: 1  Memory Usage: 2075kB
              Buffers: shared hit=217
              ->  Seq Scan on orders y (actual time=0.005..3.465 rows=40000 loops=1)
                    Buffers: shared hit=217
  ->  Limit (actual time=0.001..0.001 rows=1 loops=40000)
        Buffers: shared hit=160000
        ->  Index Scan Backward using pk_order_hist on order_hist c (actual time=0.001..0.001 rows=1 loops=40000)
              Index Cond: ((ord_no = y.ord_no) AND (change_dt <= '2023-02-20'::date))
              Buffers: shared hit=160000
Planning Time: 0.164 ms
Execution Time: 86.789 ms


Here is a detailed interpretation of the execution plan:
1) The query uses a CTE named "w" that contains a nested loop join between two tables. The CTE returns 2000 rows.
2) In the CTE, the nested loop joins CUST and CUST_HIST based on the CUST_NO column, and applies a limit to return only the most recent row from CUST_HIST where CHANGE_DT is before or equal to '2023-02-20'::date.
3) The outer query uses a hash join to combine the CTE w and the ORDERS table. It builds a hash table with the ORDERS table and then scans the CTE w to locate the matching rows based on the CUST_NO column.
4) Finally, the outer query uses another index scan, this time on the ORDER_HIST table using the PK_ORDER_HIST index, to look up the corresponding row in the ORDER_HIST table for each row in the join result. The index scan returns at most one row per iteration, as limited by the "LIMIT" node.

The plan shows that the majority of the time (84 - 17 = 67 ms out of a total of 86.7 ms) is spent in the final nested loop that joins the ORDER_HIST table. This suggests that repetitive access to the PK_ORDER_HIST index can significantly harm performance.

The sample tables here are small, so a nested loop join is acceptable. But what should we do if the tables are huge? In that case, a hash join is recommended.

Here is the rewritten query that uses a hash join:


SELECT A.CUST_NO, A.CUST_NM, Y.ORD_NO, B.CHANGE_DT AS CUST_CHG_DT
        , B.CUST_GRDE_CD, C.CHANGE_DT AS ORD_CHG_DT, C.ORD_STS_CD
  FROM CUST A
       , ORDERS Y
       , (SELECT B.*, ROW_NUMBER() OVER (PARTITION BY CUST_NO
                                             ORDER BY CHANGE_DT DESC) AS RN
            FROM CUST_HIST B
           WHERE CHANGE_DT <= '20230220'::DATE
         ) B
       , (SELECT C.*, ROW_NUMBER() OVER (PARTITION BY ORD_NO
                                             ORDER BY CHANGE_DT DESC) AS RN
            FROM ORDER_HIST C
           WHERE CHANGE_DT <= '20230220'::DATE
         ) C
 WHERE A.CUST_NO = Y.CUST_NO
    AND A.CUST_NO = B.CUST_NO
    AND Y.ORD_NO = C.ORD_NO
    AND B.RN = 1
    AND C.RN = 1
;
Here is the execution plan:

Hash Join (actual time=24.898..121.060 rows=40000 loops=1)
  Hash Cond: (c.ord_no = y.ord_no)
  Buffers: shared hit=120890
  ->  Subquery Scan on c (actual time=0.033..85.588 rows=40000 loops=1)
        Filter: (c.rn = 1)
        Buffers: shared hit=120549
        ->  WindowAgg (actual time=0.032..82.518 rows=40000 loops=1)
              Run Condition: (row_number() OVER (?) <= 1)
              Buffers: shared hit=120549
              ->  Incremental Sort (actual time=0.027..46.943 rows=120000 loops=1)
                    Sort Key: c_1.ord_no, c_1.change_dt DESC
                    Presorted Key: c_1.ord_no
                    Full-sort Groups: 3637  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
                    Buffers: shared hit=120549
                    ->  Index Scan using pk_order_hist on order_hist c_1 (actual time=0.009..30.053 rows=120000 loops=1)
                          Index Cond: (change_dt <= '2023-02-20'::date)
                          Buffers: shared hit=120549
  ->  Hash (actual time=24.859..24.862 rows=40000 loops=1)
        Buckets: 65536 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 3317kB
        Buffers: shared hit=341
        ->  Hash Join (actual time=8.901..19.664 rows=40000 loops=1)
              Hash Cond: (y.cust_no = a.cust_no)
              Buffers: shared hit=341
              ->  Seq Scan on orders y (actual time=0.003..2.684 rows=40000 loops=1)
                    Buffers: shared hit=217
              ->  Hash (actual time=8.894..8.896 rows=2000 loops=1)
                    Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 157kB
                    Buffers: shared hit=124
                    ->  Hash Join (actual time=5.169..8.662 rows=2000 loops=1)
                          Hash Cond: (b.cust_no = a.cust_no)
                          Buffers: shared hit=124
                          ->  Subquery Scan on b (actual time=4.848..7.948 rows=2000 loops=1)
                                Filter: (b.rn = 1)
                                Buffers: shared hit=109
                                ->  WindowAgg (actual time=4.847..7.792 rows=2000 loops=1)
                                      Run Condition: (row_number() OVER (?) <= 1)
                                      Buffers: shared hit=109
                                      ->  Sort (actual time=4.818..5.520 rows=16000 loops=1)
                                            Sort Key: b_1.cust_no, b_1.change_dt DESC
                                            Sort Method: quicksort  Memory: 1260kB
                                            Buffers: shared hit=109
                                            ->  Seq Scan on cust_hist b_1 (actual time=0.174..2.033 rows=16000 loops=1)
                                                  Filter: (change_dt <= '2023-02-20'::date)
                                                  Rows Removed by Filter: 4000
                                                  Buffers: shared hit=109
                          ->  Hash (actual time=0.318..0.318 rows=2000 loops=1)
                                Buckets: 2048  Batches: 1  Memory Usage: 129kB
                                Buffers: shared hit=15
                                ->  Seq Scan on cust a (actual time=0.002..0.150 rows=2000 loops=1)
                                      Buffers: shared hit=15
Planning:
  Buffers: shared hit=36
Planning Time: 0.354 ms
Execution Time: 122.479 ms


There are two things that stand out in this execution plan.
1) The query uses hash joins to join the four tables.
2) The majority of the time (85 ms out of a total of 121 ms) is spent scanning the ORDER_HIST table and retrieving the most recent row for each order number.

Further investigation is required to determine the appropriate join method for this business requirement.
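
As a side note on this "latest row per key" pattern: PostgreSQL also offers DISTINCT ON, which expresses it more compactly than ROW_NUMBER(). I have not measured whether the planner does any better with it on this data, but the ORDER_HIST derived table could be sketched as follows:

SELECT DISTINCT ON (ORD_NO) ORD_NO, CHANGE_DT, ORD_STS_CD
  FROM ORDER_HIST
 WHERE CHANGE_DT <= '20230220'::DATE
 ORDER BY ORD_NO, CHANGE_DT DESC;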

Addendum 2
There are two approaches to storing historical data. One is "time period modeling" and the other is "audit modeling".

Using a start date and end date is referred to as time period modeling. This approach is useful when you need to track changes to a record over time and when the duration of each change is important. For example, if you are tracking the employment history of a person, you may want to know the exact start and end dates of each job they have held. This approach can be more intuitive and easier to understand for certain types of data.

Audit modeling, on the other hand, uses a change date to store historical data. This approach is useful when you need to track changes to a record over time but do not need to know the exact duration of each change. For example, if you are tracking changes to a customer's address over time, you may only need to know the date on which each change was made. This approach can be more efficient and easier to implement for certain types of data.
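
To make the contrast concrete, here is a minimal sketch of the two shapes. The ORDER_STS_PERIOD table below is hypothetical and is not part of the sample schema:

-- Time period modeling: every row carries a validity range.
CREATE TABLE ORDER_STS_PERIOD (
    ORD_NO     INT  NOT NULL,
    START_DT   DATE NOT NULL,
    END_DT     DATE NOT NULL,        -- e.g. '9999-12-31' for the current row
    ORD_STS_CD CHAR(1),
    PRIMARY KEY (ORD_NO, START_DT)
);

-- The "as of" lookup is a plain range predicate:
SELECT ORD_STS_CD
  FROM ORDER_STS_PERIOD
 WHERE ORD_NO = 1
    AND '20230220'::DATE BETWEEN START_DT AND END_DT;

-- Audit modeling (the shape of the sample schema) stores only the change date,
-- so the "as of" lookup needs ORDER BY ... LIMIT:
SELECT ORD_STS_CD
  FROM ORDER_HIST
 WHERE ORD_NO = 1
    AND CHANGE_DT <= '20230220'::DATE
 ORDER BY CHANGE_DT DESC
 FETCH NEXT 1 ROWS ONLY;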

Although I had used "time period modeling" frequently in the past, I became cautious about modeling entities that store historical data after reading the following blog post a few years ago.

Science of Database :: 변경이력 테이블에 종료일자가 필요한가? (tistory.com) (in Korean; roughly, "Does a change history table need an end date?")

In the sample data model above, we are interested in the date on which the status of an order was changed, not the duration of the order status. Therefore, audit modeling is more appropriate than time period modeling.

However, in the last SQL statement we had to use the ROW_NUMBER() function to retrieve the most recent record, which incurred a performance penalty. Certainly, if we adopt time period modeling and the history table is tremendous in size, we will be able to retrieve data faster than with audit modeling. But we have to check whether there will be a requirement to extract the historical data of a specific day. For small tables, such as the ones in this example, audit modeling works well in terms of query performance.

   
