PostgreSQL 은 특별합니다.
단순한 데이터베이스가 아니라, 하나의 어플리케이션 플랫폼이기 때문입니다.

공지사항

뉴스 및 기사

	총 게시물 184건, 최근 1 건

Predicate Evaluation Order and Anti-State Forces

글쓴이 : 모델광 날짜 : 2023-11-18 (토) 21:51 조회 : 4019

The following article insipred me to write this note:
Yoon levels 'anti-state' jibe at Moon (koreaherald.com)
After reading the article, it occurred to me that the PostgreSQL Global Development Group core team might be anti-state forces because they put priority on the nothern(upper) part over the southern (lower) part when evaluating predicates in the WHERE clause with an equality operator.

In Oracle, with the Cost-Based Optimzer enabled (enabled by default since version 10g), the database engine evaluates filtering predicates in the order it feels is most efficient. If you are a dinosaur like me, you might remember way back in the day with the rule-based optimizer that predicates were evaluated from the bottom upwards (this is a proof that old Oracle employees were not anti-state foreces), but no such rule exists in the Cost-Based Optimizer. The Oracle optimizer would do some arithmetic to calculate the most cost-effective order of applying the filter predicates based on things like: histograms, the number of CPU cycles it takes to walk along a row to find a particular column, the number of rows that would pass the first filter predicate test and then require the second filter predicate test to take place, and then some.
However, in PostgreSQL, the optimizer applies filter predicates following a rule-based order. Here is some code to demonstrate this concept:

This test was conducted on PostgreSQL 16.0.

CREATE TABLE T1
AS
SELECT MOD(i,20)::TEXT v1
, i as n1
, mod(i, 20) n2
FROM generate_series(1,30000) a(i);

Note that I have created a table, where the column N1 is the most selective.

SELECT *
FROM T1
WHERE V1 = '1'
AND N2 = 1

AND N1 = 21;

Which predicate do you think PostgreSQL should apply first? Considering that N1 is the most selective, it should be evaluated first. Will PostgreSQL be smart enough to prioritize it? Let's examine the execution plan to see how the planner processes the predicates.

Seq Scan on t1 (cost=0.00..688.00 rows=1 width=10) (actual time=0.010..1.670 rows=1 loops=1)
Filter: ((v1 = '1'::text) AND (n2 = 1) AND (n1 = 21))
Rows Removed by Filter: 29999
Buffers: shared hit=163
Planning Time: 0.053 ms
Execution Time: 1.681 ms

The optimizer evaluated the filter predicates in the following sequence from a table of 30,000 rows
1) applied V1='1' to find 1,500 rows
2) from those 1,500 rows, it applied the filter N2=1 to return 1,500 rows
3) then finally from the subset of 1,500 rows, it applied N1=21 to obtain just 1 row

In PostgreSQL, we can help the optimizer to assess the most selective filter predicate early by rewriting the query as follows:

SELECT *
FROM T1
WHERE N1 = 21
AND N2 = 1
AND V1 = '1';

Seq Scan on t1 (cost=0.00..688.00 rows=1 width=10) (actual time=0.011..1.366 rows=1 loops=1)
Filter: ((n1 = 21) AND (n2 = 1) AND (v1 = '1'::text))
Rows Removed by Filter: 29999
Buffers: shared hit=163
Planning Time: 0.053 ms
Execution Time: 1.375 ms

The execution plan indicates that the optimizer:
1) applis N1 = 21 and just gets one row
2) then evaluates N2 = 1 against the row and the row survives
3) then finally assesses V1 = '1' to return that one row

Note that the costs of the two quries are identical; however, the second query consistently runs faster.
In this scenario, the improvement amounts to 18.2% in terms of elapsed time.

In PostgreSQL, the optimizer evaluates filter predicates from the top down when we use the eqality operator.

However, if you do the same experiment on Oracle 19c, the optimizer applies the most selective filter predicate first, regardless of the position of the predicate in the query.
In Oracle it is almost impossible to manually change optimizer's evaluation order because this is done by Oracle engine automatically. The only option I am aware of is the hint, but the behaviour of the hint is hard to predict - there are some predicate transformations that can introduce or eliminate some predicates, such as view merging, unnesting, JPPD and then some.

On the other hand in PostgreSQL, there is a method for taking control of predicate evaluation order when there is an equality operator in the filter predicate.

Conclusion

In Oracle, it does not matter how we write the quries with required predicates in any order becasue the database engine finds its best way to re-arrange them.
However, keep in mind that the PostgreSQL engine was developed by anti-state forces, who favor the upper part over the lower part when evaluating predicates in the WHERE clause with the equality operator.
In PostgreSQL, the filtering predicate order impacts data retrieval speed and query response time because the query's where clause is evaluated in the order it is written when an equality operator is used in the filter predicate.

Footnote
After reading this note, you might begin scrutinzing all the predicates in your application queries with the intention of placing the most selective predicate near the top of the WHERE clause. To be honest I attempted to do that. However, it turned out that I just burnt a lot of man-hours really just doing nothing as there are other factors to consider, such as access path, correlation between colums, histograms and more.
Be pragmatic. It is worthwhile to reorder the predicates when the query is executed very frequently, such as 10 times per second.

Addendum 1
If you use functions in the filter predicate, the optimizer raises its estimated cost and evaluates the predicate last. However, it is odd that the COALESCE function does not raise the estimated cost and has no effect on the order in which the predicate is applied. Here is the test query followd by its execution plan:

SELECT *

FROM T1
WHERE UPPER(V1) = '1' AND N2 = 1 AND N1 = 21;

Seq Scan on t1 (cost=0.00..763.00 rows=1 width=10) (actual time=0.013..1.338 rows=1 loops=1)
Filter: ((n2 = 1) AND (n1 = 21) AND (upper(v1) = '1'::text))
Rows Removed by Filter: 29999
Buffers: shared hit=163
Planning Time: 0.072 ms

Execution Time: 1.349 ms

Note that the estimated cost increased from 688 to 763, and the UPPER(V1) = '1' predicate was evaluated last, resuling in descreased execution time.

Here is the query using the COALESCE function followed by its execution plan:

SELECT *
FROM T1
WHERE 1=1
   AND coalesce(V1,'0') = '1'
-- AND CASE WHEN V1 IS NOT NULL THEN V1 ELSE '0' END = '1'
   --AND CASE WHEN V1 IS NULL THEN '0' ELSE 'V1' END = '1'
   AND N2 = 1

AND N1 = 21;

Seq Scan on t1 (cost=0.00..688.00 rows=1 width=10) (actual time=0.012..1.829 rows=1 loops=1)
Filter: ((COALESCE(v1, '0'::text) = '1'::text) AND (n2 = 1) AND (n1 = 21))
Rows Removed by Filter: 29999
Buffers: shared hit=163
Planning Time: 0.059 ms

Execution Time: 1.838 ms

Note that when we use the COALESCE function, the estimated cost remains the same, and the predicate with COALESCE is evaluated first, even though the elapsed time rose significantly. If you substitute CASE for COALESCE, the PostgreSQL optimizer behaves the same way. In cases like this, if the filter predicate with COALESCE is not selective, it's better to place the predicate in the latter part of the WHERE clause.

Addendum 2
I previously stated that it is almost impossible to manually change the optimizer's evaluation order because Oracle engine handles this automatically. But, a recent video by Connor Mcdonald provided me with some workarounds to change the optimizer's evaluation order in Oracle.
In the scenario I presented, let's suppose that we want to forece the optimizer to evaulate the predicate V1='1' first. Here are several ways to do that.

Here is an Oracle version of the test code on Oracle 19.1:

create table t1
as
select to_char(mod(rownum, 20)) v1
       , rownum                   n1
       , mod(rownum, 20)          n2
   from xmltable('1 to 30000');

SELECT *
FROM T1
WHERE V1 = '1' AND N2 = 1 AND N1 = 21;

---------------------------------------------------------------------------------------------
| Id | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |      1 |        |      1 |00:00:00.01 |      66 |     63 |
|* 1 | TABLE ACCESS FULL| T1   |      1 |      1 |      1 |00:00:00.01 |      66 |     63 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
1 - filter(("N1"=21 AND "V1"='1' AND "N2"=1))

The Oracle optimizer determines the order in which it applies predicates, and there is no guarantee that the desired order will be followed. Even though we put the predicate V1 = '1' first, the Oracle engine evaluates N1= 21 early due to its selectivity.
We can use the undocumented hint /+ ORDERED_PREDICATES */ to influence
the order of predicate evaluation.

SELECT /+ ORDERED_PREDICATES */ *
FROM T1
WHERE V1 = '1' AND N2 = 1 AND N1 = 21;

------------------------------------------------------------------------------------
| Id | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |      1 |        |      1 |00:00:00.01 |      66 |
|* 1 | TABLE ACCESS FULL| T1   |      1 |      1 |      1 |00:00:00.01 |      66 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
1 - filter(("V1"='1' AND "N2"=1 AND "N1"=21))

In this simple query, the optimizer followed the hint, so the V1='1' predicate was tested first. However, you should avoid using this hint because it may not be obeyed in many cases.
Another approach involves using an inline-view.

SELECT *
   FROM (SELECT * FROM T1 WHERE V1 = '1' )
WHERE N2 = 1 AND N1 = 21;
---------------------------------------------------------------------------------------------
| Id | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |      1 |        |      1 |00:00:00.01 |      66 |     63 |
|* 1 | TABLE ACCESS FULL| T1   |      1 |      1 |      1 |00:00:00.01 |      66 |     63 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("N1"=21 AND "V1"='1' AND "N2"=1))

Note that the inline-view was merged, resuling in the same plan as the original query. To prevent the merging of the inline-view, an /+ NO_MERGE */ hint can be added.

SELECT *
   FROM (SELECT /+ NO_MERGE */ * FROM T1 WHERE V1 = '1' )
WHERE N2 = 1 AND N1 = 21;
-------------------------------------------------------------------------------------
| Id | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      66 |
|   1 | VIEW              |      |      1 |      1 |      1 |00:00:00.01 |      66 |
|* 2 |   TABLE ACCESS FULL| T1   |      1 |      1 |      1 |00:00:00.01 |      66 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("T1"."N1"=21 AND "V1"='1' AND "T1"."N2"=1))

We can observe VIEW operation in the plan, indicating that the optimizer followed the hint. However, the predicates N2=1 and N1=21 were pushed into the inline view. So the N1=21 predicate was tested first due to its selectivity. This Oracle behavior irritated me because I expected the NO_MERGE hint to make the inline view an independent query block, executing the view separately from the rest of the query.
In order to block predicate pushdown, I have set the hidden parameter _optimizer_filter_pushdown to false.

SELECT *
FROM (SELECT /+ NO_MERGE OPT_PARAM('_OPTIMIZER_FILTER_PUSHDOWN', 'FALSE') */
                *
        FROM T1
          WHERE V1 = '1' )
WHERE N2 = 1
AND N1 = 21;

| Id | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      66 |
|* 1 | VIEW              |      |      1 |   1500 |      1 |00:00:00.01 |      66 |
|* 2 |   TABLE ACCESS FULL| T1   |      1 |   1500 |   1500 |00:00:00.01 |      66 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
1 - filter(("N2"=1 AND "N1"=21))
2 - filter("V1"='1')

We can observe that the V1=1 predicate was evaluated first(operation ID=2) and then the N2=1 and N1=21 predicates were tested because they were not pushed into the inline-view.
If you are reluctant to using a hint, you can employ the rownum pseudocolumn. The optimizer can not push the predicate into the inline view because that would change what the rownum result would be. Because of the rownum presence, Oracle must run the inline view first.

SELECT V1, N1, N2
   FROM (SELECT V1, N1, N2, ROWNUM FROM T1 WHERE V1 = '1' )
WHERE N2 = 1
   AND N1 = 21;

| Id | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      67 |
|* 1 | VIEW               |      |      1 |   1500 |      1 |00:00:00.01 |      67 |
|   2 |   COUNT             |      |      1 |        |   1500 |00:00:00.01 |      67 |
|* 3 |    TABLE ACCESS FULL| T1   |      1 |   1500 |   1500 |00:00:00.01 |      67 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
1 - filter(("N1"=21 AND "N2"=1))
3 - filter("V1"='1')

Note that V1='1' was tested first.

Alternatively, we can use the CASE/END expression, which was recommended
by Connor Mcdonald to over-ride the optimizer's choice of which predicates

should be evaluated first.

SELECT *
FROM T1
WHERE CASE WHEN V1 = '1' AND N2 = 1 AND N1 = 21 THEN 1 END = 1;

------------------------------------------------------------------------------------
| Id | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |      1 |        |      1 |00:00:00.01 |      66 |
|* 1 | TABLE ACCESS FULL| T1   |      1 |    300 |      1 |00:00:00.01 |      66 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
1 - filter(CASE WHEN ("V1"='1' AND "N2"=1 AND "N1"=21) THEN 1 END =1)

Note that Oracle tested the predicates one by one in the order they were written.

Finally, there is another workaround. I have placed the least selective predicate into a WITH subquery and materialized the subquery with the highly memorable MATERIALIZE hint, expecting the V1='1' predicate to be applied to an internal global temporary table that holds only the least selective predicate.

WITH W AS (
SELECT /+ MATERIALIZE */ *
   FROM T1
WHERE V1 = '1' )
SELECT *
   FROM W
WHERE N2 = 1
   AND N1 = 21;

--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                           |      1 |        |      1 |00:00:00.01 |      67 |
|   1 | TEMP TABLE TRANSFORMATION               |                           |      1 |        |      1 |00:00:00.01 |      67 |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6915_DF6372 |      1 |        |      0 |00:00:00.01 |      66 |
|* 3 |    TABLE ACCESS FULL                     | T1                        |      1 |   1500 |   1500 |00:00:00.01 |      65 |
|* 4 |   VIEW                                   |                           |      1 |   1500 |      1 |00:00:00.01 |       0 |
|   5 |    TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6915_DF6372 |      1 |   1500 |   1500 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("V1"='1')
4 - filter(("N2"=1 AND "N1"=21))

In Oracle version 19.1, the N2=1 and N1=21 predicates in the main query block were not pushed into the CTE, allowing the V1 = '1' predicate to be assessed first. However, in version 19.21 and above, an enhancement to the optimzer allows Oracle to minimize the size of materialized CTEs by pushing filter predicates from the main query blocks into the CTE clause. Consequently, we cannot use materialized CTEs to block predicate pushdown in recent versions.

E-MAIL: root@postgresdba.com