설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 167건, 최근 0 건
   

use_remote_estimate

글쓴이 : 모델광 날짜 : 2023-10-14 (토) 11:37 조회 : 626
Recently, I was requested to optimize a query which had been slow on Oracle 12.2. The query in Oracle was joining a table on a remote database, and when it was migrated to PostgreSQL, it was still running slowly. When I examined the execution plan in PostgreSQL, it appeared that the optimizer did not consider table statistics on a remote table in distributed joins. I then spent an hour trying to find a way to improve the query performance and I finally worked out a solution.

In this note, I will show you how we can make a query run faster when it involves joining a table on a remote database.

First, the SQL to create a couple of tables, indexes, a foreign server, a user mapping, and a foreign table:
I have run this demo on PostgreSQL 15.1.


drop table t1;
create unlogged table t1
as
select i as col1,
       i as col2,
       lpad(i::text,10,'0') col3,
       lpad('x',100,'x')  padding
  from generate_series(1,1000000) a(i);

create unlogged table t2 as select * from t1;

alter table t1 add constraint t1_pk primary key(col1);
alter table t2 add constraint t2_pk primary key(col1);

create unique index t2_fbi on t2((col1+1));

analyze t1 ,t2;

create extension postgres_fdw;

create server loopback
foreign data wrapper postgres_fdw
options (host '192.168.189.128', port '5432', dbname 'analdb');

create user mapping
for scott
server loopback
options ( user 'scott', password 'tiger' );

create foreign table t2_loopback (
col1      int,
col2      int,
col3      text,
padding  text
) server loopback
  options (schema_name 'scott', table_name 't2');
;
The code is quite simple, it creates a couple of identical tables with a col1 column that will produce an index with a very good correlation. You will notice that I have created a function-based index on the col1 column of the T2 table. Later, I will explain why PostgreSQL performs better than Oracle when there is a function-based index on a remote table. You will also notice that I have created a foreign server that is (in my case) a loopback to the current database, a user mapping and a foreign table.

So now I'm going to execute a query that joins t1 and t2 - but I have two versions of the query, one which treats t2 as the local table it really is, and one that pretends (through the loopback) that t2 is remote.


set max_parallel_workers_per_gather=0;
--local table join
select   t1.col3, t2.col3
 from    t1,   t2
where   t2.col1+1 = t1.col1
    and     t1.col2 between 101 and 110;
--remote table join
select    t1.col3, t2.col3
from      t1,   t2_loopback t2
where    t2.col1+1 = t1.col1
    and     t1.col2 between 101 and 110;

Here are the two execution plans obtained by running the EXPLAIN command:

Nested Loop  (cost=0.42..33190.32 rows=1 width=22) (actual time=0.065..54.110 rows=10 loops=1)
  Buffers: shared hit=14207 read=4015
  ->  Seq Scan on t1  (cost=0.00..33181.88 rows=1 width=15) (actual time=0.044..54.077 rows=10 loops=1)
        Filter: ((col2 >= 101) AND (col2 <= 110))
        Rows Removed by Filter: 999990
        Buffers: shared hit=14167 read=4015
  ->  Index Scan using t2_fbi on t2  (cost=0.42..8.44 rows=1 width=15) (actual time=0.001..0.001 rows=1 loops=10)
        Index Cond: ((col1 + 1) = t1.col1)
        Buffers: shared hit=40
Planning:
  Buffers: shared hit=16
Planning Time: 0.173 ms
Execution Time: 54.131 ms

Nested Loop  (cost=100.43..10412.28 rows=1 width=43) (actual time=0.509..2511.475 rows=10 loops=1)
  Buffers: shared hit=3979074 read=20925
  ->  Foreign Scan on t2_loopback t2  (cost=100.00..150.95 rows=1365 width=36) (actual time=0.365..1312.136 rows=1000000 loops=1)
  ->  Index Scan using t1_pk on t1  (cost=0.43..7.51 rows=1 width=15) (actual time=0.001..0.001 rows=0 loops=1000000)
        Index Cond: (col1 = (t2.col1 + 1))
        Filter: ((col2 >= 101) AND (col2 <= 110))
        Rows Removed by Filter: 1
        Buffers: shared hit=3979074 read=20925
Planning Time: 0.105 ms
Execution Time: 2511.714 ms

Both plans show that the optimizer has incorrectly estimated the number of rows that would be retrieved from t1. The actual number of rows returned from t1 was 10, but the planner estimated it would be 1. As an aside, we can enhance the optimizer's estimate by increasing the statistics target (default : 100). While the fully local query does a nested loop join using the very efficient function-based index, the distributed second query seems to have no idea about the remote function-based index and selects all the required rows from the remote table, accessing the t1_pk index 1,000,000 times.

So the significant question is: How will we be able to improve the second query?
I have added a hint to tweak the execution plan of the distributed query.


/+ Leading((t1 t2)) NestLoop(t1 t2) IndexScan(t2 t2_fbi) */
select    t1.col3, t2.col3
from      t1,   t2_loopback t2
where    t2.col1+1 = t1.col1
    and     t1.col2 between 101 and 110;

The hint set I supplied was one that said:
  - consider only the join order t1 -> t2
  - use a nested loop to get to t2
  - use the function-based index t2_fbi to access t2

Here is the resulting execution plan:


Nested Loop  (cost=100.00..33353.42 rows=1 width=43) (actual time=0.325..10120.274 rows=10 loops=1)
  Output: t1.col3, t2.col3
  Join Filter: (t1.col1 = (t2.col1 + 1))
  Rows Removed by Join Filter: 9999990
  Buffers: shared hit=14157 read=4025
  ->  Seq Scan on scott.t1  (cost=0.00..33182.00 rows=1 width=15) (actual time=0.028..53.975 rows=10 loops=1)
        Output: t1.col1, t1.col2, t1.col3, t1.padding
        Filter: ((t1.col2 >= 101) AND (t1.col2 <= 110))
        Rows Removed by Filter: 999990
        Buffers: shared hit=14157 read=4025
  ->  Foreign Scan on scott.t2_loopback t2  (cost=100.00..150.95 rows=1365 width=36) (actual time=0.113..957.314 rows=1000000 loops=10)
        Output: t2.col1, t2.col2, t2.col3, t2.padding
        Remote SQL: SELECT col1, col3 FROM scott.t2
Query Identifier: 7459900436296013947
Planning Time: 0.186 ms
Execution Time: 10120.638 ms

There are several details in the execution plan that tell us that the local planner doesn't know anything about the remote table.

From the top town:
  - the local optimizer obeyed the hint Leading((t1 t2)) and NestLoop(t1 t2), but did not follow the hint IndexScan(t2 t2_fbi).
  - the local optimizer estimated that it would get 1365 rows for each nested loop, but it actually obtained 1,000,000 rows for each loop.
  - the local optimizer decomposed the query to produce a remote query, 'SELECT col1, col3 FROM scott.t2'.
  - the local optimizer has no idea of how many block I/Os occurred in the remote database.

Anyhow, my hinting strategy turned out to be forlorn. In fact, I was stuck with this query, and after carefully reading the manual, I came across the use_remote_estimate option.

https://www.postgresql.org/docs/current/postgres-fdw.html

So I recreated the foreign table with the "use_remote_estimate" set to TRUE.

drop foreign table t2_loopback;
create foreign table t2_loopback (
col1 int,
col2 int,
col3 text,
padding text
) server loopback
  options (schema_name 'scott', table_name 't2', use_remote_estimate 'true');
;
When I reran the distributed query, I obtained the following exeuction plan:

select   t1.col3, t2.col3
  from   t1, t2_loopback t2
where   t2.col1+1 = t1.col1
    and   t1.col2 between 101 and 110;

Nested Loop  (cost=100.43..33290.36 rows=1 width=43) (actual time=0.331..59.613 rows=10 loops=1)
  Output: t1.col3, t2.col3
  Buffers: shared hit=13950 read=4232
  ->  Seq Scan on scott.t1  (cost=0.00..33181.88 rows=1 width=15) (actual time=0.066..57.357 rows=10 loops=1)
        Output: t1.col1, t1.col2, t1.col3, t1.padding
        Filter: ((t1.col2 >= 101) AND (t1.col2 <= 110))
        Rows Removed by Filter: 999990
        Buffers: shared hit=13950 read=4232
  ->  Foreign Scan on scott.t2_loopback t2  (cost=100.43..108.47 rows=1 width=15) (actual time=0.171..0.171 rows=1 loops=10)
        Output: t2.col1, t2.col2, t2.col3, t2.padding
        Remote SQL: SELECT col1, col3 FROM scott.t2 WHERE (($1::integer = (col1 + 1)))
Query Identifier: -8918511877818738218
Planning Time: 1.145 ms
Execution Time: 60.156 ms

You will notice that the local planner estimated that it would retrieve 1 row for each nested loop to t2 and it actually got 1 row for each loop. Note that the planning time increased about 10 times, but the elapsed time decreased significantly from 2517 ms to 60 ms. We can not observe the t2_fbi index access in the plan though, we can infer that the optimizer in the remote server must have accessed the index when we look at the remote SQL "SELECT col1, col3 FROM scott.t2 WHERE (($1::integer = (col1 + 1)))".

In passing, I could not improve the query on Oracle 12c because Oracle did not have the mechanism to recognize table statistics on a remote database.


​Conclusion
  - When we use a foreign data wrapper, the optimzer decompses the query to produce a remote SQL statement that will be sent to the remote database. And table statistics about the remote table could result in a significant change in the cardinality calculations which could then make a dramatic difference to the final remote SQL statement.
  - When a query uses a join to a table on a remote database, we can help the optimizer produce an efficient execution plan using the use_remote_estimate option. Hower we should keep in mind that if the query is 'hot' or run many times, it can have a detrimental effect on remote database performance because it is the same as running the EXPLAIN command on a remote database for each run.
  - When we use the foreign data wrapper in PostgreSQL 15.1, we can not forward a hint to a remote database.

Footnote
The PostgreSQL manual says that running ANALYZE on the foreign table is the way to update the local statistics. So I gathered statistics on the foreign table and conducted the experiment, but it did not help make the query run faster.

   

postgresdba.com