analdb=# SELECT SCHEMANAME, COUNT(*) FROM T1 GROUP BY ROLLUP (SCHEMANAME) ORDER BY SCHEMANAME;
     schemaname     | count
--------------------+-------
 information_schema |    21
 pg_catalog         |   385
                    |   406
analdb=# SELECT SCHEMANAME, COUNT(*) FROM T2 GROUP BY ROLLUP (SCHEMANAME) ORDER BY SCHEMANAME;
     schemaname     | count
--------------------+-------
 information_schema |  2100
 pg_catalog         | 38500
                    | 40600
Here is the query that I will use to demonstrate the performance impact of functions, followed by its execution plan:
SET MAX_PARALLEL_WORKERS_PER_GATHER TO 0;
SELECT COUNT(*)
FROM T1 JOIN T2
ON T1.SCHEMANAME = T2.SCHEMANAME
AND T1.TABLENAME > T2.TABLENAME
;
Aggregate (actual time=1266.545..1266.547 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=1395
  ->  Hash Join (actual time=0.517..997.889 rows=7086000 loops=1)
        Hash Cond: (t2.schemaname = t1.schemaname)
        Join Filter: (t1.tablename > t2.tablename)
        Rows Removed by Join Filter: 7780600
        Buffers: shared hit=1395
        ->  Seq Scan on public.t2 (actual time=0.005..3.649 rows=40600 loops=1)
              Output: t2.schemaname, t2.tablename, t2.attname, t2.most_common_freqs, t2.i
              Buffers: shared hit=1381
        ->  Hash (actual time=0.092..0.093 rows=406 loops=1)
              Output: t1.schemaname, t1.tablename
              Buckets: 1024  Batches: 1  Memory Usage: 72kB
              Buffers: shared hit=14
              ->  Seq Scan on public.t1 (actual time=0.005..0.047 rows=406 loops=1)
                    Output: t1.schemaname, t1.tablename
                    Buffers: shared hit=14
Planning Time: 0.062 ms
Execution Time: 1266.568 ms
Here is how to read the above execution plan:
1) The Aggregate node calls the Hash Join node.
2) The Hash Join node calls the Hash node.
3) The Hash node calls the Seq Scan on public.t1 node.
4) The Seq Scan on public.t1 node fetches the values of t1.schemaname and t1.tablename and sends them to the Hash node.
5) The Hash node arranges the values of t1.schemaname and t1.tablename into a hash table, which stores data as hash key and value pairs. A hash function is applied to the join attribute (t1.schemaname) to distribute the hash keys evenly across 1024 buckets. In this case the allocated memory was sufficient: the hash table took up about 72 kB, which allowed the join to execute in a single pass (Batches: 1).
6) After the hash table is built, the Hash Join node calls the Seq Scan on public.t2.
7) The Seq Scan node retrieves the values of t2.schemaname, t2.tablename, t2.attname, t2.most_common_freqs, and t2.i and sends them to the Hash Join node.
8) For each row fetched from the Seq Scan, a hash value is calculated for the join attribute (t2.schemaname) and the hash table is probed for matching rows. The Hash Join node then filters out 7780600 rows by applying the condition t1.tablename > t2.tablename.
9) The Aggregate node receives the resulting set from the Hash Join node and aggregates it.
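The build-and-probe flow described above can be sketched in Python. This is a deliberately simplified model (a plain dictionary standing in for PostgreSQL's bucketed hash table, and toy rows rather than the real catalog data), but it shows how the equality condition drives the hash lookup while the inequality is applied as a per-pair filter afterwards:

```python
from collections import defaultdict

def hash_join_count(t1_rows, t2_rows):
    # Build phase: hash the smaller input (t1) on the join key.
    table = defaultdict(list)
    for schemaname, tablename in t1_rows:
        table[schemaname].append(tablename)  # the key's hash picks the bucket

    # Probe phase: for each t2 row, look up t1 rows with an equal key,
    # then apply the extra join filter to every candidate pair.
    matched = 0
    for schemaname, tablename in t2_rows:
        for t1_tablename in table.get(schemaname, []):
            if t1_tablename > tablename:  # Join Filter: t1.tablename > t2.tablename
                matched += 1
    return matched

# Hypothetical sample rows, just to exercise the model.
t1 = [("pg_catalog", "pg_class"), ("pg_catalog", "pg_attribute")]
t2 = [("pg_catalog", "pg_am"), ("information_schema", "tables")]
print(hash_join_count(t1, t2))  # 2: both t1 names sort after "pg_am"
```

Note that only rows sharing a schemaname ever reach the filter; the second t2 row finds no bucket and is discarded without any comparison.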
Comparing the actual times, the time recorded at the Hash Join operation was around 997 milliseconds, of which only a tiny amount (3.649 ms) was spent in the feeding Seq Scan on t2.
Now let's suppose that, for some reason, you have to use a function in the filter predicate.
SELECT COUNT(*)
FROM T1 JOIN T2
ON T1.SCHEMANAME = T2.SCHEMANAME
AND LOWER(T1.TABLENAME) > LOWER(T2.TABLENAME);
Aggregate (actual time=3437.051..3437.053 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=1395
  ->  Hash Join (actual time=1.433..3161.741 rows=7086000 loops=1)
        Hash Cond: (t2.schemaname = t1.schemaname)
        Join Filter: (lower((t1.tablename)::text) > lower((t2.tablename)::text))
        Rows Removed by Join Filter: 7780600
        Buffers: shared hit=1395
        ->  Seq Scan on public.t2 (actual time=0.005..4.850 rows=40600 loops=1)
              Output: t2.schemaname, t2.tablename, t2.attname, t2.most_common_freqs, t2.i
              Buffers: shared hit=1381
        ->  Hash (actual time=0.094..0.095 rows=406 loops=1)
              Output: t1.schemaname, t1.tablename
              Buckets: 1024  Batches: 1  Memory Usage: 72kB
              Buffers: shared hit=14
              ->  Seq Scan on public.t1 (actual time=0.004..0.047 rows=406 loops=1)
                    Output: t1.schemaname, t1.tablename
                    Buffers: shared hit=14
Planning Time: 0.063 ms
Execution Time: 3437.076 ms
There are three things to note in this plan.
The first is that the built-in function LOWER() consumes a significant amount of CPU time. Even though the number of block I/Os is identical to that of the previous query, the elapsed time rose from 1266 to 3437 ms.
The second is that the hash join itself operates only on the equality condition. You can confirm this from Hash Cond: (t2.schemaname = t1.schemaname) in the plan; the clause lower((t1.tablename)::text) > lower((t2.tablename)::text) appears in the Join Filter predicate.
The third point to note is that the columns SCHEMANAME and TABLENAME of the table t1 are passed up to the Hash Join, which tells us that the hash join operation has to run the LOWER() function. The big question is: how many times does the database engine execute the LOWER() function?
My first guess was that, since the table t1 has 406 rows and the table t2 has 40600 rows, the LOWER() function would need to be run at most 41006 (406 + 40600) times. In the following sections I will show you how to measure the number of function calls in PostgreSQL.
How to count the number of function calls
If we assume that the lower() built-in function behaves in the same way as a PL/pgSQL function, we can at least count the number of calls that take place by writing a PL/pgSQL function like this:
CREATE OR REPLACE FUNCTION F_LOWER(P1 VARCHAR)
RETURNS VARCHAR
LANGUAGE PLPGSQL IMMUTABLE
AS $$
DECLARE
V1 VARCHAR;
BEGIN
SELECT LOWER(P1) INTO V1;
RETURN V1;
END;
$$;

SET track_functions TO 'all';
You replace the lower() function with the f_lower() function and run the query.
SELECT COUNT(*)
FROM T1 JOIN T2
ON T1.SCHEMANAME = T2.SCHEMANAME
AND F_LOWER(T1.TABLENAME::VARCHAR) > F_LOWER(T2.TABLENAME::VARCHAR);
Here is the execution plan when I used the F_LOWER() function:
Aggregate (actual time=45788.190..45788.191 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=1409
  ->  Hash Join (actual time=21.168..45228.172 rows=7086000 loops=1)
        Hash Cond: (t2.schemaname = t1.schemaname)
        Join Filter: ((f_lower((t1.tablename)::character varying))::text > (f_lower((t2.tablename)::character varying))::text)
        Rows Removed by Join Filter: 7780600
        Buffers: shared hit=1409
        ->  Seq Scan on public.t2 (actual time=0.005..17.253 rows=40600 loops=1)
              Output: t2.schemaname, t2.tablename, t2.attname, t2.most_common_freqs, t2.i
              Buffers: shared hit=1381
        ->  Hash (actual time=0.110..0.110 rows=406 loops=1)
              Output: t1.schemaname, t1.tablename
              Buckets: 1024  Batches: 1  Memory Usage: 72kB
              Buffers: shared hit=14
              ->  Seq Scan on public.t1 (actual time=0.007..0.063 rows=406 loops=1)
                    Output: t1.schemaname, t1.tablename
                    Buffers: shared hit=14
Planning:
  Buffers: shared hit=107
Planning Time: 0.183 ms
Execution Time: 45789.127 ms
The timings themselves are not really important here. Most of the increase (from 3437 to 45789 ms) is probably the cost of setting up and tearing down the PL/pgSQL call stack. Apart from the change of function name, the plan is the same.
I then checked the number of function calls.
analdb=# SELECT * FROM PG_STAT_USER_FUNCTIONS;
 funcid | schemaname | funcname |  calls   | total_time | self_time
--------+------------+----------+----------+------------+-----------
  24646 | public     | f_lower  | 29733200 |  45360.226 | 45360.226
When I ran the SQL statement with the F_LOWER() function, the function was called over 29 million times: PostgreSQL did nothing to reduce the number of calls. So we can conclude that when we used the built-in LOWER() in the original query, it was likewise called over 29 million times. It is a shame that PostgreSQL does not optimize the query and reduce the number of function calls to 41006 (406 + 40600).
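The figure reported by pg_stat_user_functions is consistent with the plan's own row counts: the join filter runs once per schemaname-matched pair, and each evaluation calls the function twice (once per side of the comparison). A quick arithmetic check:

```python
# Numbers taken from the Hash Join node of the execution plan.
rows_out = 7_086_000       # rows that passed the Join Filter (rows=7086000)
rows_removed = 7_780_600   # Rows Removed by Join Filter

pairs = rows_out + rows_removed  # every pair that matched on schemaname
calls = pairs * 2                # f_lower() runs on both sides of each comparison
print(pairs, calls)              # 14866600 29733200 -- matches the calls column
```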
Limiting the function call damage
So assuming that we can not remove the LOWER() function, how can we optimize this query?
SELECT COUNT(*)
FROM T1 JOIN T2
ON T1.SCHEMANAME = T2.SCHEMANAME
AND LOWER(T1.TABLENAME) > LOWER(T2.TABLENAME);
In the previous sections we saw that the elapsed time without the LOWER() function was 1266.568 ms and with it 3437.076 ms, and that the function was executed over 29 million times. So the crucial thing in optimizing this query is to find a way to reduce the number of function calls.
I made some tweaks to the SQL statement, forcing PostgreSQL to materialize the result of the LOWER() function in CTEs. The cost of physically creating the two large CTEs might well be much less than the cost of millions of function calls.
WITH W1 AS MATERIALIZED (
SELECT T1.SCHEMANAME, LOWER(T1.TABLENAME) AS TABLENAME, T1.ATTNAME,T1.MOST_COMMON_FREQS FROM T1)
, W2 AS MATERIALIZED (
SELECT T2.SCHEMANAME, LOWER(T2.TABLENAME) AS TABLENAME, T2.ATTNAME,T2.MOST_COMMON_FREQS FROM T2)
SELECT COUNT(*)
FROM W1 JOIN W2
ON W1.SCHEMANAME = W2.SCHEMANAME
AND W1.TABLENAME > W2.TABLENAME;
Here is the EXPLAIN command output of the above query.
Aggregate (actual time=1520.888..1520.890 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=1395, temp written=964
  CTE w1
    ->  Seq Scan on public.t1 (actual time=0.003..0.087 rows=406 loops=1)
          Output: t1.schemaname, lower((t1.tablename)::text), t1.attname, t1.most_common_freqs
          Buffers: shared hit=14
  CTE w2
    ->  Seq Scan on public.t2 (actual time=0.008..11.459 rows=40600 loops=1)
          Output: t2.schemaname, lower((t2.tablename)::text), t2.attname, t2.most_common_freqs
          Buffers: shared hit=1381
  ->  Hash Join (actual time=0.729..1261.361 rows=7086000 loops=1)
        Hash Cond: (w2.schemaname = w1.schemaname)
        Join Filter: (w1.tablename > w2.tablename)
        Rows Removed by Join Filter: 7780600
        Buffers: shared hit=1395, temp written=964
        ->  CTE Scan on w2 (actual time=0.010..41.763 rows=40600 loops=1)
              Output: w2.schemaname, w2.tablename, w2.attname, w2.most_common_freqs
              Buffers: shared hit=1381, temp written=964
        ->  Hash (actual time=0.230..0.231 rows=406 loops=1)
              Output: w1.schemaname, w1.tablename
              Buckets: 1024  Batches: 1  Memory Usage: 51kB
              Buffers: shared hit=14
              ->  CTE Scan on w1 (actual time=0.004..0.185 rows=406 loops=1)
                    Output: w1.schemaname, w1.tablename
                    Buffers: shared hit=14
Planning Time: 0.074 ms
Execution Time: 1523.029 ms
As you can see from the execution plan, PostgreSQL calls the LOWER() function inside the CTEs, which means the function was called 406 times in CTE w1 and 40600 times in CTE w2. Even though disk I/O (temp written=964) took place, the elapsed time decreased from 3437 to 1523 ms.
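The effect of materializing is easy to model: the function runs once per input row before the join, instead of twice per probed pair during it. Here is a sketch using the same simplified hash-join model as before (toy rows, a counter standing in for track_functions):

```python
from collections import defaultdict

calls = 0
def f_lower(s):
    """Stand-in for the tracked function: count every invocation."""
    global calls
    calls += 1
    return s.lower()

def join_count(t1_rows, t2_rows, precompute):
    global calls
    calls = 0
    if precompute:  # MATERIALIZED CTE approach: one call per input row
        t1_rows = [(s, f_lower(t)) for s, t in t1_rows]
        t2_rows = [(s, f_lower(t)) for s, t in t2_rows]
    table = defaultdict(list)
    for s, t in t1_rows:
        table[s].append(t)
    n = 0
    for s, t in t2_rows:
        for t1_t in table.get(s, []):
            if precompute:
                match = t1_t > t              # values already lower-cased
            else:
                match = f_lower(t1_t) > f_lower(t)  # two calls per probed pair
            if match:
                n += 1
    return n, calls

# 3 t1 rows and 4 t2 rows in one schema: 12 probed pairs.
t1 = [("s", x) for x in ("B", "D", "F")]
t2 = [("s", x) for x in ("A", "C", "E", "G")]
print(join_count(t1, t2, precompute=False))  # (6, 24): 2 calls per pair
print(join_count(t1, t2, precompute=True))   # (6, 7): one call per input row
```

Scaled up to 406 and 40600 rows, the precomputed version makes 41006 calls instead of 29.7 million, which is exactly the behaviour the materialized CTEs buy us.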
Conclusion
I don't think that anything I have done or described in this article could be called rocket science, but it shows how much insight you can gain into what PostgreSQL is doing, and how you may be able to pinpoint excess work, using the track_functions parameter and the PG_STAT_USER_FUNCTIONS view. In the end you will be able to understand why some DBAs prohibit developers from creating functions and procedures in the database server.
Footnote
When you create a function or a procedure, it is likely to reference tables. Whenever the function is called, some rows of the referenced tables have to be loaded into the shared buffer cache, which also increases the possibility of incurring LWLock waits such as buffer_content.
Added on October 31, 2022
Today I came across a video in which two PostgreSQL experts discuss implementing business logic in the database. It will give you other DBAs' views on functions and procedures.
Here is the URL of the video:
https://www.youtube.com/watch?v=PcnlrvzKi4Q&list=PLH8y1BNPAKjJM3iuEg9L_WxNVkpP30rsO&index=2