In PostgreSQL 14, two compression algorithms are supported: pglz and lz4. And the default value of the parameter default_toast_compression is still pglz. I can't explain why the default value is still pglz, which is not performant.
In this note I will show you why we should set the parameter default_toast_compression to LZ4.
I will start with a little scripts to demonstrate a point about column compressioin.
I have created two tables. One has a column with the compression algorithm of PGLZ. The other has a column with the compression algorithm of LZ4.
DROP TABLE T2;
CREATE TABLE T2(
ID INT,
C1 TEXT COMPRESSION pglz)
;
DROP TABLE T3;
CREATE TABLE T3(
ID INT,
C1 TEXT COMPRESSION lz4)
;
analdb=# \d+ t2
Table "public.t2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
c1 | text | | | | extended | pglz | |
Access method: heap
analdb=# \d+ t3
Table "public.t3"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
c1 | text | | | | extended | lz4 | |
Access method: heap
I inserted 100,000 rows into those two tables to check the speed of the insert operation.
analdb=# \timing
Timing is on.
analdb=# INSERT INTO T2 SELECT i, repeat(i::varchar||'kkk',1000) FROM generate_series(1,100000) a(i);
INSERT 0 100000
Time: 4158.828 ms (00:04.159)
analdb=# INSERT INTO T3 SELECT i, repeat(i::varchar||'kkk',1000) FROM generate_series(1,100000) a(i);
INSERT 0 100000
Time: 767.550 ms
analdb=# select id, pg_column_compression(c1) from t2 limit 1;
id | pg_column_compression
----+-----------------------
1 | pglz
analdb=# select id, pg_column_compression(c1) from t3 limit 1;
id | pg_column_compression
----+-----------------------
1 | lz4
As you can see, the time spent on data insertion increases significantly when using PGLZ. LZ4 compression only takes about 1/7 of the time spent by PGLZ.
I checked the size of those two tables.
analdb=# select relname, relpages, relpages*8/1024 size_MB, reltuples, reltoastrelid, pg_relation_filepath(reltoastrelid)
analdb-# from pg_class where relname in ('t2', 't3');
relname | relpages | size_mb | reltuples | reltoastrelid | pg_relation_filepath
---------+----------+---------+-----------+---------------+----------------------
t3 | 1126 | 8 | 100000 | 16795 | base/16388/16801
t2 | 1799 | 14 | 100000 | 16790 | base/16388/16804
We can see that the compression ratio of LZ4 is better.
I checked the speed of update operation.
analdb=# update t2
set c1 = repeat('abc',1000);
UPDATE 100000
Time: 1654.965 ms (00:01.655)
analdb=# update t3
set c1 = repeat('abc',1000);
UPDATE 100000
Time: 448.726 ms
We can see that it takes longer to update a column with the compression algorithm PGLZ.
Conclusion
Compared with PGLZ, LZ4 is more efficient. I highly recommend you should set the parameter default_toast_compression to LZ4.