In the previous article I
have taken advantage of the technique of extracting distinct values from
an indexed column efficiently. The technique is so commonly used that
many blog posts have introduced it. According to Jonathan Lewis, Sayan
Malakshnov,a Russian, was one of those suspected developers who first
found the strategy.
http://orasql.org/2012/09/21/distinct-values-by-index-topn/
In this note, I will explain how to find all the distinct combinations for the first two columns of a very large index in PostgreSQL without having to scan and aggregate the whole index or table. The
solution in this note was worked out by Jonathan Lewis. I've tried to
come up with a better solution than his, but I couldn't.
Let's start with a suitable table and index.
DROP TABLE T1;
CREATE TABLE T1
AS
SELECT i id
, mod(i-1,3) val1
, mod(i-1,10) val2
, lpad('k',100,'k') dummy
FROM generate_series(1,1e6) a(i)
ORDER BY random();
ALTER TABLE T1 ADD CONSTRAINT T1_PK PRIMARY KEY(val1, val2, id);
ANALYZE T1;
SELECT relname, relpages, relpages*8/1024 as size_MB
FROM PG_CLASS
WHERE RELNAME LIKE 't1%';
+---------+----------+---------+
| relname | relpages | size_MBytes |
+---------+----------+---------+
| t1 | 18182 | 142 |
| t1_pk | 4511 | 35 |
+---------+----------+---------+
I have created a table with 3 values for val1, 10 values for val2, with a total of 30 combinations. The addition of the primary key
starting with (val1, val2) is just a lazy way to ensure that I have a
suitable index and val1 and val2 are both declared not null.
If the business requirement is to retrieve the distinct values for (val1, val2), you would have used a query like below:
SELECT DISTINCT VAL1, VAL2
FROM T1;
The following is the execution plan the optimizer produced.
HashAggregate (actual time=437.168..437.172 rows=30 loops=1)
Group Key: val1, val2
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=11821 read=6361
-> Seq Scan on t1 (actual time=0.019..124.505 rows=1000000 loops=1)
Buffers: shared hit=11821 read=6361
Planning:
Buffers: shared hit=122 read=31
Planning Time: 3.880 ms
Execution Time: 437.502 ms
To our disappointment, PostgreSQL didn't access the index we have. It visited 11821 memory blocks and 6361 disk blocks.
Another query that meets the business requirement would be like below:
SELECT VAL1, VAL2
FROM T1
GROUP BY VAL1, VAL2;
Group (actual time=151.274..155.047 rows=30 loops=1)
Group Key: val1, val2
Buffers: shared hit=11867 read=6329
-> Gather Merge (actual time=151.273..155.006 rows=90 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=11867 read=6329
-> Sort (actual time=144.965..144.967 rows=30 loops=3)
Sort Key: val1, val2
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=11867 read=6329
Worker 0: Sort Method: quicksort Memory: 26kB
Worker 1: Sort Method: quicksort Memory: 26kB
-> Partial HashAggregate (actual time=144.872..144.876 rows=30 loops=3)
Group Key: val1, val2
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=11853 read=6329
Worker 0: Batches: 1 Memory Usage: 24kB
Worker 1: Batches: 1 Memory Usage: 24kB
-> Parallel Seq Scan on t1 (actual time=0.038..31.946 rows=333333 loops=3)
Buffers: shared hit=11853 read=6329
Planning:
Buffers: shared hit=17 read=3 dirtied=1
Planning Time: 0.547 ms
Execution Time: 155.195 ms
The elapsed time dropped due to parallel processing. But it is still not accessing the index.
Now it is time to take advantage of the technique of getting distinct values using the Common Table Expression.
My first step will be to demonstrate the recursive CTE to get the distinct values for val1 using the index probe. I have included its execution plan to show that this does a minimal amount of work.
WITH RECURSIVE w1 AS (
(SELECT val1, val2 FROM t1 ORDER BY val1, val2 LIMIT 1) --non-recursive term
UNION ALL
SELECT t1.val1, t1.val2 --recursive term
FROM w1
JOIN LATERAL (SELECT val1, val2
FROM t1
WHERE t1.val1 > w1.val1
ORDER BY t1.val1, t1.val1
LIMIT 1) t1 ON TRUE
WHERE w1.val1 IS NOT NULL
AND w1.val2 IS NOT NULL
)
SELECT * FROM w1
WHERE w1.val1 IS NOT NULL
AND w1.val2 IS NOT NULL;
CTE Scan on w1 (actual time=0.181..0.411 rows=3 loops=1)
Filter: ((val1 IS NOT NULL) AND (val2 IS NOT NULL))
Buffers: shared hit=8 read=6
CTE w1
-> Recursive Union (actual time=0.178..0.407 rows=3 loops=1)
Buffers: shared hit=8 read=6
-> Limit (actual time=0.176..0.177 rows=1 loops=1)
Buffers: shared hit=1 read=3
-> Index Only Scan using t1_pk on t1 (actual time=0.174..0.174 rows=1 loops=1)
Heap Fetches: 0
Buffers: shared hit=1 read=3
-> Nested Loop (actual time=0.075..0.075 rows=1 loops=3)
Buffers: shared hit=7 read=3
-> WorkTable Scan on w1 w1_1 (actual time=0.001..0.001 rows=1 loops=3)
Filter: ((val1 IS NOT NULL) AND (val2 IS NOT NULL))
-> Limit (actual time=0.072..0.072 rows=1 loops=3)
Buffers: shared hit=7 read=3
-> Index Only Scan using t1_pk on t1 t1_1 (actual time=0.064..0.064 rows=1 loops=3
Index Cond: (val1 > w1_1.val1)
Heap Fetches: 0
Buffers: shared hit=7 read=3
Planning:
Buffers: shared hit=3 read=4
Planning Time: 0.863 ms
Execution Time: 0.639 ms
At first sight we can spot that we've got an Index Only Scan as the first step of the recursive query, visiting just 1 buffered block and three disk blocks. The Limit in
the plan indicates that it stopped reading the index after getting a
single row. So PostgreSQL is clearly doing an efficient access for that
value. We then see Nested Loop operation executed three times. By using
LATERAL JOIN the inline view after the LATERAL is run for every w1.val1 before the LATERAL. The inline view is also accessed via the t1_pk index.
So we can get the val1 values very easily and efficiently with this recursive CTE technology.
Using the same technology we can write some code to find the val2 values for each possible val1 value in turn:
WITH RECURSIVE w1 AS (
(SELECT val1, val2 FROM t1 ORDER BY val1, val2 LIMIT 1)
UNION ALL
SELECT t1.val1, t1.val2
FROM w1
JOIN LATERAL (SELECT val1, val2
FROM t1
WHERE t1.val1 > w1.val1
ORDER BY t1.val1, t1.val2
LIMIT 1) t1 ON TRUE
WHERE w1.val1 IS NOT NULL
),
w2 AS (
SELECT val1, val2 FROM w1
UNION ALL
SELECT w2.val1, (SELECT t1.val2
FROM t1
WHERE t1.val1 = w2.val1
AND t1.val2 > w2.val2
ORDER BY t1.val1, t1.val2
LIMIT 1) val2
FROM w2
WHERE w2.val2 is not null
AND w2.val1 is not null
)
SELECT *
FROM w2
WHERE w2.val2 is not null
;
In
this query the second CTE (w2) looks similar to the first CTE (w1) -
except I now have included an equality predicate against val1 based on the first of the two columns. The second CTE gets distinct val2 values for each val1 value. The following is the execution plan.
CTE Scan on w2 (actual time=0.060..3.028 rows=30 loops=1)
Filter: (val2 IS NOT NULL)
Rows Removed by Filter: 3
Buffers: shared hit=84 read=22
CTE w1
-> Recursive Union (actual time=0.058..0.258 rows=3 loops=1)
Buffers: shared hit=14
-> Limit (actual time=0.045..0.046 rows=1 loops=1)
Buffers: shared hit=4
-> Index Only Scan using t1_pk on t1 (actual time=0.044..0.044 rows=1 loops=1)
Heap Fetches: 0
Buffers: shared hit=4
-> Nested Loop (actual time=0.065..0.065 rows=1 loops=3)
Buffers: shared hit=10
-> WorkTable Scan on w1 (actual time=0.000..0.000 rows=1 loops=3)
Filter: (val1 IS NOT NULL)
-> Limit (actual time=0.060..0.060 rows=1 loops=3)
Buffers: shared hit=10
-> Index Only Scan using t1_pk on t1 t1_1 (actual time=0.056..0.056 rows=1 lops=3
Index Cond: (val1 > w1.val1)
Heap Fetches: 0
Buffers: shared hit=10
CTE w2
-> Recursive Union (actual time=0.059..3.006 rows=33 loops=1)
Buffers: shared hit=84 read=22
-> CTE Scan on w1 w1_1 (actual time=0.058..0.259 rows=3 loops=1)
Buffers: shared hit=14
-> WorkTable Scan on w2 w2_1 (actual time=0.079..0.248 rows=3 loops=11)
Filter: ((val2 IS NOT NULL) AND (val1 IS NOT NULL))
Rows Removed by Filter: 0
Buffers: shared hit=70 read=22
SubPlan 2
-> Limit (actual time=0.089..0.089 rows=1 loops=30)
Buffers: shared hit=70 read=22
-> Index Only Scan using t1_pk on t1 t1_2 (actul time=0.088..0.088 rows=1 loops=30
Index Cond: ((val1 = w2_1.val1) AND (val2 > w2_1.val2))
Heap Fetches: 0
Buffers: shared hit=70 read=22
Planning Time: 0.679 ms
Execution Time: 3.230 ms
The
execution plan above is too complex to read in one go. Let's take a
look at the CTE w2 part. Since we've got 3 rows from CTE w1, the first
step of the recursive query produces 3 rows. The SubPlan2
represents the scalar subquery. The execution plan shows that we've got
30 Index Only Scans with 70 shared hits and 22 disk I/Os. And that's
exactly the right number of probes to return my result set.
You
will notice that I have got various "is not null" predicates scattered
throughout the query. In some cases this is to stop PostgreSQL from
running into an infinite loop. It is kind of a default pattern for
recursive CTEs and some of them could probably be removed with no change
in meaning.
Wrap Up
We
can take advantage of the recursive CTE technology to pick out the
unique set of combinations of the first two columns of a multi-column
index.
The
downside to using the recursive CTE technology is that if there are too
many distinct values, for example over 10,000, it will have poor
performance.
Addendum
You can use this version of the last query as well.
WITH RECURSIVE w1 AS (
(SELECT val1, val2 FROM t1 ORDER BY val1, val2 LIMIT 1)
UNION ALL
SELECT t1.val1, t1.val2
FROM w1
JOIN LATERAL (SELECT val1, val2
FROM t1
WHERE t1.val1 > w1.val1
ORDER BY t1.val1, t1.val2
LIMIT 1) t1 ON TRUE
WHERE w1.val1 IS NOT NULL
),
w2 AS(
SELECT val1, val2 FROM w1
UNION ALL
SELECT t1.val1, t1.val2
FROM w2
JOIN LATERAL (SELECT val1, val2
FROM t1
WHERE t1.val1 = w2.val1
AND t1.val2 > w2.val2
ORDER BY t1.val1, t1.val2
LIMIT 1) t1 ON TRUE
WHERE w2.val2 is not null
AND w2.val1 is not null
)
SELECT * FROM w2
WHERE w2.val2 is not null
;
Added on September 3, 2023
You can implement this technique in Oracle 12.2 as follows:
WITH W1 (VAL1, VAL2) AS (
(SELECT VAL1, VAL2 FROM T1 ORDER BY VAL1, VAL2 FETCH NEXT 1 ROWS ONLY)
UNION ALL
SELECT (SELECT MIN(T1.VAL1) FROM T1 WHERE T1.VAL1 > W1.VAL1) AS VAL1,
W1.VAL2
FROM W1
WHERE W1.VAL1 IS NOT NULL
)
, W2 (VAL1, VAL2) AS (
SELECT VAL1, VAL2 FROM W1
UNION ALL
SELECT W2.VAL1,
(SELECT MIN(T1.VAL2) FROM T1 WHERE T1.VAL1=W2.VAL1 AND T1.VAL2 > W2.VAL2) AS VAL2
FROM W2
WHERE W2.VAL2 IS NOT NULL
)
SELECT *
FROM W2
WHERE W2.VAL2 IS NOT NULL
AND W2.VAL1 IS NOT NULL
ORDER BY VAL1, VAL2;