Here we can see Incremental Sort which is a new feature in version 13. Thanks to the new feature, the optimizer doesn't have to sort the entire (sale_date, quantity, channel, subsidiary_id) block. It only needs to sort (quantity, channel, subsidiary_id) block. So it can avoid more demanding external sort. Hmm... It seems that we need to change the order of the column on the index SALES_X03 to reduce the sorting load. Let's change the order of the columns on the index.
drop index SALES_X03;
CREATE INDEX SALES_X03 ON SALES (SALE_DATE, QUANTITY, CHANNEL, SUBSIDIARY_ID);
Limit (actual time=0.060..1.465 rows=100 loops=1)
Buffers: shared hit=4 read=19
-> Group (actual time=0.059..1.451 rows=100 loops=1)
Group Key: sale_date, quantity, channel, subsidiary_id
Buffers: shared hit=4 read=19
-> Index Only Scan using sales_x03 on sales (actual time=0.056..1.210 rows=933 loops=1)
Index Cond: (quantity <= 5)
Filter: (subsidiary_id = ANY ('{10,40}'::numeric[]))
Rows Removed by Filter: 2207
Heap Fetches: 91
Buffers: shared hit=4 read=19
Planning:
Buffers: shared hit=27 read=1
Planning Time: 0.432 ms
Execution Time: 1.503 ms
Here we can't see any sorting operation. The total block I/O decreased from (4+25) to (4+19). The elapsed time decreased from 1.854 to 1.503 ms. You may think that this is it and now we can ship this index onto the production system. However, we have to deal with the possibility of the index bloat.
The disadvantage to this index ( SALE_DATE, QUANTITY, CHANNEL, SUBSIDIARY_ID) is:
- We have indexed every row in the 2.2M rows table with a 4-column index. And those columns are likely to be subject to several changes. Even when the non-indexed columns are updated, the index we created has to be inserted rows as well. This is the PostgreSQL architecture feature. I will not go into details of the architecture in this note.
(Thinking Phase 3)
So we have to have indexes on fewer columns if possible. Furthermore, we have to find out how to reduce the size of the index.
So now let's make an index on the column SALE_DATE only.
CREATE INDEX SALES_X04 ON SALES(SALE_DATE);
Limit (actual time=0.597..1.811 rows=100 loops=1)
Buffers: shared hit=39 read=6
-> Group (actual time=0.595..1.804 rows=100 loops=1)
Group Key: sale_date, quantity, channel, subsidiary_id
Buffers: shared hit=39 read=6
-> Incremental Sort (actual time=0.594..1.683 rows=933 loops=1)
Sort Key: sale_date, quantity, channel, subsidiary_id
Presorted Key: sale_date
Full-sort Groups: 5 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB
Pre-sorted Groups: 5 Sort Method: quicksort Average Memory: 39kB Peak Memory: 39kB
Buffers: shared hit=39 read=6
-> Index Scan using sales_x04 on sales (actual time=0.161..1.062 rows=942 loops=1)
Filter: ((subsidiary_id = ANY ('{10,40}'::numeric[])) AND (quantity <= 5))
Rows Removed by Filter: 2573
Buffers: shared hit=39 read=6
Planning:
Buffers: shared hit=18 read=1
Planning Time: 1.560 ms
Execution Time: 2.049 ms
People from Oracle background will not be able to understand this execution plan. Even though there is no SALE_DATE column in the where clause, the optimizer accessed SALE_DATE index, which is not possible in ORACLE. And PostgreSQL 13 new feature, Incremental Sort, kicked in. In this particular pattern of data, the index on SALE_DATE worked well. But if there are not many rows filtered by the where clause, this index can be a disaster because we have to access broader range of the index. Let's say that the rows filtered by (subsidiary_id, quantity) are small and we decided not to use the index on (subsidiary_id, quantity) for some reason. Then how can we improve the performance of the query using the index on SALE_DATE?
(Thinking Phase 4)
Let's suppose that:
- in the where SUBSIDIARY_ID IN (10,40) AND QUANTITY <= 5 clause, quantity value '5' is a fixed value.
- (subsidiary_id, quantity) can filter many rows but we cannot make an index on the column for some reason. For example, (subsidiary_id, quantity) index can decrease the performance of another query.
Now we may use a filtered index.
CREATE INDEX SALES_X05 ON SALES(SALE_DATE)
WHERE QUANTITY <= 5 AND SUBSIDIARY_ID IN (10,40);