In Oracle Database, there is a well-documented limitation: the maximum number of elements in an IN-List is 1000. If you attempt to construct a query with more than 1000 elements in an IN clause, Oracle throws the following error:
ORA-01795: maximum number of expressions in a list is 1000
This limit exists to prevent resource-intensive queries from overwhelming the SQL parser. To overcome this, Oracle users often split the IN-List into chunks of 1000 elements and combine them with the OR operator, like so:
SELECT * FROM employees
WHERE emp_id IN (1, 2, ..., 1000)
OR emp_id IN (1001, 1002, ..., 2000)
OR emp_id IN (2001, ..., 3000);
This works because each IN clause stays within the 1000 item limit.
But does this limitation exist in PostgreSQL? Can PostgreSQL handle larger IN-Lists? How does the number of items in the IN-List impact query performance in PostgreSQL?
In this note, we'll explore these questions and discuss methods to improve performance when dealing with large IN-Lists.
Does PostgreSQL Have a Limit on IN-List Size?
In PostgreSQL, I couldn't find any documented limit on the number of elements in an IN-List. To investigate this, let’s run some experiments.
First, we create a simple table t for testing:
--I have conducted the experiment on PostgreSQL 16.3.
CREATE TABLE t (x INT PRIMARY KEY);
The following PL/pgSQL block dynamically constructs an IN-List with 1000 elements and measures the query's elapsed time:
DO $$
DECLARE
l_sql text;
l_res int; -- To store the result of the query
l_ts timestamp; -- To store the start time
BEGIN
-- Initialize the SQL query
l_sql := 'SELECT COUNT(*) FROM t WHERE x IN (1';
-- Dynamically build the query using a loop
FOR i IN 2..1000 LOOP
l_sql := l_sql || ',' || i;
END LOOP;
-- Close the SQL statement
l_sql := l_sql || ')';
-- Output the constructed query
RAISE NOTICE '%', l_sql;
-- Record the start time
l_ts := CLOCK_TIMESTAMP();
-- Execute the constructed SQL query
EXECUTE l_sql INTO l_res;
-- Calculate and output the elapsed time
RAISE NOTICE 'Elapsed=%', (CLOCK_TIMESTAMP() - l_ts);
END $$;
Here is the output generated by running the above code:
analdb=# \timing
--I have run the above PL/pgSQL block
NOTICE: SELECT COUNT(*) FROM t WHERE x IN (1,2,......997,998,999,1000)
NOTICE: Elapsed=00:00:00.002162
DO
Time: 5.529 ms
With 1000 elements, PostgreSQL executes the query in just 2.16 ms without any errors. This shows that PostgreSQL handles IN-Lists with up to 1000 items efficiently.
Now let's see whether PostgreSQL can handle over 1000 elements. We modify the loop to generate 10,000 elements:
DO $$
DECLARE
l_sql text;
l_res int; -- To store the result of the query
l_ts timestamp; -- To store the start time
BEGIN
-- Initialize the SQL query
l_sql := 'SELECT COUNT(*) FROM t WHERE x IN (1';
-- Dynamically build the query using a loop
FOR i IN 2..10000 LOOP
l_sql := l_sql || ',' || i;
END LOOP;
-- Close the SQL statement
l_sql := l_sql || ')';
-- Output the constructed query
RAISE NOTICE '%', l_sql;
-- Record the start time
l_ts := CLOCK_TIMESTAMP();
-- Execute the constructed SQL query
EXECUTE l_sql INTO l_res;
-- Calculate and output the elapsed time
RAISE NOTICE 'Elapsed=%', (CLOCK_TIMESTAMP() - l_ts);
END $$;
Output:
NOTICE: SELECT COUNT(*) FROM t WHERE x IN (1,2,......9997,9998,9999,10000)
NOTICE: Elapsed=00:00:00.020713
DO
Time: 45.987 ms
Note that the query executes successfully, but the elapsed time increases to 20.713 ms, indicating that parsing and execution costs have risen.
As a side note, when I increased the number of IN-List items to 20,000 and 100,000, the execution times were 19.104 ms and 110.349 ms respectively. This shows that while PostgreSQL can handle very large IN-Lists, the cost grows with the list size (the 20,000-item run coming in slightly faster than the 10,000-item run looks like ordinary run-to-run noise).
To avoid the parsing load, I tested array binding, which I thought could be a powerful alternative to large IN-Lists. Here is the code:
--Use Array Binding
DO $$
DECLARE
l_sql text;
l_res int; -- To store the result of the query
l_ts timestamp; -- To store the start time
BEGIN
-- Initialize the SQL query
l_sql := 'SELECT COUNT(*) FROM t WHERE x = ANY (ARRAY[1';
-- Dynamically build the query using a loop
FOR i IN 2..100000 LOOP
l_sql := l_sql || ',' || i;
END LOOP;
-- Close the SQL statement
l_sql := l_sql || '])';
-- Output the constructed query
RAISE NOTICE '%', l_sql;
-- Record the start time
l_ts := CLOCK_TIMESTAMP();
-- Execute the constructed SQL query
EXECUTE l_sql INTO l_res;
-- Calculate and output the elapsed time
RAISE NOTICE 'Elapsed=%', (CLOCK_TIMESTAMP() - l_ts);
END $$;
Output:
NOTICE: SELECT COUNT(*) FROM t WHERE x = ANY (ARRAY[1,2,......,99998,99999,100000])
NOTICE: Elapsed=00:00:00.502401
DO
Time: 3568.440 ms (00:03.568)
To my dismay, the array form did not improve performance; at 100,000 items it was actually slower (502 ms versus 110 ms for the plain IN-List).
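One caveat about the test above: the array literal is still embedded in the SQL text, so the parser must still scan all 100,000 literals. A true array bind passes the whole list as a single parameter, keeping the statement text constant. A minimal sketch of that approach, using EXECUTE ... USING (not benchmarked here, so the timing may differ from the numbers above):

-- Sketch: pass the list as one array parameter instead of inlining literals.
-- The statement text never changes; only the bound array value does.
DO $$
DECLARE
    l_ids int[] := ARRAY(SELECT i FROM generate_series(1, 100000) AS g(i));
    l_res int;
BEGIN
    EXECUTE 'SELECT COUNT(*) FROM t WHERE x = ANY ($1)'
        INTO l_res
        USING l_ids;
    RAISE NOTICE 'count=%', l_res;
END $$;

From a client application, the same idea applies: send `WHERE x = ANY ($1)` with the array as a bind parameter, so the server parses a short, constant statement.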
Conclusion
1. Unlike Oracle, PostgreSQL does not impose a strict limit on the number of elements in an IN-List. We can include tens of thousands of elements without encountering errors.
2. As the size of the IN-List grows, parsing and execution costs increase slightly.
Here is a table showing the number of items and the elapsed time:

number of IN-List items    execution time
1000                       2.162 ms
10000                      20.713 ms
20000                      19.104 ms
100000                     110.349 ms
3. Array binding, which allows passing a list of values as a single parameter, did not improve performance in this test.
Footnote
In order to mitigate the parsing load, we might use a temporary table. Instead of hardcoding large IN-Lists, load the values into a temporary table and join it with your main table. This approach eliminates parsing overhead and leverages PostgreSQL's query optimization capabilities.
Example:
-- Create a temporary table
CREATE TEMP TABLE temp_ids (id INT);
-- Insert the values
INSERT INTO temp_ids SELECT i FROM generate_series(1, 100000) AS a(i);
-- Use a JOIN instead of an IN-List
SELECT *
FROM t a
JOIN temp_ids b ON a.x = b.id;
x | id
---+----
(0 rows)
Time: 3.790 ms
Considering the SELECT statement alone, the elapsed time dropped from 110.349 ms to 3.790 ms. (The table t is empty, hence the 0 rows in the output.)
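If creating a temporary table feels heavyweight, a join against unnest() of an array achieves a similar shape without any DDL. A sketch (here the array is built with generate_series for illustration; in practice a client would bind it as a parameter):

-- Sketch: join against unnest() of an array instead of a temp table.
SELECT a.*
FROM t a
JOIN unnest(ARRAY(SELECT i FROM generate_series(1, 100000) AS g(i))) AS b(id)
  ON a.x = b.id;

This lets the planner treat the value list as a row source, just as with the temporary table, while keeping everything in a single statement.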
Addendum
Compared to PostgreSQL, the parsing cost in Oracle is significantly higher. When I conducted the experiment with Oracle 19c on my desktop, the elapsed time exceeded 3000 ms for an IN-List containing 100,000 items.
In Oracle, I strongly recommend using a temporary table to handle such scenarios. However, in PostgreSQL, large IN-Lists can be more convenient and manageable for maintenance purposes.