
1000-item limit for IN-Lists

Author: 모델광   Date: 2024-12-28 (Sat) 18:42   Views: 185
In Oracle Database, there is a well-documented limitation: the maximum number of elements in an IN-List is 1000. If you attempt to construct a query with more than 1000 elements in an IN clause, Oracle throws the following error:

ORA-01795: maximum number of expressions in a list is 1000

This limit exists to prevent resource-intensive queries from overwhelming the SQL parser. To overcome this, Oracle users often split the IN-List into chunks of 1000 elements and combine them with the OR operator, like so:

SELECT * FROM employees
WHERE emp_id IN (1, 2, ..., 1000)
   OR emp_id IN (1001, 1002, ..., 2000)
   OR emp_id IN (2001, ..., 3000);


This works because each IN clause stays within the 1000 item limit.
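As a sketch of this workaround on the client side (Python, with a hypothetical helper name; real code should prefer bind parameters over string concatenation to avoid SQL injection), the chunking can be done like this:

```python
def chunked_in_clauses(column, values, chunk_size=1000):
    """Split `values` into chunks of at most `chunk_size` items and
    combine one IN clause per chunk with OR, staying under ORA-01795."""
    clauses = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        clauses.append(f"{column} IN ({', '.join(str(v) for v in chunk)})")
    return "\n   OR ".join(clauses)

# 2,500 ids -> three OR-combined IN clauses of 1000, 1000 and 500 items
where_clause = chunked_in_clauses("emp_id", list(range(1, 2501)))
print(where_clause.count("IN ("))  # 3
```

This only illustrates the shape of the generated predicate; values coming from user input must be bound, not interpolated.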

But does this limitation exist in PostgreSQL? Can PostgreSQL handle larger IN-Lists? How does the number of items in an IN-List affect query performance in PostgreSQL?
In this note, we'll explore these questions and discuss methods to improve performance when dealing with large IN-Lists.

Does PostgreSQL Have a Limit on IN-List Size?
In PostgreSQL, I couldn't find any documented limit on the number of elements in an IN-List. To investigate this, let’s run some experiments.

First, we create a simple table t for testing:
--I have conducted the experiment on PostgreSQL 16.3.

CREATE TABLE t (x INT PRIMARY KEY);

The following PL/pgSQL block dynamically constructs an IN-List with 1000 elements and measures the query's elapsed time:

DO $$
DECLARE
    l_sql text;
    l_res int;            -- To store the result of the query
    l_ts timestamp;       -- To store the start time
BEGIN
    -- Initialize the SQL query
    l_sql := 'SELECT COUNT(*) FROM t WHERE x IN (1';

    -- Dynamically build the query using a loop
    FOR i IN 2..1000 LOOP
        l_sql := l_sql || ',' || i;
    END LOOP;

    -- Close the SQL statement
    l_sql := l_sql || ')';

    -- Output the constructed query
    RAISE NOTICE '%', l_sql;

    -- Record the start time
    l_ts := CLOCK_TIMESTAMP();

    -- Execute the constructed SQL query
    EXECUTE l_sql INTO l_res;

    -- Calculate and output the elapsed time
    RAISE NOTICE 'Elapsed=%', (CLOCK_TIMESTAMP() - l_ts);
END $$;

Here is the output generated by running the above code:

analdb=# \timing
-- I then ran the above PL/pgSQL block:
NOTICE:  SELECT COUNT(*) FROM t WHERE x IN (1,2,......997,998,999,1000)
NOTICE:  Elapsed=00:00:00.002162
DO
Time: 5.529 ms

With 1000 elements, PostgreSQL executes the query in just 2.162 ms without any errors. This shows that PostgreSQL handles IN-Lists with up to 1000 items efficiently.

Now let's see whether PostgreSQL can handle over 1000 elements. We modify the loop to generate 10,000 elements:

DO $$
DECLARE
    l_sql text;
    l_res int;            -- To store the result of the query
    l_ts timestamp;       -- To store the start time
BEGIN
    -- Initialize the SQL query
    l_sql := 'SELECT COUNT(*) FROM t WHERE x IN (1';

    -- Dynamically build the query using a loop
    FOR i IN 2..10000 LOOP
        l_sql := l_sql || ',' || i;
    END LOOP;

    -- Close the SQL statement
    l_sql := l_sql || ')';

    -- Output the constructed query
    RAISE NOTICE '%', l_sql;

    -- Record the start time
    l_ts := CLOCK_TIMESTAMP();

    -- Execute the constructed SQL query
    EXECUTE l_sql INTO l_res;

    -- Calculate and output the elapsed time
    RAISE NOTICE 'Elapsed=%', (CLOCK_TIMESTAMP() - l_ts);
END $$;


Output:

NOTICE:  SELECT COUNT(*) FROM t WHERE x IN (1,2,......9997,9998,9999,10000)
NOTICE:  Elapsed=00:00:00.020713
DO
Time: 45.987 ms

Note that the query executes successfully, but the elapsed time increases to 20.713 ms, indicating that the parsing and execution cost has risen.

As a side note, when I increased the number of IN-List items to 20,000 and 100,000, the execution times were 19.104 ms and 110.349 ms, respectively. This shows that while PostgreSQL can handle very large IN-Lists, the parsing cost grows with the list size.

To avoid the parsing load, I tested array binding, which I thought could be a powerful alternative to large IN-Lists. Here is the code using array binding:

--Use Array Binding
DO $$
DECLARE
    l_sql text;
    l_res int;            -- To store the result of the query
    l_ts timestamp;       -- To store the start time
BEGIN
    -- Initialize the SQL query
    l_sql := 'SELECT COUNT(*) FROM t WHERE x = ANY (ARRAY[1';

    -- Dynamically build the query using a loop
    FOR i IN 2..100000 LOOP
        l_sql := l_sql || ',' || i;
    END LOOP;

    -- Close the SQL statement
    l_sql := l_sql || '])';

    -- Output the constructed query
    RAISE NOTICE '%', l_sql;

    -- Record the start time
    l_ts := CLOCK_TIMESTAMP();

    -- Execute the constructed SQL query
    EXECUTE l_sql INTO l_res;

    -- Calculate and output the elapsed time
    RAISE NOTICE 'Elapsed=%', (CLOCK_TIMESTAMP() - l_ts);
END $$;


Output:

NOTICE:  SELECT COUNT(*) FROM t WHERE x = ANY (ARRAY[1,2,......,99998,99999,100000])
NOTICE:  Elapsed=00:00:00.502401
DO
Time: 3568.440 ms (00:03.568)


To my dismay, the array notation does not improve performance; with 100,000 items it was actually slower than the plain IN-List.
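Note that the block above still embeds all 100,000 values as a literal inside the statement text, so the server must parse the whole array anyway. From a client driver, the list can instead travel as a single bound parameter. A minimal sketch, assuming psycopg2 and an open connection `conn` (the execute call is shown commented out, as it needs a live database):

```python
ids = list(range(1, 100001))

# psycopg2 adapts a Python list to a PostgreSQL array, so the whole
# list is sent as one bind parameter rather than a 100,000-item literal.
query = "SELECT COUNT(*) FROM t WHERE x = ANY(%s)"
params = (ids,)

# with conn.cursor() as cur:   # assumes an open psycopg2 connection
#     cur.execute(query, params)
#     print(cur.fetchone()[0])
```

Whether this helps in practice would need to be measured the same way as the experiments above; the sketch only shows how to avoid embedding the literal.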

Conclusion
1. Unlike Oracle, PostgreSQL does not impose a strict limit on the number of elements in an IN-List. We can include tens of thousands of elements without encountering errors.
2. As the size of the IN-List grows, parsing and execution costs increase slightly.
    Here is a table showing the number of items and the elapsed time:

    IN-List items    Execution time
            1,000          2.162 ms
           10,000         20.713 ms
           20,000         19.104 ms
          100,000        110.349 ms
3. Array binding, which allows a list of values to be passed as a single parameter, did not improve performance.

Footnote
To mitigate the parsing load, we can use a temporary table. Instead of hardcoding a large IN-List, load the values into a temporary table and join it with the main table. This approach avoids the parsing overhead and lets the planner apply PostgreSQL's normal join optimizations.

Example:

-- Create a temporary table
CREATE TEMP TABLE temp_ids (id INT);

-- Insert the values
INSERT INTO temp_ids SELECT i FROM generate_series(1,100000) a(i);

-- Use a JOIN instead of an IN-List

SELECT *
FROM t a
JOIN temp_ids b ON a.x = b.id;
 x | id
---+----
(0 rows)
Time: 3.790 ms


With the SELECT statement above, the elapsed time dropped from 110.349 ms to 3.790 ms.
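From an application, one way to load the temporary table efficiently is COPY rather than row-by-row INSERTs. A sketch, again assuming psycopg2 and an open connection `conn` (the database calls are commented out; only the payload construction runs here):

```python
import io

ids = range(1, 100001)
# Build the COPY payload: one id per line, the text format copy_from expects.
payload = io.StringIO("".join(f"{i}\n" for i in ids))

# with conn.cursor() as cur:   # assumes an open psycopg2 connection
#     cur.execute("CREATE TEMP TABLE temp_ids (id INT)")
#     cur.copy_from(payload, "temp_ids", columns=("id",))
#     cur.execute("SELECT * FROM t a JOIN temp_ids b ON a.x = b.id")
```

COPY moves the values in bulk over the wire, so none of them appear in the SQL text the server has to parse.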

Addendum
Compared to PostgreSQL, the parsing cost in Oracle is significantly higher. When I conducted the same experiment on Oracle 19c on my desktop, the elapsed time exceeded 3000 ms for an IN-List containing 100,000 items.
In Oracle, I strongly recommend using a temporary table for such scenarios. In PostgreSQL, however, large IN-Lists remain convenient and manageable for maintenance purposes.

 

postgresdba.com