설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 162건, 최근 0 건
   

result set ordering

글쓴이 : 모델광 날짜 : 2022-03-05 (토) 07:54 조회 : 1198

You have to include an ORDER BY clause if you want the data to come out in a particular order and it is up to PostgreSQL to decide how to make that ordering happen.

I have always been curious on what the default query result ordering in the absence of an ORDER BY clause is. So I have run up a two-table demonstration on PostgreSQL 14 showing how the order of the result set can be affected by the join order used to join the two tables.

drop table t1;

create table t1

as

select i, mod(i+2,4) as c2, lpad(i::text, 10,'0') c3

  from generate_series(1,4) a(i);


analdb=# select * from t1

 i | c2 |     c3

---+----+------------

 1 |  3 | 0000000001

 2 |  0 | 0000000002

 3 |  1 | 0000000003

 4 |  2 | 0000000004


drop table t2;

create table t2

as 

select * from t1

union ALL

select * from t1

union ALL

select * from t1

union ALL

select * from t1;

analdb=# select * from t2;

 i | c2 |     c3

---+----+------------

 1 |  3 | 0000000001

 2 |  0 | 0000000002

 3 |  1 | 0000000003

 4 |  2 | 0000000004

 1 |  3 | 0000000001

 2 |  0 | 0000000002

 3 |  1 | 0000000003

 4 |  2 | 0000000004

 1 |  3 | 0000000001

 2 |  0 | 0000000002

 3 |  1 | 0000000003

 4 |  2 | 0000000004

 1 |  3 | 0000000001

 2 |  0 | 0000000002

 3 |  1 | 0000000003

 4 |  2 | 0000000004


--If you want to control the join order in PostgreSQL you have to install a "pg_hint_plan" extension.

analdb=# load 'pg_hint_plan';


* Nested Loop Join

I have inserted the /+ leading((t1 t2)) */ hint which  would force the optimizer to use the table t1 as the leading table.


select /+ leading((t1 t2)) NestLoop(t1 t2) */ t1.i, t1.c2, t2.i, t2.c2

  from t1, t2

 where t1.c2 = t2.c2;

 i | c2 | i | c2

---+----+---+----

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2


I have used the /+ leading((t2 t1)) */ hint to specify a join order.


select /+ leading((t2 t1)) NestLoop(t1 t2) */ t1.i, t1.c2, t2.i, t2.c2

  from t1, t2

 where t1.c2 = t2.c2;

 i | c2 | i | c2

---+----+---+----

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2


You can see that the ordering for the Nested Loop is driven by the leading table in the join.


* Hash Join

select /+ leading((t1 t2)) HashJoin(t1 t2) */ t1.i, t1.c2, t2.i, t2.c2

  from t1, t2

 where t1.c2 = t2.c2;

Hash Join  (cost=37.00..314.00 rows=7200 width=16)

  Hash Cond: (t1.c2 = t2.c2)

  ->  Seq Scan on t1  (cost=0.00..22.00 rows=1200 width=8)

  ->  Hash  (cost=22.00..22.00 rows=1200 width=8)

        ->  Seq Scan on t2  (cost=0.00..22.00 rows=1200 width=8)

 i | c2 | i | c2

---+----+---+----

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2


select /+ leading((t2 t1)) HashJoin(t1 t2) */ t1.i, t1.c2, t2.i, t2.c2

  from t1, t2

 where t1.c2 = t2.c2;

Hash Join  (cost=37.00..314.00 rows=7200 width=16)

  Hash Cond: (t2.c2 = t1.c2)

  ->  Seq Scan on t2  (cost=0.00..22.00 rows=1200 width=8)

  ->  Hash  (cost=22.00..22.00 rows=1200 width=8)

        ->  Seq Scan on t1  (cost=0.00..22.00 rows=1200 width=8)

 i | c2 | i | c2

---+----+---+----

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 1 |  3 | 1 |  3

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

The ordering for the hash join is driven by the leading table. Here the leading table is the probe input. For your reference, the ordering for the hash join in Oracle is driven by the build input.


* Merge Join

select /+ leading((t1 t2)) MergeJoin(t1 t2) */ t1.i, t1.c2, t2.i, t2.c2

  from t1, t2

 where t1.c2 = t2.c2;

Merge Join  (cost=166.75..280.75 rows=7200 width=16)

  Merge Cond: (t1.c2 = t2.c2)

  ->  Sort  (cost=83.37..86.37 rows=1200 width=8)

        Sort Key: t1.c2

        ->  Seq Scan on t1  (cost=0.00..22.00 rows=1200 width=8)

  ->  Sort  (cost=83.37..86.37 rows=1200 width=8)

        Sort Key: t2.c2

        ->  Seq Scan on t2  (cost=0.00..22.00 rows=1200 width=8)

 i | c2 | i | c2

---+----+---+----

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3


select /+ leading((t2 t1)) MergeJoin(t1 t2) */ t1.i, t1.c2, t2.i, t2.c2

  from t1, t2

 where t1.c2 = t2.c2;

 i | c2 | i | c2

---+----+---+----

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 2 |  0 | 2 |  0

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 3 |  1 | 3 |  1

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 4 |  2 | 4 |  2

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

 1 |  3 | 1 |  3

The order for the merge join is driven by the sort order of the join columns.


Conclusion

If you don't specify an ORDER BY clause, the default query result ordering depends on the join order and the join method. In the Nested Loop join and Hash join the leading table determines the ordering of the result set. In the Merge join the sort order of the join columns determines the ordering of the result set.


Footnote

I have rerun this demonstration on PostgreSQL 13: The effect has not changed.


Warning

I have all heard the adage: to a man with a hammer, everything looks like a nail. Don't get carried away with the knowledge presented here. Don't miss out the ORDER BY clause in your query. The moment any row is updated, the ordering of the resulting data set is changed. You have to use the ORDER BY clause to ensure that your data come out in a specific order.


   

postgresdba.com