When you work in a project as a
data architect or a data modeler, the first thing you do is to create a
data dictionary which includes a word dictionary, a term dictionary, a
domain dictionary and a code dictionary. In theory you can produce a
data dictionary without cosidering what DBMS you use. In practice you
need to have thorough knowlege of the DBMS you use.
We
have been Oracle people for a long time. We see the database through
the lens of Oracle, which makes us ignore features available in
PostgreSQL. We do not think of how Postgres does something. PostgreSQL
is data type rich. If you know the technology you can gain performance
benefits.
In this note I would like to discuss how we will define a number domain in a domain dictionary.
Here is an example of a typical domain dictionary.
도메인대분류
|
도메인명
|
인포타입
|
Type |
Length |
Description |
금액 |
가격 |
가격_NU_18_3 |
numeric |
18,3 |
|
금액 |
요금 |
요금_NU_11_1 |
numeric |
11,1 |
|
식별자 |
ID |
ID_VC2_20 |
varchar2 |
20 |
|
식별자 |
번호 |
번호_NU_10 |
numeric |
10 |
|
식별자
|
순번
|
순번_NU_2
|
numeric
|
2
|
|
In
Oracle the main datatype used to store floating numbers and integers is
NUMBER. It is a variable-lenghth datatype. In Oracle INTEGER is
depricated and a keyword retained for compatibility.
In
PostgreSQL when you are sure that the domain only contains an integer,
you'd better use smallint, integer, and bigint depending on the range of
the numeric value. If you use the right data type, you will have much
more optimal utilization of storage. If you manage to fit more rows in less space, you will retrieve more data on each disk access.
Here is a quick and dirty script to check how much space it takes.
select pg_column_size(9::numeric) as n, pg_column_size(9::smallint) as s
, pg_column_size(9::int) as i, pg_column_size(9::bigint) as b;
+---+---+---+---+
| n | s | i | b |
+---+---+---+---+
| 8 | 2 | 4 | 8 |
+---+---+---+---+
Unlike
Oracle in PostgreSQL the size of a single dight number is 8 bytes if
the data type is numeric. If you want to reduce the size of a single
digit number you have to create the column as a SMALLINT data type.
The drawback of the INT data type is that you cannot restrict the number of digits. If you have to, you must have a check constraint like this:
check (c1 >= 0 and c1 <= 9)
Here is a summary of the Pros and Cons of using the INT data type:
Pros
use less space which reduces the chances of causing LWLocks
is more performant in an arithmetic operation and a join operation.
Cons
can not restrict the number of digits and if you use a check constraint it degrades performance.
it
may complicate the management of the domain dictionary since you have
to add the INT data type and you are required to have more thought on
allocating data types.
Conclusion
When you convert Oracle to PostgreSQL make use of the approprate data type.
Addendum 1
Disk space is cheap but not free. Database disk space used has some serious multipliers:
Backups
Replicas
Development System / Staging System / Production System
WAL size
Laziness costs money! On top of that consuming more shared buffer cache is detrimental to performance.
Addendum 2
Here is a little test script to demonstrate that arithmetic on numeric values is slow compared to the integer types (especially for large tables).
drop table t1;
create table t1 (c1 numeric(32), c2 int);
insert into t1
select round(random()*100), round(random()*100)
from generate_series(1,10000000);
set max_parallel_workers_per_gather to 0;
select pg_size_pretty(pg_relation_size('t1'));
pg_size_pretty
--------------
422 MB
analdb#\timing
select avg(c1)
from t1;
723 ms
select avg(c2)
from t1;
515ms
Be aware that an arithmetical operation on integer values is 30 percent faster.
Addendum 3
I am frequently confronted with a project where they use the varchar type as an artificial identifier. I highly encourage you to make use of the bigint data type if you are sure the column only contains integer values.
Let's say that you have to insert a number '123456789012345' in the
column marked as a primary key.
analdb=# SELECT pg_column_size('123456789012345') as "VARCHAR"
, pg_column_size('123456789012345'::bigint) as "BIGINT"
, pg_column_size('123456789012345'::numeric) as "NUMERIC"
| VARCHAR | BIGINT | NUMERIC |
-----------------------------------------
| 16 | 8 | 14 |
To
store the same exact number here, the bigint data type only takes 8
bytes and the numeric data type takes 14 bytes. If you have a large
table you can not ignore the space difference. Storing your primary key in NUMERIC in Postgres is like storing your email addresses in CLOB in Oracle. It is complete overkill. Never use NUMERIC for a primary key.
Use a BIGINT data type. You are going to see 20 percent or something
performance differences betwen using NUMERIC and BIGINT over your
application. The first thing you do when you start a project is get the data types right.