설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 118건, 최근 0 건
   

Numeric vs. INT

글쓴이 : 모델광 날짜 : 2022-08-20 (토) 08:22 조회 : 91

When you work in a project as a data architect or a data modeler, the first thing you do is to create a data dictionary which includes a word dictionary, a term dictionary, a domain dictionary and a code dictionary. In theory you can produce a data dictionary without cosidering what DBMS you use. In practice you need to have thorough knowlege of the DBMS you use.

We have been Oracle people for a long time. We see the database through the lens of Oracle, which makes us ignore features available in PostgreSQL. We do not think of how Postgres does something. PostgreSQL is data type rich. If you know the technology you can gain performance benefits.

In this note I would like to discuss how we will define a number domain in a domain dictionary.


Here is an example of a typical domain dictionary.


 도메인대분류

 도메인명

인포타입

 Type

 Length

 Description

 금액

 가격

 가격_NU_18_3

 numeric

 18,3

 

 금액

 요금

 요금_NU_11_1

 numeric

 11,1

 

 식별자

 ID

 ID_VC2_20

 varchar2

 20

 

 식별자

 번호

 번호_NU_10

 numeric

 10

 

 식별자

 순번

 순번_NU_2

 numeric

 2



In Oracle the main datatype used to store floating numbers and integers is NUMBER. It is a variable-lenghth datatype. In Oracle INTEGER is depricated and a keyword retained for compatibility.


In PostgreSQL when you are sure that the domain only contains an integer, you'd better use smallint, integer, and bigint depending on the range of the numeric value. If you use the right data type, you will have much more optimal utilization of storage. If you manage to fit more rows in less space, you will retrieve more data on each disk access.


Here is a quick and dirty script to check how much space it takes.


select pg_column_size(9::numeric) as n, pg_column_size(9::smallint) as s
        , pg_column_size(9::int) as i, pg_column_size(9::bigint) as b;

+---+---+---+---+
| n | s | i | b |
+---+---+---+---+
| 8 | 2 | 4 | 8 |
+---+---+---+---+


Unlike Oracle in PostgreSQL the size of a single dight number is 8 bytes if the data type is numeric. If you want to reduce the size of a single digit number you have to create the column as a SMALLINT data type.


The drawback of the INT data type is that you cannot restrict the number of digits. If you have to, you must have a check constraint like this:


check (c1 >= 0 and c1 <= 9)


Here is a summary of the Pros and Cons of using the INT data type:


Pros

use less space which reduces the chances of causing LWLocks

is more performant in an arithmetic operation and a join operation.


Cons

can not restrict the number of digits and if you use a check constraint it degrades performance.

it may complicate the management of the domain dictionary since you have to add the INT data type and you are required to have more thought on allocating data types.


Conclusion

When you convert Oracle to PostgreSQL make use of the approprate data type.


Addendum 1

Disk space is cheap but not free. Database disk space used has some serious multipliers:


Backups

Replicas

Development System / Staging System / Production System

WAL size


Laziness costs money! On top of that consuming more shared buffer cache is detrimental to performance.


Addendum 2

Here is a little test script to demonstrate that arithmetic on numeric values is slow compared to the integer types (especially for large tables).


drop table t1;
create table t1 (c1 numeric(32), c2 int);
insert into t1
select round(random()*100), round(random()*100)
 from generate_series(1,10000000);
set max_parallel_workers_per_gather to 0;
select pg_size_pretty(pg_relation_size('t1'));

pg_size_pretty

--------------

422 MB

analdb#\timing


select avg(c1)
  from t1;

723 ms


select avg(c2)
  from t1;

515ms


Be aware that an arithmetical operation on integer values is 30 percent faster. 


Addendum 3

I am frequently confronted with a project where they use the varchar type as an artificial identifier. I highly encourage you to make use of the bigint data type if you are sure the column only contains integer values. Let's say that you have to insert a number '123456789012345' in the column marked as a primary key.


analdb=# SELECT pg_column_size('123456789012345') as "VARCHAR"
                           , pg_column_size('123456789012345'::bigint) as "BIGINT"
                           , pg_column_size('123456789012345'::numeric) as "NUMERIC"
| VARCHAR | BIGINT | NUMERIC |

-----------------------------------------

| 16            | 8          | 14      |


To store the same exact number here, the bigint data type only takes 8 bytes and the numeric data type takes 14 bytes. If you have a large table you can not ignore the space difference. Storing your primary key in NUMERIC in Postgres is like storing your email addresses in CLOB in Oracle. It is complete overkill. Never use NUMERIC  for a primary key. Use a BIGINT data type. You are going to see 20 percent or something performance differences betwen using NUMERIC and BIGINT over your application. The first thing you do when you start a project is get the data types right.



   

postgresdba.com