What to choose? Integer or GUID as the data type for your primary key?
Have you ever been thinking about what data type you should choose when making new entities for your database? 🤔 I have seen this question lots of times and I have had it multiple times myself when designing a new or existing database. The short answer?
Choosing INT or GUID as PK data type, all depends on what database engine you are using and the type of solution you are developing.
There are pros and cons for both
GUID, but it all comes down to your context. In this short post, I will discuss the pros and cons of using both data types when designing a database. 🗄️
Pros and Cons of INT as a data type
Let's begin with the
INT (Integer) data type. An integer is a data type that stores whole numbers in the range from -2.147.483.647 to 2.147.483.647 for 9 or 10 digits of precision.
The number maximum integer 2.147.483.648 is a reserved value and cannot be used. The
INT value is stored as a signed binary integer and is typically used to store counts, quantities, and so on in your database for your application. The values are often generated using stored procedures or SQL queries using Dapper or EF Core.
- If used with database engines like
PostgreSQL, etc... you can enable automatically
Identity Incrementwhen inserting new records in the database.
- An integer is small, it only takes up 4 bytes of data storage in the database per value/record.
- It is faster to join tables when using an integer.
- It is not easy to merge databases due to the fact that the
IDsare often already taken.
- It is a more tough task to distribute databases across multiple servers if you deploy your database services behind a load balancer.
- If auto
Identity Incrementis enabled you won't be able to generate
IDsin all places you probably need them.
- They are easy to overflow.
Pros and Cons of GUID as a data type
Finally, we got the
GUID (Global Unique Identification Number) data type. A GUID is larger than an integer and takes up 16 bytes, in other words, it is a 128-bit integer. A
GUID can be used across all computers and networks wherever we need a unique identifier.
GUID is an exceptional choice when you need a solution with a unique identifier that got a low probability of being repeated in the application. Just a quick notice,
GUID is later named
- They are unique and hard to duplicate.
- They do not overflow as an integer can do.
- It is easier to merge databases when records are from multiple databases as the chance that two records with the same ID are almost unlikely. (there will always be a small chance)
- You can distribute data across multiple databases across your network. Makes a perfect choice for microservices.
- You can generate the
IDof your record in the application and database because you don't need to know the latest existing
IDin the database.
- Great in combination with modern architecture and patterns like CQRS and microservices.
- It's a larger data type than an integer (4 bytes). 12 bytes per value in your database for each GUID stored.
- It can lead to performance issues because the store value is larger. However, I have never noticed this as a problem as most new hardware is insanely fast.
- They are not auto-generated as the integer is in the database.
What should you choose? INT or GUID? 🤔
Now you know some pros and cons of the two data types. You probably already have a few ideas in your head when reading at this point about what data type you would prefer in your application.
Below is a list of when best use cases for each data type.
When to use an INT (integer) as the primary key for your records?
The integer is an excellent choice if:
- You plan only a single database without synchronization of data from 3rd-party data providers.
- Your application is based on monolithic architecture.
- When you want the IDs of your records to be automatically generated in the database and not the application. (When used with EF Core, you will get the
IDin return when adding data to your database).
When to use a GUID as the primary key for your records?
The GUID makes a perfect fit for you if:
- You are importing and pushing data to and from other sources (3rd-party providers).
- When you develop distributed solutions.
- When you need the option for generating the
IDapart from the database. In the application or from the imported data.
- When you make solutions handling big data where it's not possible to use anything other than
When I am helping out clients with (often legacy 😅) solutions they are often created using integers as the data type. It was the choice back in the day for almost any application as they were and often are stand-alone (monolithic). I also use
INT as the data type when making many of my tutorials 💡 because they are easier for learners to remember 🧠 compared to a
With all the new database engines and the underlying hardware (cloud or on-prem), it is not really a problem to use
GUIDs as the data type. I actually prefer using
GUIDs in new applications due to the fact that I can generate them rapidly and offline + it makes it easier to parse data onto other systems and I don't need a central authority for generating them.
If you are working on an old solution it is often not possible to use
GUIDs because lots of tables have been joined together and it won't be possible to join those with new tables or the other way around.
I hope you learned a thing or two from this post. If you got any questions, please let me know in the comments below. Until next time - Happy coding! ✌️