Within one Web application I am working on, all database operations are abstracted through generic repositories defined over the Entity Framework ORM.

However, in order to keep the design of the generic repositories simple, all involved tables must define a unique integer column (Int32 in C#, int in SQL). Until now, this has always been both the PK of the table and the IDENTITY.

Foreign keys are heavily used and they reference these integer columns. They are required both for consistency and for generating navigation properties in the ORM.
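For illustration, here is a minimal sketch of the convention (table and column names are made up):

    CREATE TABLE dbo.Category
    (
        Id   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Category PRIMARY KEY,
        Name NVARCHAR(100) NOT NULL
    );

    CREATE TABLE dbo.Product
    (
        Id         INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Product PRIMARY KEY,
        -- every foreign key references the int IDENTITY PK of the parent table
        CategoryId INT NOT NULL CONSTRAINT FK_Product_Category REFERENCES dbo.Category (Id),
        Name       NVARCHAR(200) NOT NULL
    );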

The application layer typically does the following operations:

  • Initial data load from table (*) - SELECT * FROM table
  • Update - UPDATE table SET Col1 = Val1 WHERE Id = IdVal
  • Delete - DELETE FROM table WHERE Id = IdVal
  • Insert - INSERT INTO table (cols) VALUES (...)

Less frequent operations:

  • Bulk insert - BULK INSERT ... into table, followed by a full data load (*) to retrieve the generated identifiers (sketched below)
  • Bulk delete - this is a normal delete operation, but "bulky" from ORM's perspective: DELETE FROM table where OtherThanIdCol = SomeValue
  • Bulk update - this is a normal update operation, but "bulky" from ORM's perspective: UPDATE table SET SomeCol = SomeVal WHERE OtherThanIdCol = OtherValue

* All small tables are cached at the application level, so almost all SELECTs never reach the database. A typical pattern is an initial load followed by lots of INSERTs, UPDATEs and DELETEs.
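For illustration, the bulk insert pattern (*) looks roughly like this (file path and table names are made up):

    -- bulk load; by default the server ignores any identity values in the file
    -- and generates new ones
    BULK INSERT dbo.Product
    FROM 'C:\load\products.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

    -- full reload so the application cache picks up the generated identifiers
    SELECT * FROM dbo.Product;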

Based on current application usage, there is a very small chance of ever reaching 100M records in any of the tables - comfortably within the int limit of 2^31 - 1 (about 2.1 billion).

Question: From a DBA's perspective, are there significant problems I can run into by having this table design limitation?

This link will help you. – Kin 18 hours ago

@Kin - thanks for the interesting article. However, shifting to Guids is not an option now (it would involve a complete refactoring of the tables). Also, dealing with integers is much easier (human-readable) than dealing with Guids. – Alexei 17 hours ago
You might find these posts of relevance, as some logical, physical and practical aspects are discussed with regards to the use of columns with system-generated surrogate values. – MDCCL 17 hours ago

Other than additional disk space (and in turn memory usage and I/O), there's not really any harm in adding an IDENTITY column even to tables that don't need one (an example of a table that doesn't need an IDENTITY column is a simple junction table, like mapping a user to his/her permissions).

I rail against blindly adding them to every single table in a blog post from 2010.

But surrogate keys do have valid use cases - just be careful not to assume that they guarantee uniqueness (which is sometimes why they get added - they should not be the only way to uniquely identify a row). If you need to use an ORM framework, and your ORM framework requires single-column integer keys even in cases when your real key is either not an integer, or not a single column, or neither, make sure that you define unique constraints/indexes for your real keys, too.
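A minimal sketch of that arrangement (names are hypothetical): the IDENTITY column keeps the ORM happy, while a unique constraint still protects the real key:

    CREATE TABLE dbo.Currency
    (
        Id           INT IDENTITY(1,1) NOT NULL
                     CONSTRAINT PK_Currency PRIMARY KEY,   -- surrogate for the ORM
        CurrencyCode CHAR(3) NOT NULL
                     CONSTRAINT UQ_Currency_Code UNIQUE    -- the real key stays unique
    );

Without the unique constraint, nothing stops two rows from carrying the same "real" identifier.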

Thanks for the quick reply. Yes, the application uses an ORM (EF). It does not require single integer column keys, but I have introduced this restriction to make some generic operations much easier (design-wise). Also, all application caches store everything in maps (dictionaries) for fast retrieval by key, and the key must be unique. Since I have chosen ints over guids, I am forced to use IDENTITY for any table that I insert into. For fixed-values tables, IDENTITY is not required. – Alexei 18 hours ago

In my experience with various databases, an integer primary key is always better than an application that has no keys defined at all, or that has keys joining half a dozen varchar columns in awkward ways that aren't logical... (sigh)

I've seen applications that switched from integer PKs to GUIDs because they needed to merge data from multiple source databases in certain cases. The developers switched all of the keys to GUIDs so that the merges could happen without fear of collisions, even on tables that weren't part of the merge (just in case those tables ever became part of a future merge).

I'd say an integer PK is not going to bite you unless you plan to merge data from separate sources, or your data might grow beyond your integer size limits - it is all fun and games until you run out of space for inserts.

I will say, though, that it can make sense to set your clustered index on a column other than your PK, if the table will be queried more frequently that way. But that's an outlier case, especially if the bulk of updates and selects are based on the PK values.
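A rough sketch of that setup (hypothetical names) - the PK stays on the surrogate, but the clustered index goes on the column most queries range over:

    CREATE TABLE dbo.OrderLine
    (
        Id        INT IDENTITY(1,1) NOT NULL
                  CONSTRAINT PK_OrderLine PRIMARY KEY NONCLUSTERED,
        OrderDate DATETIME2 NOT NULL,
        Amount    DECIMAL(18,2) NOT NULL
    );

    -- cluster on the column the heavy queries actually filter/range on
    CREATE CLUSTERED INDEX CX_OrderLine_OrderDate ON dbo.OrderLine (OrderDate);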


Putting aside:

  • The religious wars (google surrogate vs natural key)
  • The separate issue of what clustered indexes to define on your tables
  • The viability of caching all your data

Provided you're using bulk delete/update where appropriate, and have indexes to support such operations, I don't think you'll run into trouble due to the PK standard you use.
It's possible that, if you later have EF generate queries with joins etc., they won't be as efficient as they would be with a natural-key-based repository, but I don't know enough about that area to say for sure either way.
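As for the supporting indexes, a minimal sketch (reusing the hypothetical column name from the question):

    -- lets DELETE/UPDATE ... WHERE OtherThanIdCol = SomeValue locate rows
    -- without scanning the whole table
    CREATE NONCLUSTERED INDEX IX_MyTable_OtherThanIdCol
        ON dbo.MyTable (OtherThanIdCol);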

I can't think of a single case where a join on a natural key would be more efficient than a join on an integer - not many natural keys can be smaller than 4 bytes, and if they are, there can't be enough unique rows to make the difference material. – Aaron Bertrand 17 hours ago

For competent, optimisable SQL I agree, but I was referring to possible limitations of SQL generators. My only experience in this area is being asked to create extensive views with which EF could be spoon-fed, though it's possible the .net devs didn't know enough about EF, or that there were other reasons. – T.H. 17 hours ago

@AaronBertrand I would say that the only way in which they might be more efficient is if a join was not needed at all. The only places I consider the use of natural keys is with standard code lists such as ISO 4217 currency codes (which are human-recognisable), and I might use GBP, EUR, etc. as the foreign key to a primary or alternate key on the currency code table. – David Aldridge 16 hours ago

@David Of course, I was talking about cases where joins are necessary. There are a lot of cases where I don't want the natural key proliferated throughout all the related tables, because natural keys can change, and that is a painful thing. – Aaron Bertrand 16 hours ago

Hmm, I see how my answer could be misunderstood to be promoting natural foreign keys over surrogate. To be clear, I actually only mentioned them because a) I read Alexei's question as "is it a problem that we don't use natural keys?", b) Alexei's wrap-up question started with "from a DBA's perspective" and I felt I should sort of acknowledge there's more than one perspective and c) because I would think the ORM features to be used largely dictate the choice (if it actually can make a difference). I'm firmly in the surrogate foreign key camp myself. – T.H. 14 hours ago

From my experience, the main and overwhelming reason to use a separate ID for every table is the following:

In almost every case where my customer swore a blood oath in the conception phase that some external, "natural" field XYZBLARGH_ID would stay unique forever, never change for a given entity, and never be re-used, cases eventually appeared where the primary key properties were broken. It just does not work out that way.

Then, from a DBA point of view, the things that make a DB slow or bloated are certainly not 4 bytes (or whatever) per row, but things like wrong or missing indexes, forgotten table/index reorganizations, wrong RAM/tablespace tuning parameters, neglecting to use bind variables, and so on. Those can slow down the DB by factors of 10, 100, 10000... not an additional ID column.

So, even if there were a technical, measurable downside to having an additional 32 bits per row, the question is not whether you can optimize the ID away, but whether the ID will be essential at some point - and more likely than not, it will be. And I'm not even counting the "soft" benefits from a software development stance (like your ORM example, or the fact that it makes life easier for software developers when all IDs by design have the same datatype, and so on).

N.B.: you do not need a separate ID for n:m association tables, because for such tables the IDs of the associated entities should form the primary key. A counterexample would be a weird n:m association which allows multiple associations between the same two entities for whatever bizarre reason - those would then need their own ID column to create a PK. There are ORM libraries which cannot handle multi-column PKs, though, so that would be a reason to be lenient with the developers if they have to work with such a library.
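A sketch of such an association table (hypothetical names, assuming dbo.Users and dbo.Permission both have an int Id PK), where the two foreign keys together form the PK:

    CREATE TABLE dbo.UserPermission
    (
        UserId       INT NOT NULL
                     CONSTRAINT FK_UserPermission_User REFERENCES dbo.Users (Id),
        PermissionId INT NOT NULL
                     CONSTRAINT FK_UserPermission_Permission REFERENCES dbo.Permission (Id),
        -- the pair of foreign keys is the natural primary key; no IDENTITY needed
        CONSTRAINT PK_UserPermission PRIMARY KEY (UserId, PermissionId)
    );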

