blog-post-sql-keys-in-depth

https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html

What we simply called keys in the previous section are traditionally called candidate keys. "Candidate" is terminology which implies that the keys are all competing for the esteemed position of "primary key," with the remainder relegated to "alternate keys."
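A minimal sketch of the primary/alternate distinction, using Python's built-in `sqlite3` module rather than any particular production database. The `country` table and its columns are hypothetical. Both columns are candidate keys; one is declared primary and the other remains an alternate key that the database still enforces.

```python
import sqlite3

# Hypothetical table: `iso_code` and `name` are both candidate keys.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE country (
        iso_code TEXT NOT NULL PRIMARY KEY,  -- candidate key chosen as primary
        name     TEXT NOT NULL UNIQUE        -- alternate key, still enforced
    )
""")
conn.execute("INSERT INTO country VALUES ('FR', 'France')")


def try_insert(iso_code, name):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO country VALUES (?, ?)", (iso_code, name))
        return True
    except sqlite3.IntegrityError:
        return False


dup_primary = try_insert("FR", "Frankreich")  # repeats the primary key
dup_alternate = try_insert("FX", "France")    # repeats the alternate key
fresh = try_insert("DE", "Germany")           # repeats neither
```

Choosing which candidate becomes the primary key does not weaken the others: each one still rejects duplicates.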

The naturalness or artificiality of unique properties in a database is relative to the outside world. A key which was artificial at birth in some standards body or government agency becomes natural to us because it's generally agreed upon in the world at large, and/or imprinted on objects.

Given that a key is a column with unique values in each row, one way to create one is to cheat and throw made-up unique values into each row. Artificial keys are just that: an invented code used for referring to facts or objects.

Artificial keys are useful because they make it easy for people or other systems to refer to a row, and they also improve lookup and join speed by avoiding string (or multi-column) key comparisons.

An artificial key succinctly identifies a fact or object.

As mentioned above, an important kind of artificial key is called a surrogate key. It's not meant to be succinct or shareable like other artificial keys; it's meant as an internal placeholder that identifies a row forevermore. It's used in SQL and joins but not explicitly referenced by an application.

Don't "naturalize" surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow users to work with the value (perhaps via search), you have effectively given the key business meaning. The exposed key in our database could then be considered a natural key in someone else's.

Some people think UUIDs are strings because of the traditional dashed hexadecimal representation: 5bd68e64-ff52-4f54-ace4-3cd9161c8b7f. It's true that some databases don't have a compact (128-bit) uuid type, but PostgreSQL does. It's the size of two bigints, and that's not an appreciable overhead when compared with the bulk of other information in the database.
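To make the size point concrete, here is a small sketch using Python's standard `uuid` module (not PostgreSQL itself). It parses the UUID quoted above and compares the dashed text form with the underlying binary value.

```python
import uuid

u = uuid.UUID("5bd68e64-ff52-4f54-ace4-3cd9161c8b7f")

text_form = str(u)     # the dashed hexadecimal representation
binary_form = u.bytes  # the underlying 128-bit value

text_len = len(text_form)      # 36 characters
binary_len = len(binary_form)  # 16 bytes, the size of two 8-byte bigints

# The binary form loses nothing: it converts back to the same UUID.
round_trip = uuid.UUID(bytes=binary_form) == u
```

Stored as text, the same value takes more than twice the space of the binary form, which is the cost a native 128-bit type avoids.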

The real problem with UUIDs is that highly randomized values cause write amplification due to full-page writes in the write-ahead log (WAL). This means worse performance when inserting rows.

Indexing highly random values like UUIDs tends to touch a whole lot of different disk pages, which means writing the full page size (usually 4k or 8k) to the WAL for each insertion. That's called a full-page write (FPW).
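The effect can be illustrated with a toy model. This is not PostgreSQL's B-tree: it treats the index as one sorted list cut into fixed-size pages, and every constant below is an assumption chosen for illustration. It counts how many distinct pages a batch of inserts lands on, since each distinct page is a candidate for a full-page write the first time it is modified after a checkpoint.

```python
import bisect
import random

KEYS_PER_PAGE = 100      # assumed page capacity, illustration only
EXISTING_KEYS = 100_000  # keys already in the index
NEW_KEYS = 1_000         # keys inserted between two checkpoints


def pages_touched(existing, new_keys):
    """Insert new_keys into a copy of the sorted list; count distinct pages hit."""
    index = list(existing)
    touched = set()
    for key in new_keys:
        position = bisect.bisect_left(index, key)
        touched.add(position // KEYS_PER_PAGE)  # page the key lands on
        index.insert(position, key)
    return len(touched)


rng = random.Random(42)  # fixed seed so runs are repeatable
key_space = 2**64
existing = sorted(rng.randrange(key_space) for _ in range(EXISTING_KEYS))

# Random keys scatter across the whole index.
random_keys = [rng.randrange(key_space) for _ in range(NEW_KEYS)]
# Sequential keys are all larger than anything present, so they append.
sequential_keys = [existing[-1] + i for i in range(1, NEW_KEYS + 1)]

random_pages = pages_touched(existing, random_keys)
sequential_pages = pages_touched(existing, sequential_keys)
```

In this model the sequential batch only fills pages at the right-hand edge of the index, while the random batch spreads over a large share of all pages, so far more pages are dirtied for the same number of rows.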

This approach provides internal key stability while acknowledging and protecting natural keys.
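The excerpt does not spell out what "this approach" is. One plausible reading, assumed here, is a surrogate primary key for internal references combined with a unique constraint on the natural key. The sketch below uses Python's `sqlite3` module, and the `product` and `order_line` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,   -- surrogate key, internal only
        sku        TEXT NOT NULL UNIQUE   -- natural key, still protected
    );
    CREATE TABLE order_line (
        order_line_id INTEGER PRIMARY KEY,
        product_id    INTEGER NOT NULL REFERENCES product (product_id),
        quantity      INTEGER NOT NULL
    );
    INSERT INTO product (sku) VALUES ('WIDGET-1');
    INSERT INTO order_line (product_id, quantity)
        SELECT product_id, 3 FROM product WHERE sku = 'WIDGET-1';
""")

# The natural key changes; rows referencing the surrogate key are untouched.
conn.execute("UPDATE product SET sku = 'WIDGET-001' WHERE sku = 'WIDGET-1'")

joined = conn.execute("""
    SELECT p.sku, o.quantity
    FROM order_line AS o JOIN product AS p USING (product_id)
""").fetchall()

# The natural key is still guarded against duplicates.
try:
    conn.execute("INSERT INTO product (sku) VALUES ('WIDGET-001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Renaming the SKU requires no cascading update because nothing else stores it, yet the unique constraint keeps it working as a real key.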
