02 Feb 1996 updated
The word "key" is much used and abused in the context of relational database design. In pre-relational databases (hierarchtical, networked) and file systems (ISAM, VSAM, et al) "key" often referred to the specific structure and components of a linked list, chain of pointers, or other physical locator outside of the data. It is thus natural, but unfortunate, that today people often associate "key" with a RDBMS "index". We will explain what a key is and how it differs from an index.
According to Codd, Date, and all other experts, a key has only one meaning in relational theory: it is a set of one or more columns whose combined values are unique among all occurrences in a given table. A key is the relational means of specifying uniqueness.
There are only three types of relational keys (foreign keys are another issue and discussed separately):
There can be any number of candidate keys in a table (as demonstrated elsewhere). Relational pundits are not in agreement whether zero candidate keys is acceptable, since that would contradict the (debatable) requirement that there must be a primary key.
As we will see below, there is no property of a primary key which is not shared by all other candidate keys in of the table except this arbitrary designation. Unfortunately E-R methodologies and RDBMS products have come to rely on the primary key almost to the exclusion of the concept of candidate keys, which are rarely supported by software.
Therefore it is completely incorrect and misleading that several data modeling products (e.g., ERwin, Silverrun, PowerDesigner) provide "alternate keys" which are selected arbitrarily by the designer, without qualifying the alternates as unused candidate keys, in fact without any mechanism to even designate candidate keys.
There is an insidious trap in the fallacy of not supporting candidate keys and providing arbitrary primary and alternate keys instead. Designers become accustomed to the fact the a primary key is unique and often assign a primary key just to obtain uniqueness. If this requirement of uniqueness per se is not recorded, then there is a danger that at some point someone else may eventually change the primary key (for example, to a system assigned number) and lose the original enforcement of uniqueness on what had been the primary key.
Let's look a this simple association or join table below which holds student class enrollment:
This is
the default form, and often the only form taught or recommended for a join
table. The primary key of each contribution table is inherited so that
this table has a compound primary key. Since the DDL will create a unique
index on the two columns, the designer knows that each student-class pair
will be unique; i.e., each student may enroll in each class only once.
Several
years later a new DBA decides that it is inefficient to use two columns
for the primary key where one would do. She adds a "row id" column
and makes it the primary key by loading it with a system counter. This
is fine as far as an identity for each row. But now nothing prevents a
student from enrolling in the same class multiple times!
This happened
because the data model did not retain a candidate key property on the two
original columns when the primary key was changed. Therefore the new DBA
had no direct way of knowing (other than text notes somewhere) that these
two columns must still remain unique, even though they are no longer part
of the primary key.
Notice here
how this could have been handled automatically by the model, if it had
captured candidate keys in the first place and then generated alternate
keys as a function of those candidates not in the primary key. The original
two columns remain unique even after they are no longer primary
So the lesson is: do not confuse a candidate key specification of uniqueness with the arbitrary selection of primary key. As always, {Date95} sums it up nicely, on page 128: "Candidate keys are important because they provide the tuple-level addressing mechanism in the relational model."
We have used the word index only in the context of implementing uniqueness. A key is about uniqueness, not access, while an index is about access, and not necessarily uniqueness. In concept, an index is merely any set of one or more columns from a given table, sorted an any arbitrary sequence. Thus an index is a sorted sub-set of the data in the table to which it refers, with pointers (yes, pointers) to the actual data row. The columns included in an index are not "keys". They are just index columns.