Data Modeling vs. Database Design

01 Dec 96 updated


(Unfinished raw material)

Null / Not Null

Consider the slight difference between these two statements:

create table PERSON ( NAME char(30) )
create table PERSON ( NAME char(30) not null)

All we have added is "not null". Yet this is the next giant step (after structure of the tables themselves) toward embedding business rules in data structure. (Probably C. J. Date would argue that the choice between "not null" and "null" is immaterial since, Date would say, a sound design avoids all nulls and so every column could be treated as "not null".)

What we have done by adding "not null" is to insist that the column NAME must always have a value. I know, that's obvious. But what is less obvious is that we often overlook some point in the lifecycle of PERSON when she doesn't have a name - perhaps at birth. So insisting that NAME is "not null" is substantially more constraining than merely saying that a PERSON may have a NAME - i.e., that "null" is allowed.

That's obvious too - until we've coded "not null" and the rule changes; or more properly, we discover the correct rules, evidenced by a PERSON without a NAME . How many places must we look to make that change of business rule (or repair of the system to reflect the true rule)?

To be fair, a good data modeling technique will place that constraint in only one location (say the PERSON table). But don't forget that the person repairing the system may have dozens-hundreds-thousands of details to track on this one repair. And he may not have knowledge of that elegant data model where NAME is defined as "null" in only one place.

Foreign Keys and Referential Integrity

As long as we are designing relational databases (and I think that will be well into the 21st century), the only means available to us to navigate amongst tables of data is by the reference of a foreign key to some key of another table. (As we all know, a foreign key is a set of one or more columns whose structure and values match a set of key columns in another table. Not all of us are quite as clear that the referenced set of columns need only be some key, not the "primary key". But that is a different subject.)

Since a foreign key consists of one set of columns, it follows that any row with a foreign key can reference only one row in the other, "foreign", table. Thus the maximum cardinality in the parent table is exactly one row referenced. And the minimum is exactly zero, unless someone or something tags "not null" on the column(s) of the foreign key.

On the other hand, nothing about the existence of a foreign key column set can enforce that there must be some row holding a particular foreign key value; that is, the child cannot do the parent's work if the parent requires some minimum number of children. And nothing about the existence of a foreign key value in one row prevents the same foreign key value(s) from occurring in any number of other rows. Thus it is stunningly simple that every foreign key reference in a relational database is (zero-or-)one-to-zero-or-many. A foreign key reference can be nothing else without some procedural code to embellish the constraint.

We often discuss referential integrity as if it is inherent in relational database implementation. But of course it is not. Just because we have provided foreign key column(s), even if we have marked the column(s) "not null", does not mean that the column(s) will hold valid values. In this case, "valid" means values which exist in the appropriate key of the referenced table.

To maintain referential integrity, any insert or update on the child table holding the foreign key must be responsible for providing a valid foreign key value set - i.e., one which currently exists in the referenced table. Any update to or deletion of that referenced row must be responsible for dealing with potentially orphaned child rows carrying the foreign key. We're not concerned here with the various options such as restrict, cascade, set null, or set default. What we care about is how any such options are implemented.

Most modern RDBMS products offer us at least two ways to enforce referential integrity: