2018-12-14-information-system-db-design.md 9.7 KB


title: Information System DB Design description: > Things to keep in mind when designing a database for an information system categories: posts

tags: [database]

Table of Contents
  • TOC {:toc}

Requirements analysis

Although the focus of this article is DB design here are some things to consider from the requirements:

  • Purpose of the information system.
  • Features based on needs, never on assumptions.
    • "must be able to..."
  • Go from open to close questions.
  • Describe user roles, and privileges.

Data Modeling

Identify

  • required data
    • draft/review mandatory input forms, and reports
  • problem domain concepts

Model

  • organize concepts as sets of attributes
  • an instance of a concept is
    • a tuple of attribute values
    • visualized as a row.
  • fields collected according to a tuple of attributes
    • make a relation
    • are generally visualized as tables
    • are shown as columns, for single attributes

Table design

A table may stand for

  • a concept with definite attributes. eg. name -> 1st, middle, last
  • an attribute with indefinite descriptors. eg. phone -> work, mobile, etc

Also, consider

  • extracting redundant attributes into a new table
  • using associations for composite concepts

Test designs

From the first schema implementation we should test our table, and associations designs using sample/fake data.

  • check all operations work as expected
  • define associations
  • improve ease of operation

We should iterate over table, and association designs to avoid modifying their attributes once in production.

Relevant Sequel pages:

Also, since test data hardly ever produces the best optimizations for real data we should avoid creating indexes, and the like, during testing.

Associations

Like DB tables, associations also stand for concepts in the problem domain. They are used to simplify the code that deals with related tuples in separate DB tables.

We can only associate DB tables which share key attributes.

Keys

We enforce uniqueness by assigning an attribute as the key. Whenever we need to use more than one attribute to differentiate each tuple we call it a composite key.

Primary key

A table's unique identifier is called primary key. To assign a primary key to a table we need to identify (in order)

  • super keys. all combination of attributes that uniquely id a record.
  • candidate keys. super keys which are
    • irreducible
    • always unique
    • never NULL.
    • immutable.
  • primary key. The candidate key with least attributes.
    • should not expose sensitive data.
    • prefer single attribute keys over composite ones.
    • artificial/surrogate key. unique auto-generated random id.
    • natural/business key. From among existing attributes.
Foreign key

On every association, one of the tables has to hold the primary key of the other. Although is meant to connect tuples in different tables it's called foreign key.

To avoid orphaned tuples across tables, whenever we delete a tuple from any of the tables in an association we must either:

  • Disallow changes.
  • Update the tables in the relationship.
  • Hide, and flag connected tuples as deleted
  • Delete connected tuples

Many-to-One

Two tables are in a many-to-one (*-1) association when zero, one, or more tuples in one table are associated to a single tuple in a different table.

| albums     |               | artists |
|------------|               |---------|
| :id        |       /---1-  | :id     |
| :artist_id | -*---/        | :name   |
| :name      |

Using Ruby's Sequel toolkit

class Album < Sequel::Model
  many_to_one :artist
end

One-to-Many

Two tables are in a one-to-many (1-*) association when every tuple in one table is associated with zero, one, or more tuples in another table.

| artists |              | albums     |
|---------|              |------------|
| :id     | -1---\       | :id        |
| :name   |       \--*-  | :artist_id |
                         | :name      |

This association is complimentary to many-to-one. Using Sequel,

class Artist < Sequel::Model
  one_to_many :albums
end
One-to-One

A one-to-one (1—1) association may be necessary when we have to split a table into two or more tables for security or performance reasons.

|  users  |              |  phrases |
|---------|              |----------|
| :id     | -1---\       | :phrase  |
| :email  |       \--1-  | :user_id |

This is considered a subset of the one-to-many association.

class Artist < Sequel::Model
  one_to_one :album
end

Many to Many

A many-to-many association (*-*) allows each tuple in a table to be associated to many tuples in an associated table, and vice versa.

| albums     |               | artists |
|------------|               |---------|
| :id        |       /---*-  | :id     |
| :artist_id | -*---/        | :name   |
| :name      |

To satisfy both conditions, this association is broken into two one-to-many associations using a joined table to associate them.

| albums |          | albums_artists |          | artists |
|--------|          |----------------|          |---------|
| :id    | -1-\     | :album_id      |     /-1- | :id     |
| :name  |     \-*- | :artist_id     | -*-/     | :name   |

Using Sequel, and assuming the albums_artist table exists

class Artist < Sequel::Model
  many_to_many :albums
end
One-through-one

The one-through-one association sets up a one-to-one association through a joined table.

The difference between the one-through-one, and many-to-many association is that in the former there can only be 0 or 1 records in the associated table.

This association is useful for

  • enforcing unique references, in the join table, to one of the tables.
  • ordering the association, and only return the first record.

Using Sequel

class Artist < Sequel::Model
  one_through_one :album
end

Normalization forms

Normalization is a way to organize data. There are various normal forms, each dependent on the previous one.

1st Form

  • every attribute field can only hold a single a value
  • eliminate multiple values per field
  • assign a primary key
    • combination of fields/attributes that uniquely id a tuple/record
  • extract the attributes whose field values cause inner tables

It's usually easier to apply the 1st form in this order

  1. eliminate multi-values
  2. eliminate repeating groups
  3. assign primary key

2nd normal form

All attributes must be related to the primary key. Extract any attribute which doesn't refer to the primary key, into it's own table.

3rd normal form

Remove transitive dependencies:

  • Ensure none of the non-PK attributes depends on a non-PK attribute.
  • Remove attribute values based on other attributes. Instead, generate them dynamically.

More normalization

There are other four normal forms. Beware, the more we normalize, the more we decompose our data into smaller tables. Which could make it difficult to write queries due to multiple Join.

  • Boyce-Codd
    • extract all non-key attributes that don't depend on each candidate key
  • Symmetric Constraint (4NF)
    • ensure all attributes of equal importance are part of the CK
  • Projection-Join (5NF)
    • Break the CK into associations
  • Irreducible Relations (6NF)
    • Break relations into the smallest set of attributes
    • Mostly used for interval attributes

Although we make reference to a composite key in forms 4 and 5, at that point those should be the only attributes in the relation.

Data Integrity

Some of the techniques we can use for data integrity are:

  • Define each attribute's set of permitted values
  • Constraint attributes types
  • Trigger a chain of mandatory procedures in response to an event.

Sequel relevant documentation:

Transactions

A transaction is the execution of several operations considered a unit of work. A transaction is consider reliable when it has these properties

  • Atomicity. All operations must succeed or nothing changes.
  • Consistency. Data moves from one valid state to another.
  • Isolation. DB state after executing a transaction concurrently or sequentially is the same.
  • Durability. Changes should be persistent.

Sequel relevant links:

Advanced features

Once the system has been in production for a while we should analyse DB statistics to improve the DB performance. Consider using features such as

  • index
  • functions
  • views
  • prepared statements
  • windows

Keep in mind that some DBMS, such as postgres, provide details of their implementations so we can use the best one according to our needs.

Sequel relevant links