title: How to Design Data
description: >
  Notes on the How to Design Data recipe.
categories: posts
A data definition establishes the represent/interpret relationship between information and data. Identifying the inherent structure of the information is therefore essential.
The structure of the information determines the kind of data definition used. The data definition determines the structure of the templates and helps determine the function examples and tests, which in turn structure much of the final program design.
Atomic data represents information that cannot be meaningfully broken into smaller pieces. For instance, splitting the name of a city into individual characters yields no new information about the city.
;; Time is Natural
;; interp. number of clock ticks since start of game
(define START-TIME 0)
(define OLD-TIME 1000)
;;; TEMPLATE
Two tests should suffice for simple atomic data. Additional tests are required if there are multiple cases involved. If the function produces a Boolean, you should have at least one test for each result (true and false).
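As a rough sketch of how the atomic definition and the testing guidance fit together, the block below repeats the Time definition, adds a commented-out template, and tests two small functions. The names `fn-for-time`, `next-time` and `game-start?` are hypothetical and chosen only for illustration; the code assumes DrRacket's Beginning Student Language.

```racket
;; Time is Natural
;; interp. number of clock ticks since start of game
(define START-TIME 0)
(define OLD-TIME 1000)

;; Template, commented out with #; so it is never evaluated
#;
(define (fn-for-time t)
  (... t))

;; Time -> Time
;; produce the time one tick later (hypothetical example function)
(check-expect (next-time START-TIME) 1)
(check-expect (next-time OLD-TIME) 1001)

(define (next-time t)
  (add1 t))

;; Time -> Boolean
;; produce true if the game has just started (hypothetical example function)
(check-expect (game-start? START-TIME) true)  ; test for the true result
(check-expect (game-start? OLD-TIME) false)   ; test for the false result

(define (game-start? t)
  (zero? t))
```

Two tests cover `next-time` because there is only one case, while `game-start?` gets one test per Boolean result.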
Intervals are used to represent information within a certain range. They often appear in itemizations, but can also appear alone.
;; Countdown is Integer[0, 10]
;; interp. the number of seconds remaining to liftoff
(define C1 10) ; start
(define C2 5) ; middle
(define C3 0) ; end
;;; TEMPLATE
Provide sufficient examples to illustrate how the type represents information. When writing tests for functions operating on intervals, be sure to test the closed boundaries as well as midpoints. As always, include enough tests to check all other points of variance in behavior across the interval.
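The boundary and midpoint advice can be illustrated with a small function on Countdown. This is only a sketch: `fn-for-countdown` and `tick` are hypothetical names, and the choice to stay at 0 after liftoff is an assumption made so that the result remains inside the interval. It assumes DrRacket's Beginning Student Language.

```racket
;; Countdown is Integer[0, 10]
;; interp. the number of seconds remaining to liftoff
(define C1 10) ; start
(define C2 5)  ; middle
(define C3 0)  ; end

;; Template, commented out with #; so it is never evaluated
#;
(define (fn-for-countdown cd)
  (... cd))

;; Countdown -> Countdown
;; produce the countdown one second later; stays at 0 once it reaches 0
(check-expect (tick C1) 9)  ; upper closed boundary
(check-expect (tick C2) 4)  ; midpoint
(check-expect (tick 1) 0)   ; last value that is still decremented
(check-expect (tick C3) 0)  ; lower closed boundary, behavior changes here

(define (tick cd)
  (if (zero? cd)
      0
      (sub1 cd)))
```

The test at 1 is there because the behavior changes between 1 and 0, which is a point of variance inside the interval.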
Enumerations are useful when the information to be represented consists of a fixed number of distinct items, such as colors or letter grades. For enumerations it is sometimes redundant to provide an interpretation and nearly always redundant to provide examples. The example below includes the interpretation but not the examples.
;; LightState is one of:
;; - "red"
;; - "yellow"
;; - "green"
;; interp. the color of a traffic light
;; <examples are redundant for enumerations>
;;; TEMPLATE
Functions operating on enumerations should have at least as many tests as there are cases in the enumeration. For large enumerations, though, it is not necessary to write out every case in the data definition. Instead, write one or two cases, along with a comment saying what the others are and where they are defined.
Defer writing templates for such large enumerations until a template is needed for a specific function. At that point, include the specific cases that the function cares about, and be sure to include an else clause in the template to handle the other cases. The same is true of tests: every specially handled case must be tested, and one more test is required to check the else clause.
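To make both testing rules concrete, here is a sketch built on LightState. The first function has one `cond` clause and one test per case. The second borrows the large-enumeration style on this small type purely for illustration: it handles one case specially and lets `else` cover the rest. The names `fn-for-light-state`, `next-state` and `go?` are hypothetical, and the code assumes DrRacket's Beginning Student Language.

```racket
;; LightState is one of:
;;  - "red"
;;  - "yellow"
;;  - "green"
;; interp. the color of a traffic light

;; Template, commented out with #; so it is never evaluated
#;
(define (fn-for-light-state ls)
  (cond [(string=? "red" ls) (...)]
        [(string=? "yellow" ls) (...)]
        [(string=? "green" ls) (...)]))

;; LightState -> LightState
;; produce the state that follows ls (hypothetical example function)
(check-expect (next-state "red") "green")
(check-expect (next-state "yellow") "red")
(check-expect (next-state "green") "yellow")

(define (next-state ls)
  (cond [(string=? "red" ls) "green"]
        [(string=? "yellow" ls) "red"]
        [(string=? "green" ls) "yellow"]))

;; LightState -> Boolean
;; produce true if traffic may proceed (hypothetical example function)
;; Only the case this function cares about is listed; else covers the rest.
(check-expect (go? "green") true)  ; the specially handled case
(check-expect (go? "red") false)   ; one extra test for the else clause

(define (go? ls)
  (cond [(string=? "green" ls) true]
        [else false]))
```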
An itemization describes data composed of two or more subclasses, at least one of which is not a distinct item.
;; Bird is one of:
;; - false
;; - Number
;; interp. false means no bird, number is x position of bird
(define B1 false)
(define B2 3)
;;; TEMPLATE
Itemizations should have enough data examples to clearly illustrate how the type represents information. Functions operating on itemizations should have at least as many tests as there are cases in the itemization. If there are intervals in the itemization, then there should be tests at all points of variance in the interval. In the case of adjoining intervals, it is critical to test the boundaries.
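A minimal sketch of an itemization template and a function with one test per case, using the Bird definition. The names `fn-for-bird` and `advance-bird` are hypothetical, and moving the bird by one unit is an assumption made for the example. It assumes DrRacket's Beginning Student Language.

```racket
;; Bird is one of:
;;  - false
;;  - Number
;; interp. false means no bird, number is x position of bird
(define B1 false)
(define B2 3)

;; Template, commented out with #; so it is never evaluated.
;; One cond clause per subclass; only the Number case has data to use.
#;
(define (fn-for-bird b)
  (cond [(false? b) (...)]
        [(number? b) (... b)]))

;; Bird -> Bird
;; move the bird one unit to the right, if there is one (hypothetical)
(check-expect (advance-bird B1) false)  ; the false case
(check-expect (advance-bird B2) 4)      ; the Number case

(define (advance-bird b)
  (cond [(false? b) false]
        [(number? b) (add1 b)]))
```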
A common case is an itemization composed of two or more intervals. In this case, functions operating on the data definition will usually need to be tested at all the boundaries of closed intervals and at points between the boundaries.
;;; Reading is one of:
;; - Number[> 30]
;; - Number(5, 30]
;; - Number[0, 5]
;; interp. distance in centimeters from bumper to obstacle
;; Number[> 30] is considered "safe"
;; Number(5, 30] is considered "warning"
;; Number[0, 5] is considered "dangerous"
(define R1 40)
(define R2 .9)
;;; TEMPLATE
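One way the Reading definition could be completed and tested is sketched below. The template guards follow the interval bounds in the data definition, and the tests hit each closed boundary plus points between boundaries. The names `fn-for-reading` and `reading->status` are hypothetical, and returning the labels from the interpretation as strings is an assumption for this example. It assumes DrRacket's Beginning Student Language.

```racket
;; Reading is one of:
;;  - Number[> 30]
;;  - Number(5, 30]
;;  - Number[0, 5]
;; interp. distance in centimeters from bumper to obstacle

;; Template, commented out with #; so it is never evaluated
#;
(define (fn-for-reading r)
  (cond [(< 30 r) (... r)]
        [(and (< 5 r) (<= r 30)) (... r)]
        [(<= 0 r 5) (... r)]))

;; Reading -> String
;; produce "safe", "warning" or "dangerous" for a reading (hypothetical)
(check-expect (reading->status 40) "safe")        ; inside Number[> 30]
(check-expect (reading->status 30) "warning")     ; closed boundary of (5, 30]
(check-expect (reading->status 17) "warning")     ; between the boundaries
(check-expect (reading->status 5) "dangerous")    ; closed boundary of [0, 5]
(check-expect (reading->status 0.9) "dangerous")  ; between the boundaries
(check-expect (reading->status 0) "dangerous")    ; closed boundary of [0, 5]

(define (reading->status r)
  (cond [(< 30 r) "safe"]
        [(and (< 5 r) (<= r 30)) "warning"]
        [(<= 0 r 5) "dangerous"]))
```

The tests at 30 and 5 matter most: they sit where two adjoining intervals meet, so an off-by-one in a guard (for example `<` instead of `<=`) would show up there first.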