cdb-djb.txt 1.5 KB

123456789101112131415161718192021222324252627282930313233343536373839
  1. A structure for constant databases
  2. 19960914
  3. Copyright 1996
  4. D. J. Bernstein, djb@pobox.com
  5. A cdb is an associative array: it maps strings (``keys'') to strings
  6. (``data'').
  7. A cdb contains 256 pointers to linearly probed open hash tables. The
  8. hash tables contain pointers to (key,data) pairs. A cdb is stored in
  9. a single file on disk:
  10. +----------------+---------+-------+-------+-----+---------+
  11. | p0 p1 ... p255 | records | hash0 | hash1 | ... | hash255 |
  12. +----------------+---------+-------+-------+-----+---------+
  13. Each of the 256 initial pointers states a position and a length. The
  14. position is the starting byte position of the hash table. The length
  15. is the number of slots in the hash table.
  16. Records are stored sequentially, without special alignment. A record
  17. states a key length, a data length, the key, and the data.
  18. Each hash table slot states a hash value and a byte position. If the
  19. byte position is 0, the slot is empty. Otherwise, the slot points to
  20. a record whose key has that hash value.
  21. Positions, lengths, and hash values are 32-bit quantities, stored in
  22. little-endian form in 4 bytes. Thus a cdb must fit into 4 gigabytes.
  23. A record is located as follows. Compute the hash value of the key in
  24. the record. The hash value modulo 256 is the number of a hash table.
  25. The hash value divided by 256, modulo the length of that table, is a
  26. slot number. Probe that slot, the next higher slot, and so on, until
  27. you find the record or run into an empty slot.
  28. The cdb hash function is ``h = ((h << 5) + h) ^ c'', with a starting
  29. hash of 5381.