Rock Carved Binary Data

WORK IN PROGRESS

version 0.1

by drummyfish, released under CC0 1.0, public domain

Rock Carved Binary Data (RCBD) is a writing format meant for simple recording of small amounts of binary data on physical media such as paper, wood, stone or plastic, possibly for communication or backups.

Why does this exist? Let's ask ourselves -- as a civilization based on information processing, what information will we leave behind? Compared to old civilizations, probably almost none. Data stored with today's overcomplicated computer technology will sooner or later disappear. Data stored on paper fares better, but commonly used paper probably won't last very long either, as everything is made as cheaply as possible with no regard for long term durability. While we sacrifice a lot of memory capacity by choosing to carve data in stone, very important information is often extremely small, such as the date of an event, a number of inhabitants, the name of a man, a mathematical equation, a short poem etc.

Details

The format aims especially for the following:

  • Simple manual carving, i.e. data characters have to be simple, consisting only of straight lines and as few of them as possible.
  • Being somewhat robust, i.e. characters shouldn't contain fine details that weathering could easily distort, along with the information they bear.
  • Being also robust for automatic OCR, i.e. characters shouldn't be too similar to each other, so that the data can easily be extracted automatically, e.g. from a photo of the carvings.
  • Being human readable, i.e. not requiring precise measuring instruments or computers.

Specification

The data is recorded on a data sheet as a sequence of characters, all of the same size, none of which is blank, which are read left to right and top to bottom.

The first character is the marker. It is different from the proper data and serves several purposes, mainly:

  • Hint that RCBD is used, so that whoever sees the data knows how to decode it. For this the character should be distinctive.
  • Show the data sheet orientation so that when the sheet is e.g. rotated or even mirrored, the original orientation can still be recovered. This is important for the correct order of data characters as well as their correct decoding. For this the character mustn't be symmetric in any common way.
  • Optionally hint at the specific encoding of the data, i.e. say what the recorded bits actually mean.
  • Possibly mark the data as one unit, a "file" separate from other possible ones.

The marker character in its base form looks like this:

 _____
 _____
|
|
|

In this form the encoded data that follows is taken to be an otherwise unspecified stream of bits. Adding one more line to the marker, as follows, signals that the first character of the encoded data is a magic number (specified below) giving further details about the data format.

 _____
 _____
|\_
|  \_
|    \

If this is the case, the first character (5 bits) of the encoded data further specifies the nature of the data that follows, like this:

  • If the highest (5th) bit is set, the format used is one of the special RCBD formats specified below (which one depends on the data type).
  • Lowest 2 bits specify the type of data like so:
    • 00: text
    • 01: image
    • 10: audio
    • 11: video
  • The remaining bits are reserved and should for now be set to 0.
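
The bit layout listed above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the names `DATA_TYPES`, `parse_header` and `make_header` are hypothetical, and the header is assumed to have already been read as an integer from 0 to 31.

```python
# Hypothetical helpers illustrating the 5-bit header character.
DATA_TYPES = {0b00: "text", 0b01: "image", 0b10: "audio", 0b11: "video"}

def parse_header(value):
    """Interpret the 5-bit character that follows the extended marker."""
    if not 0 <= value < 32:
        raise ValueError("header must be a 5-bit value")
    return {
        "special_format": bool(value & 0b10000),   # highest (5th) bit
        "data_type": DATA_TYPES[value & 0b00011],  # lowest 2 bits
        "reserved_ok": (value & 0b01100) == 0,     # reserved bits should be 0
    }

def make_header(data_type, special_format):
    """Build a header value with the reserved bits left at 0."""
    code = {name: bits for bits, name in DATA_TYPES.items()}[data_type]
    return (0b10000 if special_format else 0) | code
```

For example, `make_header("image", True)` gives `0b10001`, i.e. an image stored in the special RCBD image format.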

After this the data characters follow. Each data character encodes exactly 5 bits (so 8 characters encode exactly 5 bytes).
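
As a rough sketch of the arithmetic, the following Python splits bytes into 5-bit values and back. The text above doesn't fix the bit order or say how to handle data whose length isn't a multiple of 5 bytes, so two assumptions are made here: bits are taken most significant first, and the last character is padded with zero bits. The function names are hypothetical.

```python
def bytes_to_quintets(data):
    """Split bytes into 5-bit values, most significant bits first (assumed order)."""
    bits = "".join(f"{b:08b}" for b in data)
    # Assumption: pad the tail with zero bits up to a multiple of 5.
    bits += "0" * (-len(bits) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]

def quintets_to_bytes(values):
    """Inverse of bytes_to_quintets; leftover padding bits are dropped."""
    bits = "".join(f"{v:05b}" for v in values)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

With this, `bytes_to_quintets(b"hello")` yields exactly 8 values, matching the 8 characters per 5 bytes stated above.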

The base of the character is a diagonal line that's always present and expresses the least significant bit by its orientation like this:

\_              _/
  \_          _/
    \_      _/
    
  0            1

Further bits are added by horizontal and vertical lines, each one recording a bit by its presence (1) or absence (0). Here are bits A (second least significant), B, C and D:

      A
   _______
  |       |
B |       | C
  |       |
  |_______|

      D

So for example the following:

     _/|
   _/  |
 _/____|

records the value 11001: the diagonal gives 1, A and B are absent (0), C and D are present (1).
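
The mapping between strokes and bits can be sketched in Python as follows. This is one reading of the diagrams, not a reference implementation: it assumes the bits rise in significance in the order diagonal, A (top), B (left), C (right), D (bottom), and the function names are hypothetical.

```python
def strokes_to_value(diagonal, top=False, left=False, right=False, bottom=False):
    """diagonal is a backslash (bit 0) or a slash (bit 1); the rest are line flags."""
    if diagonal not in ("\\", "/"):
        raise ValueError("diagonal must be a backslash or a slash")
    value = 1 if diagonal == "/" else 0  # least significant bit
    value |= int(top) << 1               # A
    value |= int(left) << 2              # B
    value |= int(right) << 3             # C
    value |= int(bottom) << 4            # D
    return value

def value_to_strokes(value):
    """Return which strokes have to be carved for a 5-bit value."""
    if not 0 <= value < 32:
        raise ValueError("value must fit in 5 bits")
    return {
        "diagonal": "/" if value & 1 else "\\",
        "top": bool(value & 2),
        "left": bool(value & 4),
        "right": bool(value & 8),
        "bottom": bool(value & 16),
    }
```

The example character above corresponds to `strokes_to_value("/", right=True, bottom=True)`, which gives `0b11001`.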

TODO:

  • Software for working with RCBD (convert text to image etc.)
  • Simple representation of the above with ASCII characters, e.g. something like \|?
  • ...

RCBD Specific Formats

Text

TODO: 5 bits per character, just lowercase alphabet plus some special chars...

Image

TODO: will likely start with a 10 bit resolution, followed by a pixel format specifier (possibly e.g. 1 bit per pixel, 5 bit grayscale, RGB221, maybe also indexed?), then raw data

Audio

TODO

Video

TODO