Steganographic use of Arithmetic Coding with Stochastic Ordered CFG. Inspired by http://www.nic.funet.fi/pub/crypt/old/mimic/mimic.text ⚠️ experimental ⚠️.

fnordomat c497923054 update deps 2 years ago
benches f05e5f37c3 clippy 2 years ago
devtools 7db36c0ac3 apply power method to $A^p$ not $A$, where $p$ is the lcm of all possible periods of the transition matrix, to prevent (optional, features = "linalg") static runaway detection from being foxed by periodic, non-mixing behaviour 4 years ago
examples fe21021583 example grammar: variations of a scary bitcoin spam/scam mail. 4 years ago
src c497923054 update deps 2 years ago
.rgignore c8e5be87ca cleanup; some unit tests for the grammar file parser; introducing grammar file format v0.0.1 4 years ago
Cargo.lock c497923054 update deps 2 years ago
Cargo.toml c497923054 update deps 2 years ago
README.md 9f6bbb87d1 Updated dependencies 2 years ago

README.md

ChatBox

  • PROOF OF CONCEPT * PROOF OF CONCEPT * PROOF OF CONCEPT * PROOF OF CONCEPT *

Introduction

Steganographic use of Arithmetic Coding with Stochastic Ordered CFG.

Like this:

encode: 01101110100101 -> (()(())(())())

decode: (()(())(())()) -> 01101110100101

Similar to, and inspired by http://www.nic.funet.fi/pub/crypt/old/mimic/mimic.text

Beware: experimental. The crypto has been reviewed only once and should receive more attention. However, the methodology appears to be sound and the program should now be fast and robust enough for practical use.

Explainer

Choose an unambiguous context-free grammar. Add weights and order the productions for each nonterminal. For example, if S -> α | β for some α and β, we may choose S -> 1/4 α | 3/4 β.

Now to encode a binary string, think of it as the binary expansion r = 0,01110101... (say) of a number between 0 and 1.

Starting from the sentential form S (start symbol), go through the productions of the leftmost nonterminal (S), compare r to the cumulative weights of the rules. Because there are only two productions in the example, there would be only one comparison. Since r > 1/4, we proceed by replacing S with the string α and consider the first nonterminal of α and its weighted list of productions, scaling their cumulative weights to the range [1/4 .. 1].

This short explanation misses many important details, but that's the gist of it. We're working on a complete, mathematically rigorous description. It will appear in this repo at some point.

Features

Stochastic Ordered CFGs can be specified in a somewhat intuitive YAML format. See examples/ folder for some examples. The nonterminals are numbered, 0 is always the start symbol.

The ChatBox binary can validate grammars for well-formedness and several other important properties, but not for unambiguity. That's undecidable, unfortunately, and it's up to the user to verify.

The ChatBox binary can be given a grammar and a shared secret for cryptographic message encapsulation and run in --encode or --decode mode. One can then pipe messages into it and obtain sentential forms encoding the messages in --encode mode, or pipe sentential forms and obtain the original messages in --decode mode. If the grammar is prefixfree, you may use the --prefixfree option and just input the sentential forms (i.e. encoded messages that you receive from the encoder at the other end) one after the other, otherwise they must be externally delimited by always prefixing their lengths.

Build

install rust nightly (uses "experimental" feature: trait aliases)

cargo test --features linalg

try

cargo run -- --grammar examples/easy1.yaml --validate

cargo run -- --grammar examples/easy1.yaml --encode | cargo run -- --grammar examples/easy1.yaml --decode