Generates imaginary but plausible-sounding words

Luna McNulty d1cf772679 Fix typos in help 2 weeks ago

markov-words.py

Generates imaginary but plausible-sounding words:

$ markov-words.py --count 20 --no-apostrophes
elay
stuteria
savoils
mooning
Meling
Tarth
gulping
sloosts
boisoms
espee
circuitly
resgives
apped
posthance
action
Erid
Eisembinactions
taffy
ferabblier
pipe

Dictionaries and Languages

This works for any language where words are composed of more than a few characters:

$ markov-words.py --count 5 --dictionary-file /usr/share/dict/french
pilués
évassent
ouassentâmes
émentasses
diapées
$ wget https://github.com/danakt/russian-words/raw/master/russian.txt
$ markov-words.py --count 5 --dictionary-file russian.txt --encoding 'windows-1251'
безводилануласт
ком
полного
больной
потрическую

More languages
$ ./markov-words.py --count 5 --dictionary-file /usr/share/dict/spanish
desoxismo
ñapote
enemadriciar
moteo
cumuleja
$ markov-words.py --count 5 --dictionary-file /usr/share/dict/swedish --encoding 'latin1'
ser
framträngd
förholmars
upptäcknekännes
barna
$ markov-words.py --count 5 --dictionary-file /usr/share/dict/italian
andogliato
trasassico
renevarla
simormassero
ete

Dictionary files are plain text files with one word per line. They can be lists of anything, not just the words of a language:

$ wget https://raw.githubusercontent.com/dominictarr/random-name/refs/heads/master/first-names.txt
$ ./markov-words.py -d first-names.txt --count 5
Sala
Chiathanda
Amberl
Margie
Crisha
$ wget https://raw.githubusercontent.com/dominictarr/random-name/refs/heads/master/places.txt
$ ./markov-words.py -d places.txt --count 5
Mille
Iredonia
Loletown
Noke
Farwigsville
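
As a rough illustration of the dictionary format described above, a loader for such a file might look like the following. This is a minimal sketch, not the code in markov-words.py: the function name `load_words` and its keyword arguments are hypothetical, chosen only to echo the `--encoding`, `--no-apostrophes` and `--no-capitals` options.

```python
from pathlib import Path


def load_words(path, encoding="utf-8", no_apostrophes=False, no_capitals=False):
    """Read one word per line, skipping blank lines and filtered words.

    Hypothetical helper for illustration; the real script may differ.
    """
    words = []
    for line in Path(path).read_text(encoding=encoding).splitlines():
        word = line.strip()
        if not word:
            continue
        if no_apostrophes and "'" in word:
            continue
        # Following the help text, only initial A-Z counts as a capital.
        if no_capitals and "A" <= word[0] <= "Z":
            continue
        words.append(word)
    return words
```

Passing the wrong encoding typically shows up either as a `UnicodeDecodeError` or as garbled letters in the output, which is why the Russian and Swedish examples above set it explicitly.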

Randomness

The probability that a letter comes next in a word depends on how often it follows the $n$ previous letters in dictionary words. As $n$ increases, the results go from "completely random" to "somewhat interesting" to "exactly copying the dictionary".

$ markov-words.py -n 0 --count 5 --no-apostrophes
dteoaiunsoaseer
i
tiinrtfa
pcosunuenicrrsn
io
$ markov-words.py -n 1 --count 5 --no-apostrophes
bletin
Cakeahiacyordix
qunatinenomatho
Cogrtern
sswefaty
$ markov-words.py -n 2 --count 5 --no-apostrophes
stawers
lizes
stelchotogithro
supper
horgerliacizes
$ markov-words.py -n 3 --count 5 --no-apostrophes
chua
Quation
alodhoolybug
mists
Dnient
$ markov-words.py -n 4 --count 5 --no-apostrophes
aments
pronolines
garness
unmodify
germainley
$ markov-words.py -n 5 --count 5 --no-apostrophes
lightlier
panning
Sui
succumbs
outbalance
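
The dependence on the $n$ previous letters can be sketched in a few lines of Python. This is an assumed, simplified model rather than the script's actual implementation: `build_model`, `generate`, the `END` sentinel and the default `max_length` of 15 are all names and values picked for illustration.

```python
import random
from collections import Counter, defaultdict

END = None  # hypothetical sentinel marking the end of a word


def build_model(words, n):
    """Count which symbol follows each context of up to n previous letters."""
    model = defaultdict(Counter)
    for word in words:
        for i in range(len(word) + 1):
            context = word[max(0, i - n):i]
            nxt = word[i] if i < len(word) else END
            model[context][nxt] += 1
    return model


def generate(model, n, rng=random, max_length=15):
    """Sample one word letter by letter from the counted frequencies."""
    word = ""
    while len(word) < max_length:
        context = word[max(0, len(word) - n):]
        counts = model[context]
        # Letters seen more often after this context are more likely.
        nxt = rng.choices(list(counts), weights=list(counts.values()))[0]
        if nxt is END:
            break
        word += nxt
    return word
```

With `n = 0` the context is always empty, so letters are drawn by overall frequency alone (and the sketch can even return an empty string). Once `n` is at least the length of the longest dictionary word, every context is a whole prefix, so the sketch can only reproduce dictionary words.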

Help

usage: markov-words [-h] [-d DICTIONARY_FILE] [--no-apostrophes]
                    [--no-capitals] [--encoding ENCODING] [-c COUNT]
                    [-e END_BIAS] [-n N] [-l MAX_LENGTH]

options:
  -h, --help            show this help message and exit
  -d DICTIONARY_FILE, --dictionary-file DICTIONARY_FILE
                        Path to dictionary file -- A dictionary file is just
                        one containing a list of words separated by line-
                        breaks. On Unix systems these can usually be found in
                        /usr/share/dict/.
  --no-apostrophes      Exclude words with apostrophes from the dictionary
  --no-capitals         Exclude words starting with A-Z capital letters
  --encoding ENCODING   Character encoding of the dictionary file
  -c COUNT, --count COUNT
                        Number of words to print
  -e END_BIAS, --end-bias END_BIAS
                        Multiplier for the probability that a word will end
                        at a given point -- Note that sometimes the
                        probability is zero, so setting this very high does
                        not guarantee that words will not be abnormally long.
  -n N, --n N           The number of previous letters to take into account
                        in selecting the next one -- For high values of n,
                        the results are likely to exactly reproduce words in
                        the dictionary, whereas for lower values, they are
                        likely to sound implausible.
  -l MAX_LENGTH, --max-length MAX_LENGTH
                        Maximum length of a word, which if reached will
                        simply terminate the word, even if the ending is not
                        a probable one.
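
One way to read the `--end-bias` description is as a multiplier on the weight of the "stop here" outcome when the next symbol is drawn. The sketch below illustrates that reading only; `pick_next` and the `END` sentinel are hypothetical names, and the script itself may apply the bias differently.

```python
import random
from collections import Counter

END = None  # hypothetical sentinel for "the word stops here"


def pick_next(counts, end_bias=1.0, rng=random):
    """Choose the next symbol, scaling the weight of END by end_bias.

    Assumes end_bias > 0 and that counts has at least one entry.
    """
    symbols = list(counts)
    weights = [
        count * end_bias if symbol is END else count
        for symbol, count in counts.items()
    ]
    return rng.choices(symbols, weights=weights)[0]
```

If `END` never followed the current context in the dictionary, it is absent from `counts` and no multiplier can make the word stop there, which matches the note that a very high bias does not rule out abnormally long words.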