Generates imaginary but plausible-sounding words
Luna McNulty, d1cf772679: Fix typos in help (2 weeks ago)
Generates imaginary but plausible-sounding words:
$ markov-words.py --count 20 --no-apostrophes
elay
stuteria
savoils
mooning
Meling
Tarth
gulping
sloosts
boisoms
espee
circuitly
resgives
apped
posthance
action
Erid
Eisembinactions
taffy
ferabblier
pipe
This works for any language where words are composed of more than a few characters:
$ markov-words.py --count 5 --dictionary-file /usr/share/dict/french
pilués
évassent
ouassentâmes
émentasses
diapées
$ wget https://github.com/danakt/russian-words/raw/master/russian.txt
$ markov-words.py --count 5 --dictionary-file russian.txt --encoding 'windows-1251'
безводилануласт
ком
полного
больной
потрическую
More languages
$ ./markov-words.py --count 5 --dictionary-file /usr/share/dict/spanish
desoxismo
ñapote
enemadriciar
moteo
cumuleja
$ markov-words.py --count 5 --dictionary-file /usr/share/dict/swedish --encoding 'latin1'
ser
framträngd
förholmars
upptäcknekännes
barna
$ markov-words.py --count 5 --dictionary-file /usr/share/dict/italian
andogliato
trasassico
renevarla
simormassero
ete
Dictionary files are plain text files with one word per line. They can be lists of anything, not just the words of a language:
$ wget https://raw.githubusercontent.com/dominictarr/random-name/refs/heads/master/first-names.txt
$ ./markov-words.py -d first-names.txt --count 5
Sala
Chiathanda
Amberl
Margie
Crisha
$ wget https://raw.githubusercontent.com/dominictarr/random-name/refs/heads/master/places.txt
$ ./markov-words.py -d places.txt --count 5
Mille
Iredonia
Loletown
Noke
Farwigsville
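As a rough illustration of the file format described above, a dictionary file could be read like this. This is a minimal sketch, not the code in `markov-words.py`: the function name `load_words` and its keyword arguments are hypothetical, chosen only to mirror the `--encoding`, `--no-apostrophes` and `--no-capitals` options.

```python
import os
import tempfile


def load_words(path, encoding="utf-8", no_apostrophes=False, no_capitals=False):
    """Read one word per line, applying filters that mirror the CLI flags."""
    words = []
    with open(path, encoding=encoding) as handle:
        for line in handle:
            word = line.strip()
            if not word:
                continue  # skip blank lines
            if no_apostrophes and "'" in word:
                continue  # drop words such as "don't"
            if no_capitals and "A" <= word[0] <= "Z":
                continue  # drop words starting with A-Z
            words.append(word)
    return words


# Demonstration with a throwaway latin1 file (hypothetical contents).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", encoding="latin1", delete=False
) as tmp:
    tmp.write("apple\nBerlin\ndon't\nförhör\n\n")
    demo_path = tmp.name

demo_words = load_words(
    demo_path, encoding="latin1", no_apostrophes=True, no_capitals=True
)
os.remove(demo_path)
print(demo_words)  # ['apple', 'förhör']
```

Reading the file with the wrong encoding either raises a decoding error or produces garbled letters, which is why the Russian and Swedish examples pass `--encoding` explicitly.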
The probability that a given letter comes next in a word depends on how often it follows the $n$ previous letters in dictionary words. As $n$ increases, the results go from "completely random" to "somewhat interesting" to "exactly copying the dictionary":
$ markov-words.py -n 0 --count 5 --no-apostrophes
dteoaiunsoaseer
i
tiinrtfa
pcosunuenicrrsn
io
$ markov-words.py -n 1 --count 5 --no-apostrophes
bletin
Cakeahiacyordix
qunatinenomatho
Cogrtern
sswefaty
$ markov-words.py -n 2 --count 5 --no-apostrophes
stawers
lizes
stelchotogithro
supper
horgerliacizes
$ markov-words.py -n 3 --count 5 --no-apostrophes
chua
Quation
alodhoolybug
mists
Dnient
$ markov-words.py -n 4 --count 5 --no-apostrophes
aments
pronolines
garness
unmodify
germainley
$ markov-words.py -n 5 --count 5 --no-apostrophes
lightlier
panning
Sui
succumbs
outbalance
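The dependence on the $n$ previous letters shown in the runs above can be sketched as a small character-level Markov chain. This is an illustration under stated assumptions, not the script's actual code: `build_model`, `generate`, the empty-string `END` marker and the length cap of 15 are all hypothetical choices made for this example.

```python
import random
from collections import Counter, defaultdict

END = ""  # hypothetical marker meaning "the word stops here"


def build_model(words, n):
    """Count which symbol follows each run of up to n preceding letters."""
    model = defaultdict(Counter)
    for word in words:
        for i in range(len(word) + 1):
            context = word[max(0, i - n):i]
            symbol = word[i] if i < len(word) else END
            model[context][symbol] += 1
    return model


def generate(model, n, rng, max_length=15):
    """Sample one word letter by letter, weighted by the observed counts."""
    word = ""
    while len(word) < max_length:
        context = word[max(0, len(word) - n):]
        counts = model[context]
        symbol = rng.choices(list(counts), weights=list(counts.values()))[0]
        if symbol == END:
            break
        word += symbol
    return word


training = ["banana", "bandana", "cabana"]  # toy dictionary for the demo
model = build_model(training, 2)
print(model["an"])  # counts of what follows "an": 'a' four times, 'd' once

rng = random.Random(0)  # fixed seed so repeated runs give the same words
for _ in range(5):
    print(generate(model, 2, rng))
```

With `n = 0` every letter shares the single empty context, so words are drawn from overall letter frequencies alone. With a large `n` most contexts have been seen with only one continuation, so the chain can do little more than replay dictionary words.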
usage: markov-words [-h] [-d DICTIONARY_FILE] [--no-apostrophes]
                    [--no-capitals] [--encoding ENCODING] [-c COUNT]
                    [-e END_BIAS] [-n N] [-l MAX_LENGTH]

options:
  -h, --help            show this help message and exit
  -d DICTIONARY_FILE, --dictionary-file DICTIONARY_FILE
                        Path to dictionary file -- A dictionary file is just
                        one containing a list of words separated by line-
                        breaks. On Unix systems these can usually be found in
                        /usr/share/dict/.
  --no-apostrophes      Exclude words with apostrophes from the dictionary
  --no-capitals         Exclude words starting with A-Z capital letters
  --encoding ENCODING   Character encoding of the dictionary file
  -c COUNT, --count COUNT
                        Number of words to print
  -e END_BIAS, --end-bias END_BIAS
                        Multiplier for the probability that a word will end
                        at a given point -- Note that sometimes the
                        probability is zero, so setting this very high does
                        not guarantee that words will not be abnormally long.
  -n N, --n N           The number of previous letters to take into account
                        in selecting the next one -- For high values of n,
                        the results are likely to exactly reproduce words in
                        the dictionary, whereas for lower values, they are
                        likely to sound implausible.
  -l MAX_LENGTH, --max-length MAX_LENGTH
                        Maximum length of a word, which if reached will
                        simply terminate the word, even if the ending is not
                        a probable one.
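
The note on `--end-bias` is easier to see with numbers. The sketch below shows one plausible reading of that option, in which the weight of "end the word here" is multiplied before sampling. It is an assumption about how such a multiplier could work, not a description of the script's internals, and `biased_weights`, `end_probability` and the `END` marker are hypothetical names.

```python
from collections import Counter

END = ""  # hypothetical end-of-word marker


def biased_weights(counts, end_bias):
    """Scale the end-of-word weight; all other weights stay unchanged."""
    return {
        symbol: count * (end_bias if symbol == END else 1.0)
        for symbol, count in counts.items()
    }


def end_probability(counts, end_bias):
    """Probability of ending the word after applying the multiplier."""
    weights = biased_weights(counts, end_bias)
    return weights.get(END, 0.0) / sum(weights.values())


after_na = Counter({"n": 1, END: 3})  # ending seen 3 times out of 4
after_ba = Counter({"n": 3})          # ending never seen in this context

print(end_probability(after_na, 1.0))    # 0.75
print(end_probability(after_na, 3.0))    # 0.9
print(end_probability(after_ba, 100.0))  # 0.0
```

A multiplier cannot raise a probability of zero, which is why a hard cap such as `--max-length` is still needed to guarantee that a word terminates.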