123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125 |
- <!DOCTYPE html><head><meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no" /><meta name="keywords" content="GNU, Emacs, Libre Software, Hurd, Guile, Guix" /><meta name="description" content="GNUcode.me is a website focusing on libre software projects, especially the GNU project." /><link type="application/atom+xml" rel="alternate" title="GNUcode.me -- Feed" href="/feed.xml" /><a rel="me" href="https://fosstodon.org/@thegnuguy"></a><link type="text/css" href="css/footer.min.css" rel="stylesheet"></link><link type="text/css" href="css/header.min.css" rel="stylesheet"></link><link type="text/css" href="css/main.min.css" rel="stylesheet"></link><title>Simple Mispelling Problem — GNUcode.me</title></head><body><header><nav><ul><li><a href="index.html">GNUcode.me</a></li><li><a href="services.html">Services</a></li><li><a href="about.html">About</a></li><li><a href="business-ideas.html">Business-ideas</a></li></ul></nav></header><h1>Simple Mispelling Problem</h1><main><section class="basic-section-padding"><article><h3>by Joshua Branson — November 22, 2022</h3><div><p>Edit: Yes I am aware that I misspelled "mispelling". I figure it's funny if I
- leave it as it is. :)</p><p>I had this simple coding problem that I wanted to solve. Here's the problem:</p><p>Suppose you are writing an guix service <a href="https://notabug.org/jbranso/guix/src/newOpensmtpdBranch/gnu/services/mail.scm">(like I happen to be)</a>, and you are
- sanitizing user input for various records. Suppose your user mispells an
- option. Wouldn't it be nice to include a nice helpful hint on what he probably
- did wrong?</p><pre><code class="language-scheme">(opensmtpd-option (option "forany"))</code></pre><p>error: (option "forany") is invalid.
- hint: Try "for rcpt-to", "for domain", "for local", "for any", or "for".</p><p>Using <code>string-prefix-length-ci</code>, I was able to construct a fairly naive
- prococedure that tries to guess what the user meant to type. Here's what I came
- up with:</p><pre><code class="language-scheme">;; if strings is (list "auth" "for any" "from local")
- ;; Then this will return "Try \"auth\", \"for any\", or \"from local\"."
- (define (try-string strings)
- (string-append "Try "
- (let loop ((strings strings))
- (cond ((= 1 (length strings))
- (string-append
- "or \"" (car strings) "\".\n"))
- (else
- (string-append
- "\"" (car strings) "\", "
- (loop (cdr strings))))))))
- ;; suppose string is "for anys"
- ;; and strings is (list "for any" "for local" "for domain")
- ;; then hint-string will return "Did you mean "for any"?"
- (define* (hint-string string strings
- #:key (fieldname #f))
- (if (not (string? string))
- (try-string strings)
- (let loop ((current-max 1)
- (loop-strings strings)
- (hint-strings '()))
- (if (null? loop-strings)
- (cond ((= 1 (length hint-strings)) ;; only one worthwhile match
- (if fieldname
- (string-append "Did you mean (" fieldname " \""
- (car hint-strings) "\") ?\n")
- (string-append "Did you mean \"" (car hint-strings)
- "\"?\n")))
- (else (if (null? hint-strings)
- (try-string strings)
- (try-string hint-strings))))
- (let* ((element-string (car loop-strings))
- (element-max
- (string-prefix-length-ci element-string string)))
- (cond ((> element-max current-max)
- (loop element-max (cdr loop-strings)
- (list element-string)))
- ((= element-max current-max)
- (loop current-max (cdr loop-strings)
- (cons element-string hint-strings)))
- (else (loop current-max
- (cdr loop-strings) hint-strings))))))))</code></pre><p>It won't recognize that "or any" or "bor any" should match "for any", but for
- most mispellings, it should be half decent, provided the user got the first
- character right.</p><p>What do you all think? How would you write such a procedure?</p><p>EDIT: Well it turns out that the guix developers actually have a
- (string-closest) procedure. The relevant code can be found in
- (guix utils) and (guix combinators):</p><pre><code class="language-scheme">(define fold2
- (case-lambda
- ((proc seed1 seed2 lst)
- "Like `fold', but with a single list and two seeds."
- (let loop ((result1 seed1)
- (result2 seed2)
- (lst lst))
- (if (null? lst)
- (values result1 result2)
- (call-with-values
- (lambda () (proc (car lst) result1 result2))
- (lambda (result1 result2)
- (loop result1 result2 (cdr lst)))))))
- ((proc seed1 seed2 lst1 lst2)
- "Like `fold', but with two lists and two seeds."
- (let loop ((result1 seed1)
- (result2 seed2)
- (lst1 lst1)
- (lst2 lst2))
- (if (or (null? lst1) (null? lst2))
- (values result1 result2)
- (call-with-values
- (lambda () (proc (car lst1) (car lst2) result1 result2))
- (lambda (result1 result2)
- (loop result1 result2 (cdr lst1) (cdr lst2)))))))))
- (define (string-distance s1 s2)
- "Compute the Levenshtein distance between two strings."
- ;; Naive implemenation
- (define loop
- (mlambda (as bt)
- (match as
- (() (length bt))
- ((a s ...)
- (match bt
- (() (length as))
- ((b t ...)
- (if (char=? a b)
- (loop s t)
- (1+ (min
- (loop as t)
- (loop s bt)
- (loop s t))))))))))
- (let ((c1 (string->list s1))
- (c2 (string->list s2)))
- (loop c1 c2)))
- (define* (string-closest trial tests #:key (threshold 3))
- "Return the string from TESTS that is the closest from the TRIAL,
- according to 'string-distance'. If the TESTS are too far from TRIAL,
- according to THRESHOLD, then #f is returned."
- (identity ;discard second return value
- (fold2 (lambda (test closest minimal)
- (let ((dist (string-distance trial test)))
- (if (and (< dist minimal) (< dist threshold))
- (values test dist)
- (values closest minimal))))
- #f +inf.0
- tests)))</code></pre><p>A lot of the above code is a little bit above my head, but it sure looks cool.</p><p>And it actually works better than mine.:</p><pre><code class="language-scheme">;; old scheme code
- (display (hint-string "bor any" (list "for any" "auth" "rdns")))
- Try "for any", "auth", or "rdns".
- ;; It didn't match any string. :(
- ;; Let's try guix's (string-closest _) ...
- (string-closest "bor any" (list "for any" "auth" "rdns"))
- $1 = "for any"</code></pre><p>Awesome!</p></div></article></section></main><footer><p>© 2020 Joshua Branson. The text on this site is free culture under the Creative Commons Attribution Share-Alike 4.0 International license.</p><p>This website is build with Haunt, a static site generator written in Guile Scheme. Source code is <a href="https://notabug.org/jbranso/gnucode.me">available.</a></p><p>The color theme of this website is based off of the famous <a href="#3f3f3f" target="_blank">zenburn</a> theme.</p></footer></body>
|