Walkthrough and implementation of the μKanren paper

1430db5 `Trying out some type checking with errors`

~ashton314 pushed to ~ashton314/microKanren git

Mirrored at ashton314 on GitHub.

Main repository at ~ashton314 on SourceHut; issue tracker also on SourceHut here. Mailing list for this repository here.

I wanted to understand how μKanren works. This is an annotated journey through implementing the code from that paper.

μKanren is a very small implementation in the Kanren family: essentially these are little embedded Prolog implementations. μKanren is particularly interesting because its implementation is less than 40 lines of Scheme code, and makes no use of exotic language features. Indeed, if your language has closures, you can make yourself a μKanren.

```
state :: subst × fresh_var_counter
subst :: assoc list mapping variable → variable | value
goal :: state → state*
```

Goals take a state and return a stream (lazy) of zero or more new states.

A state tells us a variable substitution that satisfies the constrains the goals created.

`walk`

functionThis takes a variable and a substitution list, and it will walk
through the substitution list until it finds the ultimate reference of
the variable given. Since variables can map to other variables in the
substitution list (see the *Basic types* section) then
`walk`

traverses those transitive dependencies until it can't
any more.

It *can* return another variable; if the last thing that one
variable points to is another variable that is *not* present at
the beginning of the list, then returning that variable is valid. This
is important for the *unify* function.

`unify`

The unify function
take two *things*, `u`

and `v`

, and tries
to make them line up according to the substitution that you give as
well.

```
(unify '(1 2 3) '(2 3 4) '())
#f
(unify '(1 2 3) '(1 2 3) '())
'()
(unify '(1 2 3) '(1 2 3) '(yay))
'(yay)
(unify (list 1 2 3) (list 1 (var 0) 3) '())
(list (cons (var 0) 2))
(unify (list 1 2 3) (list 1 (var 0) 3) `((,(var 0) 4)))
#f
```

This example illustrates how the `walk`

function drills
down:

```
(walk (var 0) `((,(var 0) . ,(var 1)) (,(var 1) . ,(var 2))))
(var 2)
(unify (list 1 2 3) (list 1 (var 0) 3) `((,(var 0) . ,(var 1)) (,(var 1) . 2)))
(list (cons (var 0) (var 1)) (cons (var 1) 2))
```

If we wanted to be able to unify more than just lists (e.g. rich
structures) we would teach μKanren here in the `cond`

how to
walk those richer structures.

Successful unification returns the substitution list that made the two things unify. This is different from the passed-in substitution list when a variable is found to point to another variable.

`call/fresh`

The implementation of call/fresh depends on the
structure of the state: in a pure language, we stick a fresh variable
counter on the state so we can thread that fresh effect through the
computation. I would like to try just using something like
`gensym`

for variable creation.

`AND`

and
`OR`

goal constructorsThe magic of the disj and conj functions is
encapsulated in the `mplus`

and `bind`

functions.

With the `disj`

function, we want to `OR`

two
goals, and the `conj`

is to `AND`

two goals. For
`disj`

, we run both goals and *add* them together. For
`conj`

, we run the first goal (seen by applying the first
goal to the `subst/counter`

variable) and then we thread the
result of that to the second goal.

Exactly *how* we add the goal results together bzw. thread the
state from one goal to another determines the properties of the search.
I'll skip the detailed evolution of this (see the paper for a nice
walk-through) but in the end we get lazily-evaluated interleaving stream
handling so we can exhaust every finite stream.

Streams are lists of states with lazily-evaluated members.

Here's an example from the paper showing how streams need to be interleaved and be lazy:

```
(define (fives x)
(disj (== x 5)
(λ (s/c) (λ () ((fives x) s/c)))))
(define (sixes x)
(disj (== x 6)
(λ (s/c) (λ () ((sixes x) s/c)))))
(define fives-and-sixes (call/fresh (λ (x) (disj (fives x) (sixes x)))))
```

These are some syntactic sugar that make working with μKanren nicer. Most of them are macros, which would make porting these to other languages less straight-forward. But they do make working in Scheme/Racket a lot nicer. Some new non-Lisp languages like Elixir1 feature hygienic macro systems, so these features would be portable.

I deviated from the paper's implementation of variables and wrote them as structs instead of vectors. I think further changes could be made (e.g. not having to keep around a number in the state to generate fresh variable names but these might rely on some more language-specific features. (E.g. generating fresh strings/symbols.)

`unify`

I've added some rudimentary predicate checking to the
`unify`

function:

```
(define (fav-num n)
(disj (== n 42)
(== (cons '? even?) n)))
```

```
> (run* (n) (== n 12) (fav-num n))
'(12)
> (run* (n) (== n 13) (fav-num n))
'()
> (run* (n) (fav-num n))
'(42 (? . #<procedure:even?>))
```

The classic example. See ./relations_playground.rkt.
Because of how the relations are defined, this will print out an
infinite list of relations if you try to run certain queries, so best
use the `run`

function with some finite (and preferably small
number; it doesn't take much to cover the whole space at least once)
bound, as opposed to just running `run*`

.

See ./type_checking.rkt for an implementation of a simple type checker/inference algorithm. Here is how you check the type of a program:

```
> (run* (type) (type-for '((lambda x x) 2) '() type))
'(number)
> (run* (type) (type-for '((lambda x (zero? x)) 2) '() type))
'(boolean)
> (run* (type) (type-for '((lambda x (zero? x)) #f) '() type))
'() ;; type error
> (run* (type) (type-for '(lambda x x) '() type))
'((_.5 . _.5)) ;; generic type: a -> a
```

Here's the crazy thing: you can actually ask for programs that match
a given type, since relations work both ways. Here's an example of
generating five programs that are of type
`number → boolean`

:

```
> (run 5 (prog) (type-for prog '() (cons 'number 'boolean)))
'((lambda _.1 (? . #<procedure:boolean?>))
(lambda _.1 (zero? (? . #<procedure:number?>)))
(lambda (? . #<procedure:symbol?>) (zero? (? . #<procedure:symbol?>)))
(lambda _.1 (zero? (+ (? . #<procedure:number?>) (? . #<procedure:number?>))))
(lambda _.1 (if (? . #<procedure:boolean?>) (? . #<procedure:boolean?>) (? . #<procedure:boolean?>))))
```

I hope is *very clear* that *I* did *not* write
the μKanren paper. That would be Daniel P. Friedman and Jason Hemann. I
merely wrote up this annotation.

Ashton Wiersdorf <ashton.wiersdorf@pobox.com>

Be sure to read the actual paper which is freely available.

Other fun links:

Personally, I think of Elixir as a Lisp in Ruby's clothing running on the BEAM. But don't tell anyone that Lisp is quietly becoming the new hot thing in web development and some machine learning. 🤫↩︎