[Mini Kanren] Benchmarking different substitution data structures

In logic programming, a substitution is a mapping from logic variables to values (including other logic variables). Logic programs are composed of goals, with the type signature Subst => Subst* — i.e. it might fail (thus generating 0 possible substitution), succeed once, or succeed multiple times, or even an infinite number of times.

It is necessary for substitutions to behave as values — if you take a substitution, and add a binding, you better get a new substitution, rather than do the modification in place, because that substitution might be used by other goals. In Lisp, the traditional data structure for this is the association list: a list of lists, where the cars (first elements) of each inner list are the keys. Clojure takes Phil Bagwell’s Hash Array Mapped Trie and adds persistency on top of it, to the effect that you have a copy-on-write trie map, with logarithmic rather than linear lookup times.

Technorati Tags: , , , , , , ,

Using association lists, my Scala code was performing about 3x slower than Petite Chez Scheme. Horrendous results — interestingly, using linked case objects optimized for storing keys and values reduce the gap, to the point that it was only about 10-15% slower. Still not great, but acceptable. Since my Mini Kanren implementation is written in Scala, not Clojure, I tried using Scala’s own immutable maps, reasoning that with Bagwell tries, the boost in look-up performance (O(log n) vs O(n)) ought to outweigh the increased cost of extending (O(log n) vs O(1)). Alas, Scala’s maps are not Bagwell tries. One simply gets an OOM error; I’d have to dig deeper to find out exactly why.

The solution, of course, is to use Clojure’s maps in Scala. Doable, after patching its Java interface a bit — Clojure being a Lisp, there is a penchant for short names, and therefore an IMapEntry has a getter called val(), instead of value() — but val is a keyword in Scala (denoting values, i.e. names, like variables but immutable). Patched code lives in a branch on my Clojure fork; hopefully this can get merged in as a stop-gap until Scala gets its own tries.

And with Clojure maps? Scala Kanren is now 2.5 times *faster* than Mini Kanren on Petite Chez. Mind-boggling. Oh, and you’d want the raw numbers, naturally. Bear in mind that this is used on a desktop system that is running Firefox, etc. at the same time. I’d need to do longer runs (e.g. 100x) from runlevel S to get better results.

Next speed-up is probably going to be obtained by reimplementing this back in Clojure — it has a built-in parallel map that works on lazy sequences, unlike Scala.

Strategy 1 2 3 avg rel
association list (Scheme) 39312 38853 39207 39124.0 1.0
association list (Scala) 114562 111629 109939 112043.3. 2.86x
linked triples (Scala) 47833 44813 44277 45641.0 1.167
Immutable maps (Scala) OOM
Clojure persistent maps (Scala) 17955 15586 13482 15674.3. 0.40

2 responses to “[Mini Kanren] Benchmarking different substitution data structures

    • Yup, thanks for that — found that out on the Scala mailing lists. The workaround should go away once Scala 2.8 comes out — provided the new persistent map implementation benchmarks well.