# Data.MemoCombinators and You

Part of the beauty of Haskell is that it allows you to simply write recursive functions. But part of the problem with recursive functions is that they tend to have absolutely horrible big-O run times. The usual solution to this problem is to use what’s known as memoization, which is memorization without the ‘r’, since programmers have to have special names for everything. Memoization is usually implemented with an associative array (or a plain array in the common case where the function takes a single non-negative integer as an argument): the function attempts to look up the return value for its arguments in that array. If it finds it, it can return without doing expensive computation; if it doesn’t, then it performs the computation, stores the result in the array, and returns it. In Python, a memoized Fibonacci function might be written as follows:

```python
fib_cache = {}

def fib(n):
    if n < 2:
        return n
    if n not in fib_cache:
        fib_cache[n] = fib(n - 1) + fib(n - 2)
    return fib_cache[n]
```

The speed savings gained by this are enormous; on my test machine, fib(35) takes 15 seconds to compute without memoization, whereas fib(1000) computes almost instantly with memoization. In terms of big-O running times, I believe that the memoized version takes $O(n^2)$ time (there are $O(n)$ distinct calls, each performing an addition of numbers with $O(n)$ digits), whereas the unmemoized version takes $\Theta(\phi^n)$, which is interesting since the Fibonacci numbers themselves are $\Theta(\phi^n)$. In any case, the memoized version is clearly superior.

But how do you do this in a language such as Haskell? You can’t carry state between the various incarnations of the function, since that could lead to the function’s value not depending solely on its arguments, violating referential transparency. You can’t carry the state around in a monad, either, because then different calls to the function would each have separate caches; you’d have to pull some kind of trick where the function returns both its value and an updated version of itself, and pass that around everywhere, which would be a huge mess. So instead what you do is use Data.MemoCombinators, a package that lets you turn functions into other, memoized functions. So how do you use it? It’s not too hard, especially if you’re memoizing functions over built-in types. An example, straight from the Data.MemoCombinators page:

```haskell
import qualified Data.MemoCombinators as Memo

fib = Memo.integral fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' x = fib (x - 1) + fib (x - 2)
```

There are two things to note here. First, the memoized version, fib, is generated from the non-memoized fib’ by applying Memo.integral to it; this is how you create memoized versions of single-argument functions. Second, fib’ calls fib inside itself. This is very important: if fib’ called fib’, the recursive calls would bypass the cache, so you couldn’t save time within a single call to fib, only across separate calls to it. With fib’ calling fib, on the other hand, the first time you call fib 1000, not only will it return before the heat death of the universe, but you’ll also get fib 999, fib 998, etc. cached.
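To make the difference concrete, here is a sketch of the broken variant, where the helper recurses into itself (the name slowFib is mine, not from the library documentation):

```haskell
import qualified Data.MemoCombinators as Memo

-- Only the outermost call goes through the memo table; every
-- recursive call goes straight to the unmemoized helper, so this is
-- still exponential time despite the Memo.integral wrapper.
slowFib :: Integer -> Integer
slowFib = Memo.integral fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' x = fib' (x - 1) + fib' (x - 2)  -- should call slowFib, not fib'
```

The answers are still correct for small inputs; only the running time is wrong, which makes this an easy bug to miss.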

But what if your function to be memoized isn’t one of the standard types? That’s why there’s Memo.wrap. You just have to define two mappings: one from your type to some combination of MemoCombinator types, and one that goes from that combination back. An example will make it clear:

```haskell
import qualified Data.MemoCombinators as Memo

data Foo = Foo String Int deriving (Show)

memoFoo = Memo.wrap toFoo fromFoo (Memo.pair memoString Memo.integral)
  where
    memoString = Memo.list Memo.char
    toFoo (a, b) = Foo a b
    fromFoo (Foo a b) = (a, b)

f = memoFoo f'
  where
    f' (Foo x y) = x
```

So as you can see, first you build up a memoString combinator which can memoize Strings; since a String is just a list of Chars, you can just apply Memo.list to Memo.char. Then you define toFoo and fromFoo, which convert between a tuple of a String and an Int and the abstract Foo type. Finally, you use Memo.wrap to ‘wrap’ the memoizer for pairs of a String and an Int (constructed using Memo.pair, naturally) up into the memoFoo memoizer.

The other thing you can do with MemoCombinators is memoize functions of multiple variables. Take this sample of code from a project I’m working on:

```haskell
memoNimber = Memo.wrap toNimber fromNimber Memo.integral

(*) = Memo.memo2 memoNimber memoNimber (*!)
  where
    x *! (Nimber 1) = x
    (Nimber 1) *! x = x
    a *! b = mex $ liftM2 combine [0 .. pred a] [0 .. pred b]
      where
        mex xs = fromJust $ find (`notElem` xs) [0 ..]
        combine a' b' = a * b' + a' * b + a' * b'
```

The actual definition of *! isn’t important; I’m only including it for completeness. Nor are the definitions of toNimber and fromNimber. What is important is Memo.memo2: you use it to generate a memoized function of two arguments. You just pass it a memoizer for each of its arguments (since * takes two Nimbers, I pass it memoNimber twice) plus the unmemoized version, and it gives you a memoized version.
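For a self-contained illustration of Memo.memo2 that doesn’t depend on the Nimber machinery, here is a memoized binomial coefficient via Pascal’s rule (this example is mine, not from the project above):

```haskell
import qualified Data.MemoCombinators as Memo

-- choose n k, memoized on both arguments with Memo.memo2: each
-- (n, k) pair is computed at most once, so this runs in roughly
-- O(n * k) time instead of exponential time.
choose :: Integer -> Integer -> Integer
choose = Memo.memo2 Memo.integral Memo.integral choose'
  where
    choose' _ 0 = 1
    choose' n k
      | k > n     = 0
      | otherwise = choose (n - 1) (k - 1) + choose (n - 1) k
```

As with fib, note that choose' recurses through the memoized choose, not through itself.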

As for how Data.MemoCombinators works, I can’t really explain that. I know it has to do with the fact that expressions in function definitions are cached, but beyond that my knowledge fails. Maybe if I ever learn it I’ll return to this and explain it.
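As far as I can reconstruct it, the underlying trick is that the memo table is an ordinary lazy data structure indexed by the argument, and laziness guarantees each cell is evaluated at most once. Here is a minimal sketch of that idea using a plain infinite list (the real library uses more efficient trie structures, and the names here are my own):

```haskell
-- A minimal memoizer for functions of non-negative Ints: the "table"
-- is a lazy infinite list, and (!!) looks answers up by index. Each
-- cell is a thunk that is forced at most once and then shared.
memoList :: (Int -> a) -> (Int -> a)
memoList f = (table !!)
  where
    table = map f [0 ..]

fibMemo :: Int -> Integer
fibMemo = memoList fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fibMemo (n - 1) + fibMemo (n - 2)
```

The list lookup itself is O(n), which is why the library replaces the list with tries, but the sharing principle is the same.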

Edit: After I wrote this I realized that Data.MemoTrie exists; while it has cleaner syntax for memoizing functions (the memoizer doesn’t need to know the types of the arguments), it has a disadvantage in that it’s not immediately obvious how to memoize the types it doesn’t give you. But if you’re just memoizing functions of Ints or something, go ahead and use MemoTrie.
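For comparison, here is what the fib example looks like with MemoTrie (a sketch assuming the MemoTrie package; its memo function needs no type-specific combinator, only a HasTrie instance for the argument type):

```haskell
import Data.MemoTrie (memo)

-- memo :: HasTrie t => (t -> a) -> (t -> a); the argument type is
-- picked up from the HasTrie instance rather than from an explicit
-- combinator like Memo.integral.
fib :: Integer -> Integer
fib = memo fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
```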

## 2 Responses

1. #### Luke Palmer

November 23, 2009

The problem with MemoTrie is the standard typeclass problem of one instance per type. So, for example, MemoTrie’s Integer memoizer is the same as integral from MemoCombinators, which is not the most efficient implementation (it just wraps it to a list of bits). A better implementation would, say, be a 4-bit patricia tree using fast bitshifts. You could happily write this as a new MemoCombinator of type Memo Integer, but MemoTrie already has a memoizer for Integer so you have to do some newtype madness to convince the type system that it is a different type when really it’s not.

So not only is it hard to memoize types it doesn’t give you (well, you declare new instances, but that risks orphans), it is hard to build new memo tables for types it *does* give you.

Note that I have author’s bias.