Browsing the archives for the Programming category

Why I Love Currying

So I’ve been playing around with Haskell a lot lately and using it for various random stuff; I haven’t progressed to the point where it’s my go-to language for random programs (I still use Python for that), but I at least have an idea of how to use it. And there’s one feature of Haskell that I miss sorely when I write code in Python, or pretty much any other vaguely functional language: currying.

In Haskell, every function takes a single argument. A function of multiple arguments, such as map, which applies a function to every element in a list, actually only has one argument; for example, map can be interpreted either as taking a function and a list and returning a list, or as taking a function and returning a function that takes a list and returns a list. More formally, in Haskell, these two type declarations are equivalent:

map :: (a -> b) -> [a] -> [b]
map :: (a -> b) -> ([a] -> [b])

This process, of taking a multi-argument function and converting it into a series of single-argument functions is known as currying, after the mathematician Haskell Curry (who, obviously, is also the source of the name Haskell); the process of partially applying arguments to a function in this way is known as ‘partial application’, but is also called currying. One of the most obvious examples of currying is in sections: the function (0 ==) is syntactic sugar for (==) 0, and returns whether its argument is equal to zero. Furthermore, we can also partially apply the predicate to filter, to make a function that filters its argument on a fixed predicate. So, these three examples are completely equivalent:

removeZeros :: [Integer] -> [Integer]
removeZeros xs = filter (\x -> x /= 0) xs
removeZeros xs = filter (/= 0) xs
removeZeros = filter (/= 0)

(where /= is Haskell’s not-equal operator). The first is the most explicitly-written version, using no currying at all. The second curries the predicate; (/= 0) x is the same as x /= 0. Finally, since removeZeroes applied to an argument is the same as applying filter (/= 0) to it, we might as well define the former as the latter. Or, to take another example, look at the sortBy: it has type (a -> a -> Ordering) -> [a] -> [a], where Ordering is a datatype that can either be EQ, LT or GT for equal, less than, or greater than. So if you have some custom function you want to sort a list on, you can just say mySort = sortBy f and it will be the same as writing mySort xs = sortBy f xs, only cleaner and neater. Or in my Data.Nimber module (specifically lines 38, 39, and 43), many operations on Nimbers that’re required in order for me to call then ‘numbers’ are just the identity operation. So instead of saying abs x = x, I can just say abs = id.

Furthermore, without currying, you couldn’t have variadic functions; in order to work inside Haskell’s type system, the two types a -> b -> c and a -> (b -> c) have to be the same type. The full explanation involves typeclasses, and is (in my opinion) worth a read, because it’s a good explanation of a pretty horriblexcellent (it’s both at once, you see) type system hack.

As an aside, this also means that id :: a -> a, the identity function, is in a sense the same thing as ($) :: (a -> b) -> a -> b, which is function application. You can see this by substituting (b -> c) for a in the type of id, then removing parentheses:

id :: a -> a
id :: (b -> c) -> (b -> c)
id :: (b -> c) -> b -> c

So, in particular, f `id` x is the same as f $ x, which is just f x. Another way to think of this is that f `id` x = id f x = (id f) x = f x.

8 Comments

Variadic Functions in Haskell

Most modern languages have some kind of printf analogue: a function that takes a format string, and a series of things to be inserted into that string, and formats them all accordingly. At first glance, Haskell’s strong type system would seem to preclude this. There’s no built-in system for writing functions that take variable numbers of arguments, and it seems like it would be difficult to write one. The standard approach is to take a list instead, but this fundamentally doesn’t work for printf, since you’re going to be wanting to print Integers, Strings, and Floats. It’s possible to just pre-apply show to everything, but that’s not really a good idea, because you might want to show them in a different way than the built-in show does. You can use an extension called existential types to create a list of PrintfWrappers which wrap integers/floats/strings (more on that below), but that requires your users to manually do the wrapping, which is, once again, not a good idea. Haskell’s Text.Printf module takes a third approach. Look at the following lines:

instance (IsChar c) => PrintfType [c]
instance PrintfType (IO a)
instance (PrintfArg a, PrintfType r) => PrintfType (a -> r)

instance PrintfArg Integer
instance (IsChar c) => PrintfArg [c]

printf :: (PrintfType r) => String -> r

Here’s how to interpret this: PrintfType is the type of things that can be printed to. Printing to a String just gives you a string, much like sprintf in C or Perl, printing to an IO () will actually print it out (so you can use it like a normal printf in do blocks, a behavior which I personally find distasteful.). However, printf will return undefined when asked to return an IO r; the reason that you can nevertheless return one is that only declaring IO () as an instance of PrintfType is invalid according to Haskell 98.

PrintfArg, by comparison, are the elements that are valid arguments to printf; they basically consist of the various WordN/IntN types, Integer, Float, Char, and (IsChar c) => [c]. The point of the last instance is that, while you can’t have a specific version of a polymorphic type be an instance of a typeclass, you can restrict it to types whose parameters are themselves instance of another typeclass; the only instance of isChar is Char.
So now that we have that clarified, let’s suppose we want to call printf with “%s %d %f” “foo” 42 3.1, passing it the format string, String, an Integer, and a Float. This causes printf’s type to become

printf :: String -> String -> Integer -> Float -> String

Does this match the pattern (PrintfType r) => String -> r? Let’s go in reverse. String is an instance of PrintfType, and Float is an instance of PrintfArg, so Float -> String is an instance of PrintfType. Therefore, Integer -> (Float -> String) is an instance of PrintfArg, and so is String -> (Integer -> (Float -> String)). Dropping parentheses, this becomes String -> Integer -> Float -> String. So the types all check out. If you pass an invalid type, then you’ll run into something that isn’t an instance of PrintfArg and so the types won’t check.

I mentioned above that if you use something called ‘existential types’, you can do something similar. The way it works is that you define a new type whose data constructor only requires that its argument be of a given typeclass. Look at the following example

{-# LANGUAGE ExistentialQuantifiers #-}
data Box = forall s. (Show s) => Box s

boxes = [Box 2, Box "f", Box [8,3]]\

showBoxes :: [Box] -> String
showBoxes [] = ""
showBoxes ((Box x):xs) = show x ++ " " ++ showBoxes xs

When you run showBoxes boxes, you get 2 "f" 83, exactly as you’d expect. Note that, however, the function unbox (Box x) = x cannot be written; it would have to be of type (Show s) => Box -> s, and there’s just no real way to do that. So once you’ve wrapped something up in a Box, you can only get at it by showing it. From this, you can see how to pass a heterogeneous list to printf. The reason that this approach is suboptimal is that it would require Text.Printf to export a Printf data constructor which would wrap up everything to make it of the appropriate type, and that would be rather annoying, especially since it relies on show preserving enough information for you to format the number after reading it back in.

This pattern can obviously extended to any other variadic, heterogeneous function, as long as you can define a suitable typeclass that its arguments must all be instances of. And that’s not really a restriction at all; if you can’t specify a behavior that the instances must have, then you don’t really know what you can do with the arguments, and so you can’t do anything at all!

5 Comments

Data.MemoCombinators and You

Part of the beauty of Haskell is that it allows you to simply write recursive functions. But part of the problem with recursive functions is that they tend to have absolutely horrible big-O run times. The usual solution to this problem is to use what’s known as memoization, which is memorization without the ‘r’, since programmers have to have special names for everything. Memoization is usually implemented as an associative array (or a plain array in the common case where the function takes a single non-negative integer as an argument); the function attempts to look up the return value for its arguments in an associative array. If it finds it, it can return without doing expensive computation; if it doesn’t, then it performs the computation, stores the result in its array, and then returns. In Python, a memoized Fibonacci function might be written as follows:

fib_cache = {}
def fib(n):
    if (n < 2): return n
    if n not in fib_cache: fib_cache[n] = fib(n-1) + fib(n-2)
    return fib_cache[n]

The speed savings gained by this are enormous; on my test machine, fib(35) takes 15 seconds to compute without memoization, whereas fib(1000) computes almost instantly with memoization. In terms of big-O running times, I believe that the memoized version takes O(n^2) time, whereas the unmemoized version takes \Theta(\phi^n), which is interesting since the Fibonacci numbers themselves are \Theta(\phi^n). In any case, the memoized version is clearly superior.

But how do you do this in a language such as Haskell? You can’t carry state between the various incarnations of the function, since that could potentially lead to the function’s values not solely depending on its arguments, violating referential transparency. You can’t carry the state around in a monad because then different calls to the function would each have separate caches, so you’d have to pull some kind of trick where the function returns itself and its value, then pass the function around and it would just be a huge mess. So instead what you do is you use Data.MemoCombinators, which is a package that lets you turn functions into other, memoized functions. So how do you use it? It’s not too hard, especially if you’re memoizing functions that only use builtin types. An example, straight from the Data.MemoCombinators page:

fib = Memo.integral fib'
    where
    fib' 0 = 0
    fib' 1 = 1
    fib' x = fib (x-1) + fib (x-2)

There are two things to note here: first, the memoized version, fib, is generated from the non-memoized by calling Memo.integral on it. This is how you create memoized versions of single-variable functions: you apply the appropriate combinator. Second, fib’ calls fib inside it. This is very important: if fib’ called fib’, then you couldn’t save time within the fib function, only outside of it. With fib’ calling fib, on the other hand, then the first time you call fib 1000, not only will it return before the heat death of the universe, but you’ll also get fib 999, fib 998, etc. cached.

But what if your function to be memoized isn’t one of the standard types? That’s why there’s Memo.wrap. You just have to define two mappings: one from your type to some combination of MemoCombinator types, and one that goes from that combination back. An example will make it clear:

data Foo = Foo String Int deriving (Show)
memoFoo = Memo.wrap toFoo fromFoo (Memo.pair memoString Memo.integral) where
    memoString = Memo.list Memo.char
    toFoo (a, b) = Foo a b
    fromFoo (Foo a b) = (a, b)
f = memoFoo f' where
    f' (Foo x y) = x

So as you can see, first you build up a memoString type which can memoize Strings; since a String is just a list of Chars, you can just apply Memo.list to Memo.char. Then you define toFoo and fromFoo, which send you from the abstracted Foo type to a tuple of a String and an Int. Finally, you use Memo.wrap to ‘wrap’ the pair of a memoized String and a memoized Int (constructed using Memo.pair, naturally) up in an abstract memoFoo memoizer.

The other thing you can do with MemoCombinators is memoize functions of multiple variables. Take this sample of code from a project I’m working on:

memoNimber = memoNimber = Memo.wrap toNimber fromNimber Memo.integral
(*) = Memo.memo2 memoNimber memoNimber (*!) where
      x *! (Nimber 1) = x
      (Nimber 1) *! x = x
      a *! b = mex $ liftM2 combine [0 .. pred a] [0 .. pred b]
        where
          mex xs = fromJust $ find (`notElem` xs) [0..]
          combine a' b' = a * b' + a' * b + a' * b'

The actual definition of *! isn’t important, I’m only including it for completeness. Nor are the definitions of toNimber and fromNimber. What is important is Memo.memo2: you use it to generate a memoized function of multiple arguments. You just pass it memoizers for each of its arguments (since * takes two Nimbers, I pass it memoNimber twice) and the unmemoized version, and it gives you a memoized version.

As for how Data.MemoCombinators works, I can’t really explain that. I know it has to do with the fact that expressions in function definitions are cached, but beyond that my knowledge fails. Maybe if I ever learn it I’ll return to this and explain it.

Edit: After I wrote this I realized that Data.MemoTrie exists; while it has cleaner syntax for memoizing functions (the memoizer doesn’t need to know the types of the arguments), it has a disadvantage in that it’s not immediately obvious how to memoize the types it doesn’t give you. But if you’re just memoizing functions of Ints or something, go ahead and use MemoTrie.

2 Comments

Gnawwy 0.0.2 out!

And you said I couldn’t do it. Actually, so did I. I managed to motivate myself enough to get off of my ass and release version 0.0.2 of gnawwy, my little program I wrote to notify Linux users (specifically, Ubuntu users) of updates to Twitter and such via notify-osd. The big changes in v0.0.2 are e-mail account support, support for multiple accounts (Twitter and e-mail), and the movement of password information and such into a .gnawwyrc. The next version I plan on adding some actual error checking and debugging information, so I plan on calling it the actual v0.1.0 release. Now that I’ve motivated myself to get to work on this, I should actually have that out within a week or so. Still yet to come at some unspecified time in the future in no particular priority:

  • Notification methods other than libnotify (perhaps even Windows support!)
  • RSS feeds
  • GUI configuration
  • A tray icon to notify you of unread messages
  • Actual comments in the code
  • A shiny icon
0 Comments

Conky with BOINC and Banshee

Conky is a great tool for monitoring CPU, memory, and network usage. It also has now playing integration for MPD, XMMS, BMPx, and Audacious. But it completely lacks integration with my music player of choice, Banshee, and it also has no built-in way to check my progress in BOINC, a distributed computing project. On the other hand, which Conky does have is built-in support for executing arbitrary programs and parsing the result as markup/text. So I wrote a couple quick Python scripts, one to get now-playing data from Banshee and put it in a nice format, and the other to grab progress data from BOINC and calculate percent completion and estimated time until completion.

Conky with Banshee and BOINC scripts

Conky with Banshee and BOINC scripts

To use the Banshee script, you need to start banshee-conky-daemon.py on startup (it echoes the data to a file in /tmp to avoid constantly invoking the script and causing noticeable skipping in Banshee). In Ubuntu 9.04, you can do this by going to System->Preferences->Startup Applications and adding a new entry for ‘python ~/banshee-conky-daemon.py’ (assuming you put it there). Then add ${execp cat /tmp/banshee-conky-daemon} to your .conkyrc wherever you want the output in conky. If you keep conky at a different width, you might need to mess around with the trimming numbers on line 33.

The BOINC script is simpler; just add ${execpi boinc-conky.py} to your .conkyrc. Red tasks are actively being worked on, green ones aren’t. Only tasks that have been started are shown. The dict maps project URLs to short names; if you find the program crashing with KeyErrors, just add an entry to the dict that maps the missing key to a description eight letters or less (the restriction is to maintain proper time estimate alignment; it can be changed).

If you have questions, leave a comment. Apologies for the lack of comments in code; I’ll answer any questions you may have. I promise to code with comments from now on.

0 Comments

Gnawwy 0.0.1 published

Gnawwy, my in-development combination Twitter/email/RSS notifier (only Twitter works so far), is in version 0.0.1, the first version that actually does something! Go ahead and download it here. Oh, and I did change my Twitter password, so don’t bother trying to log in with that password. Also, I recommend against changing the update interval in gnawwy.py to shorter than 60 seconds, as only 100 update checks are allowed per hour.

Next version will include storing usernames/passwords in a configuration file and e-mail checking.

2 Comments