Writing Heist Splices for Snap
I’ve been doing a lot of web stuff lately; so far, it’s only been very simple HTML + CSS + JS (well, CoffeeScript, but who’s counting), but eventually I might move on to other, neater things. And in the process one of the things that I found is Twitter Bootstrap; it’s a bunch of CSS that makes it ‘easy’ to make professional-looking sites. I put ‘easy’ in quotes because the learning curve is definitely steeper than that of CSS itself, because you have to nest your DOM just so in order for the given CSS to apply right. But you get nice margins, good color themes, etc.
One of the things that you get when you use Bootstrap is a very nice-looking nav bar, which is actually just a very well-themed unordered list of links.
In particular, if you set the class of one of the list items to “active”, that item is highlighted.
Now, I’m serving this site through Snap, the Haskell webdev framework; I’m doing this because one of my subsites needs to be able to upload files, and I figured I might as well use Haskell. And Snap comes with a templating language called Heist. So, given that I’m already using templating to insert the boilerplate jQuery/Bootstrap includes and the GitHub link/authorship at the bottom, wouldn’t it make sense to template the nav links?
It turns out that it’s fairly simple to do, as illustrated in NavLinks.hs. We start with imports; nothing too interesting here. We need Heist to access the templating engine and Text.XmlHtml to actually construct the nodes.
{-# LANGUAGE OverloadedStrings #-}

module NavLinks where

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding
import Snap
import Text.Templating.Heist
import qualified Text.XmlHtml as X
Next we declare the type:
17  navSplice :: MonadSnap m => [(B.ByteString, T.Text)] -> Splice m
The way a splice works is this: you construct a Splice in the monad Splice m, and you can lift actions from the underlying monad m into Splice m. In this case, we can be in any monad that provides us with getRequest, since we want to get the URI of the request. Since getRequest :: MonadSnap m => m Request, we can be in any monad that is an instance of MonadSnap. The argument to navSplice is the list of (path, title) tuples; for example, it might look like [("/about", "About"), ("/blog", "Blog")]. The path is a ByteString because the request gives us the URI as a ByteString, and the title is Text because Text.XmlHtml uses Text. Now, onto the actual declaration:
18  navSplice links = (:[]) . X.Element "ul" [("class", "nav")] <$> do
19      currentURI <- lift $ rqURI <$> getRequest
20      -- add the 'active' class if the href is a prefix of the current URI
21      let li path
22            | path `B.isPrefixOf` (currentURI `B.append` "/") = X.Element "li" [("class", "active")]
23            | otherwise = X.Element "li" []
24      return $ map (\(path, title) -> li path [buildLink path title]) links
25      -- build a link to the path with the given text
26    where buildLink path title = X.Element "a" [("href", decodeUtf8 path)] [X.TextNode title]
Look at the do-block first. Line 19 just gets the current URI out. Lines 21-23 are interesting; what we’re doing here is declaring a function of type li :: B.ByteString -> [X.Node] -> X.Node. The way an HTML element is constructed in Text.XmlHtml is through the Element constructor, which has type T.Text -> [(T.Text, T.Text)] -> [X.Node] -> X.Node. The first Text is the tag; the list of Text tuples is the list of attributes, and the list of Nodes is the list of children. So, here we’re partially applying the tag (always “li”) and the attributes (which are either empty or class="active"). We apply the “active” class only if the path is a prefix of the current URI; we append a “/” to the end of the current URI so that a path written with a trailing slash still matches a request made without one.
Next, look at line 26; buildLink just builds a link node looking like <a href="path">title</a>. X.TextNode is, obviously, the constructor for text Nodes.
On the last line of the do block, we map over the (path, title) tuples, build up a list of <li> elements, and return that list. Then, on line 18, we wrap that list in an unordered list element with the “nav” class. Finally, since a splice produces a list of nodes, we turn that element into a single-element list using (:[]) (which some people call the monkey function), which just constructs a list with its argument inside. My actual code uses return instead, but I decided that this would be clearer.
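The node-building logic can also be looked at on its own, without Snap or Heist in the way. The following is a minimal sketch, not the code from NavLinks.hs: the Node type is a stand-in for the one in Text.XmlHtml (using String instead of Text so that only base is needed), and navList is a hypothetical name for a pure version that takes the current URI as an argument instead of reading it from the request.

```haskell
import Data.List (isPrefixOf)

-- Stand-in for Text.XmlHtml's Node type, so the sketch needs only base.
data Node
  = Element String [(String, String)] [Node]
  | TextNode String
  deriving (Eq, Show)

-- Pure core of the splice: build the <ul class="nav"> element, marking
-- as "active" every item whose path is a prefix of the current URI
-- (with a "/" appended, as in the original).
navList :: String -> [(String, String)] -> Node
navList currentURI links = Element "ul" [("class", "nav")] (map item links)
  where
    item (path, title) = Element "li" (attrs path) [buildLink path title]
    attrs path
      | path `isPrefixOf` (currentURI ++ "/") = [("class", "active")]
      | otherwise = []
    buildLink path title = Element "a" [("href", path)] [TextNode title]
```

Keeping the request lookup separate from the node construction like this would also make the highlighting rule easy to check in GHCi.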
Adding that splice to the default list of splices is as simple as adding
pages = [ ("/chargen/", "Character generator")
, ("/logify/", "Chatlog formatter")
, ("/scalemate/", "Scalemate generator")
]
to the toplevel of Site.hs and binding navSplice pages to the nav tag in the app initializer. Now <nav/> can be used in the template to automatically insert the nav link list and highlight the current page!
One small annoyance of this is that the generated HTML is all in one line, although since it’s not handwritten you probably won’t care so much, especially since Firebug and Chrome’s debugging tools will automatically do the indentation for you and let you collapse/expand child nodes at will.
Name Your Type Variables!
Haskell’s type system is commonly touted as one of the most powerful features of the language; it’s often said that if a program compiles, it’s pretty likely to be correct. And while there are always going to be errors your type system can’t catch (logic errors, off-by-one-errors, etc.), I’ve found that the type system helps ensure that your programs are put together correctly. If it typechecks, you’re probably not making a higher-level logic error; you might be trying to fit a red peg into a blue hole, but you’re not trying to fit a square peg into a round hole, so to speak.
The problem is figuring out what the type variables in a polymorphic type (i.e., a type like length :: [a] -> Int) mean. Sure, in simple cases it’s simple; you look at a type signature like ifM :: Monad m => m Bool -> m a -> m a -> m a and you know that m is going to be some monad and a is some arbitrary result type. But take a look at validate :: Monad m => Form m i e v a -> Validator m e a -> Form m i e v a, from digestive-functors. What do m, i, e, v, and a stand for? It’s pretty clear that m is some monad, but what about the others? It turns out that:
- i represents the input to the form, or its environment (i.e., the submitted parameters)
- e represents the error type that can be produced
- v represents the ‘view’ of the form (how it’s represented to the user)
- a represents the actual evaluation of a form (for example, in a registration form, it might be data Registration = Registration Text ByteString)
So my argument is that it might be better to describe it as Form m input err view result -> Validator m err result -> Form m input err view result; this way, it’s clearer exactly what each type variable corresponds to.
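To make the difference concrete, here is a small sketch with toy types. Everything in it is hypothetical: this Validator is a one-field stand-in that I made up for illustration and is not the digestive-functors type, and Either stands in for a real form. The two functions are identical except for the names of their type variables.

```haskell
-- Hypothetical toy type, loosely shaped like the Validator above;
-- this is not the digestive-functors API.
newtype Validator err result = Validator
  { runValidator :: result -> Either err result }

-- Terse: the reader has to work out what 'e' and 'a' stand for.
validateTerse :: Either e a -> Validator e a -> Either e a
validateTerse parsed (Validator check) = parsed >>= check

-- Descriptive: same function, but the signature documents itself.
validate :: Either err result -> Validator err result -> Either err result
validate parsed (Validator check) = parsed >>= check

-- An example validator: accept only positive numbers.
positive :: Validator String Int
positive = Validator $ \n ->
  if n > 0 then Right n else Left "must be positive"
```

Both versions typecheck identically; the only thing that changes is how much the Haddock page would tell you.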
Again, I’m not saying that single-letter type variables should be completely banned; something like (>>=) :: (Monad m) => m a -> (a -> m b) -> m b is simple enough. The conventions that a and b stand for the two type variables in a generic function and that m means ‘some arbitrary monad’ are strong enough that it’s like using i in a for loop in Python or whatever.
(Full disclosure: yes, lots of my own Haskell code uses single-letter type variables in places where I shouldn’t. I do intend to fix this).
But if you’re trying to learn some new framework or module, imagine how much nicer it would be if the types that the Haddock gives you actually told you something about what it’s doing (especially given that most Haskell modules seem to be lightly documented outside of their Haddock documentation, lacking example use cases). Programmers outgrew using single-letter variable names decades ago; we shouldn’t make the same mistake in the type system. This is part of the reason why I’m making such slow progress learning the Snaplet framework in the dev version of snap; it’s hard for me to get an intuition as to what the types are when I keep seeing Handler b v a (b is the base app/snaplet, v is what’s being ‘focused on’, and a is what the Handler ‘evaluates to’; Handler b v has a Monad (and Functor, and MonadPlus, etc.) instance).
By the way, the Snaplet framework, though slightly complicated, does wind up making a good deal of sense once you get used to it; I plan on writing up a post about what the commonly-used types/type constructors 'mean', as well as a post about how to make a very simple login Snap app. And I think, ultimately, documentation is one of the things Haskell sorely needs; not just more documentation on what individual functions do, but how those functions combine into a larger whole. Imagine trying to learn Django just by looking at the module documentation without having the benefit of the tutorial! But I think that's a subject for another post; I'm trying to get this blog active again, after all.
Settings in Chrome Extension Content Scripts
Google Chrome, like most modern browsers, allows you to write extensions for it. And those extensions can inject JavaScript into pages that the end user views, so that you can write extensions to do things such as customize the interface of some site, theme it, etc. Simple extensions that don’t need to be configured by the user can get away with being simple Greasemonkey-like scripts that the user installs by navigating to and then forgets. The upside of this is that it’ll also work in Firefox, since Chrome’s native content script support is a strict subset of Greasemonkey’s (it doesn’t have @require or the ability to set/get data, among a few other things). The downside is that there’s no way to do any kind of updating short of your users manually installing a new version, and, more importantly, your extension cannot have any state or user configuration. In order to have stateful extensions, you have to go full-blown Chrome.
The basics of Chrome extension development are adequately covered by Google’s own Getting Started tutorial. The interesting part comes when you want your content scripts to be configurable by the user. For example, Tumblr Hate (which I’m the author of, and which has source that you can view here) adds ‘hate’ links to tumblr posts that allow the user to hide them. The user can configure the text of the ‘hate’ links, as well as what shows up when a post is ‘hated’ (i.e., hidden). Fortunately, Chrome has built-in support for extension options pages, which have a localStorage object that you can use as a key/value store (although values are serialized to strings). Unfortunately, the localStorage object is not shared between your options page and your content scripts. I assume this is because of sandboxing, but it creates a problem: how do your extension’s content scripts get the settings?
Use chrome.extension.sendRequest
You have your options page do whatever it does, saving settings to localStorage. You then create a ‘background’ page that installs a listener using chrome.extension.onRequest.addListener; the exact form of a request and how it should be responded to is obviously up to you, but here’s what I use. A request looks like either {type: "get", name: ["settingName", "otherSettingName"]}, which is responded to with an object that looks something like {settingName: 1, otherSettingName: true}, or {type: "set", name: "settingName", value: "5"}, which sets the setting settingName to 5. You can even special-case get requests so that if the name is a single string, the response is just the value and not an object. If your extension does things other than inject content scripts, you’ll need other request types to do things like open tabs or whatever, but this is good enough for simply mimicking a key/value store. The code to do this is simple:
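chrome.extension.onRequest.addListener(function (request, sender, sendResponse) {
    switch (request.type) {
    case "get":
        if (request.name.constructor == Array) {
            // several names requested: respond with an object of name/value pairs
            var data = {};
            for (var i = 0; i < request.name.length; i++) {
                data[request.name[i]] = localStorage[request.name[i]];
            }
            sendResponse(data);
        } else {
            // a single name: respond with just the value
            sendResponse(localStorage[request.name]);
        }
        break;
    case "set":
        localStorage[request.name] = request.value;
        break;
    default:
        console.log("invalid request " + JSON.stringify(request));
        break;
    }
});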
The next issue, however, is trickier: chrome.extension.sendRequest is asynchronous. In particular, instead of returning the response, it accepts a callback that takes the response as an argument, and returns nothing useful. So what do you do?
Do your work in the callback
If you just use the callback to set some globals and then continue on, you have a race condition. Consider the following code:
chrome.extension.sendRequest({type: "get", name: "settingName"}, function (resp) {
setting = resp;
}
);
console.log(setting);
This will fail, because sendRequest is asynchronous, and sending a request and receiving the response very well might happen after the setting has been used! The correct thing to do here is this:
chrome.extension.sendRequest({type: "get", name: "settingName"}, function (resp) {
setting = resp;
doStuff();
}
);
function doStuff() {
console.log(setting);
}
Here, you can guarantee that doStuff won’t be called unless the setting has been properly set, so you can safely rely on its value.
Note that I didn’t pass setting as an argument to doStuff; I very well could have. But in the case of Tumblr Hate, for instance, I didn’t feel like threading the relevant variables through function calls, so I simply declared the variables at the top so they’d be global. (Well, no I didn’t, I set the response’s variables as properties of a global ‘root’ object, but that’s basically the same thing.)
Perl will never go away, ever
Perl was one of the first languages that I ever learned and actually truly did things with; it was the first language I ever wrote a nontrivial program in (a DES implementation that I have unfortunately lost the source code to, or else I would post it). The first language I ever wrote a program in was BASIC, though I don’t even remember what the program was; I seem to have blocked it from my memory, probably for the better. So I have a bit of a soft spot for Perl, and I still have some of the bad habits I picked up from it; since I didn’t use strict or -w, my code would likely be full of uninitialized variables and barewords. It’s a bad habit, and to this day I still have to be reminded occasionally that other languages, such as Python, do require variables to be defined before they’re used.
But Perl is old now, and I’ve mostly moved on to other languages, like Python. I like the object-orientation, the support for functional paradigms and other nice things like list comprehensions and lambda functions. I like not having to sigil all of my variables with $ or @ or %, I like being able to supply keyword arguments to my functions so that I don’t have to remember which weird order I decided to use, I like the sheer amount of fun things that you can do with object orientation combined with reflection, metaprogramming, and everything being a first-class object. And yet, I still think it’ll stick around for a while.
Why do I say that? Simple. I was talking with someone who had left in the middle of an online IRC-based role-playing game, and they had asked for chatlogs of what had happened after they left. I had them, since I run weechat in tmux (like irssi in screen, but better!) and so am in every IRC channel I’m in 24/7. But the question was: how could I pull out just the lines that were said after they left? And the answer was Perl. It turns out that the .. operator, which in a for loop or other situations where a list is expected produces a range (so (1..9) as a list produces the list (1,2,3,4,5,6,7,8,9)), does something completely different in a scalar context, like in the conditional of an if statement. Take the statement print if (/Person.*has quit/ .. /Person.*has joined/). Each time this statement is run, the conditional will evaluate to false, until the left-hand side evaluates to true. Then it’ll start evaluating to true, until the right-hand side evaluates to true, and then it’ll stop being true (but it’ll still be true until it’s evaluated again!), etc., etc. So if this is in an implicit while loop running through the lines of a file, it’ll start printing when it sees a line saying Person has quit, including that line, then stop when they rejoin, but still print that line, and then it’ll stay quiet until it sees another quit line, etc. And the best part is, if you call perl with -n, you automatically get a while loop that assigns the current line of the file it’s reading from to $_, the implicit variable in the matching and print.
If I wanted to do that in something like Python, I’d have to manually set up the read loop, write a function to trawl through, build up regexp objects to match on, etc. And that’s fine for a piece of code I intend to maintain. But for a quick one-line script like this? Too much effort. All I need is perl -ne.
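For comparison, the scalar-context behavior of .. described above amounts to a two-state machine over the lines of the file. Here is a minimal Haskell sketch of that state machine; flipFlop is a hypothetical name, and the predicates are passed in as plain functions rather than regexes.

```haskell
-- Keep the elements from one that satisfies 'start' up to and including
-- the next one that satisfies 'end', then wait for 'start' again.
-- As with Perl's two-dot operator, 'end' is also checked on the element
-- that triggered 'start'.
flipFlop :: (a -> Bool) -> (a -> Bool) -> [a] -> [a]
flipFlop start end = go False
  where
    go _ [] = []
    go active (x : xs)
      | active || start x = x : go (not (end x)) xs
      | otherwise         = go False xs
```

It works, but it is still a whole function definition, which rather proves the point: for a throwaway job, the one-liner wins.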

Security vulnerability in Haskell with CGI
Compiled Haskell programs all accept special RTS (Run Time System) options that change things like the number of cores that the program runs on, various internal things relating to how often garbage collection runs, etc. They’re specified by invoking the program like ./foo +RTS -m10 -k2000 -RTS to run the GHC-compiled program ‘foo’, reserving 10% of the heap for allocation and setting each thread’s stack size to a maximum of 2000 bytes. In the current build of GHC, there is no way to disable these options (although the option --RTS will make all further options be interpreted as normal, non-RTS options). The problem is that the option -tout will write profiling data to the file out. So, if your program is setuid root, anybody who runs it can write the profiling data to, say, /etc/passwd and render the system unusable. They don’t get to pick what gets written, so they can’t add a backdoor for themselves, but they can essentially scribble over whatever files they want. This is bug #3910, and the fix (disabling RTS by default) has been uploaded.
Now, one of the more little-known features of CGI is that if you pass a query string that does not contain any = signs to a CGI script, the httpd may pass the string along as command-line arguments. This is specified in section 4.4 of RFC 3875, and it specifies how the query string SHOULD be turned into arguments (although it does not say anything about whether the httpd should behave this way, only that some do). This is an example script that only outputs its arguments in a comma-separated list; the link gives it some sample arguments. Note that by URL-escaping, you can send arbitrary strings through… including +RTS. So if that were, say, a Haskell script, I could pass the query string ?%2BRTS+-tindex.html+-RTS and overwrite index.html.
There are three ways to get around this: first, GHC 6.12.2 has the -no-rtsopts option, which will obviously disable RTS options. So if you just recompile your script with that, it’ll be safe. Note that 6.14 will disable the RTS options by default; the 6.12.2 patch didn’t for backwards-compatibility reasons. Second, if you don’t want to use 6.12.2 for whatever reason, you can wrap it in a shell script that calls it with no options. For example, replace the Haskell script with a shell script called, say, hscript.cgi (if your Haskell program is called hscript) that calls it with no arguments, e.g.
#!/bin/sh
./hscript.real
and rename the Haskell script to hscript.real, so that it doesn’t get run as CGI (I’m assuming that .real files don’t get run as CGI on your machine!) Another thing you can do is to add the following to your .htaccess, which will give 403 Forbidden errors to anybody passing RTS arguments in the URL:
RewriteCond %{QUERY_STRING} ^(?:[^=]*\+)?(?:%2[bB]|(?:-|%2[dD]){1,2})(?:%52|R)(?:%54|T)(?:%53|S)(?:\+[^=]*)?$
RewriteRule ^ - [F]
This will solve it for every Haskell script you use, but relies on the regex being correct, which isn’t something I can guarantee.

dissociated-blogosphere: never have to write an original post again!
For the past two weeks or so, I’ve been working off and on on a project called dissociated blogosphere (OSX and Linux binaries here). It takes a bunch of URLs, looks through them for an RSS feed with the raw content of the posts, and then stores the words of the posts in an array. It then picks N random, consecutive words (where in this case N is 2), and starts generating new text, by picking a new word x% of the time if x% of the time, the previous N words were followed by that word. For example, if 90% of the time, the words ‘the quick’ were followed by ‘brown’, and the other 10% of the time, they were followed by ‘red’, then whenever the two-word phrase ‘the quick’ came up in the generated text, it would pick ‘brown’ 9 times out of 10, and ‘red’ 1 time out of 10. This is the algorithm Emacs’s dissociated press feature uses, hence the name. Running it a few times on this site and picking some of my favorite sentences gives:
Second, I ignored the axes of the work you envision. So start small, and think about the free group on two generators, which is obviously highly undesirable behavior. However, it does have the web interface, I’ll have it up by last week, but that obviously didnt happen. Taking into account the fact that I’m using. The central thing that makes MS Paint Adventures unique to the point where it’s my go-to language for random programs (I still use Python for that), but if we pick two of them and rotate one of the set of all rotations that you have some custom function you want soup or salad, both is not a valid answer.
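The generation step described above can be sketched in a few lines. This is an illustration under stated assumptions and not the actual dissociated-blogosphere source: buildTable and generate are hypothetical names, N is fixed at 2, and the list of Ints passed to generate stands in for a random number source so that the sketch needs nothing beyond the containers package.

```haskell
import qualified Data.Map as M

-- For every pair of consecutive words, the words seen right after it.
-- Duplicates are kept, so a follower seen 9 times out of 10 is picked
-- 9 times out of 10 by a uniform choice.
type Table = M.Map (String, String) [String]

buildTable :: [String] -> Table
buildTable ws =
  M.fromListWith (flip (++))
    [ ((a, b), [c]) | (a, b, c) <- zip3 ws (drop 1 ws) (drop 2 ws) ]

-- Generate words starting from a given pair. Each Int in 'picks' selects
-- one follower; a real program would draw these from a random generator.
generate :: Table -> (String, String) -> [Int] -> [String]
generate table (a, b) picks = a : b : go (a, b) picks
  where
    go _ [] = []
    go (x, y) (p : ps) =
      case M.lookup (x, y) table of
        Nothing -> []                      -- this pair only ended the corpus
        Just followers ->
          let next = followers !! (p `mod` length followers)
          in next : go (y, next) ps
```

For example, with the corpus words "the quick brown fox the quick red fox", the pair ("the", "quick") maps to ["brown", "red"], so each gets picked about half the time.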
It’s my first medium-scale project written in Haskell (even though there isn’t a lot of code, what little was there was not trivial to write), and I’ve learned several lessons from it:
- The Haskell wiki is an excellent resource. When I was trying to learn how to use HXT, the Haskell XML Toolbox, I found the provided documentation somewhat inadequate. But the HXT article on the Haskell wiki is an excellent introduction to the filter abstraction, which is all that I need for the basic stuff that I’m using.
- Read the Haddock documentation. The HXT article, as useful as it was, didn’t cover a couple essential things I needed to know (such as how to pull all elements with type “application/rss+xml”). So I looked at the documentation for Text.XML.HXT.Arrow.XmlArrow (the module containing the arrows that HXT uses to filter XML), and saw that hasAttrValue :: String -> (String -> Bool) -> a XmlTree XmlTree looks about right; from the type, I could guess (correctly) that I need to pass it the attribute name and a predicate on the value of the attribute (i.e., hasAttrValue "type" (== "application/rss+xml")).
- One goal at a time. This isn’t specific to Haskell. When I started on this, I meant for it to require you to provide the RSS feed. Then, I realized that having a larger corpus might be better, so I added the ability to pull from multiple feeds. Then I decided that expecting people to find the RSS feed by hand might be a bit much, so I rewrote it to pull the RSS feed from the site. And I eventually plan to write a CGI frontend so that you can just run it online. If I had decided from the start to do all these things, I probably never would’ve gotten started. As Linus Torvalds said:
Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large. If you do, you’ll just overdesign and generally think it is more important than it likely is at that stage. Or worse, you might be scared away by the sheer size of the work you envision. So start small, and think about the details. Don’t think about some big picture and fancy design. If it doesn’t solve some fairly immediate need, it’s almost certainly over-designed.
- Strip and gzip your executables if you’re going to distribute them. Due to the fact that I’m statically linking in HXT, which is a sizeable library, the compiled, non-stripped version of dissociated-blogosphere is a whopping 12 megabytes. This isn’t due to inefficiencies in my own code, but due to the sheer size of the HXT library. Running the Unix command line utility strip (which only removes internal debugging information) cuts it down to about 5 MB, and then gzipping the binaries takes it down to a little over a megabyte.
- Split things into libraries where it’s appropriate. Part of the problem with using HXT is that it makes recompilation slow; if I could do it all over again, I might have used HaXml, but HXT has the advantage of having nontrivial amounts of documentation written about it (on the Haskell wiki). If I had instead split the RSS parsing code into its own library, I could have only recompiled those parts whenever I touched them, which wasn’t nearly as often as I touched the code frontend. Plus, it’s just good programming practice.
So what do I have planned for dissociated-blogosphere? First off, I plan to make it faster by caching RSS lookups; by storing a map from page URLs to RSS feed, I can cut the number of network requests in half. Second, I plan to implement actual error handling; right now if you give it a bad URL it fails and doesn’t produce any useful output, regardless of whether other URLs are good. Third, I’m going to split out the RSS part into its own library, which I might make its own package on Hackage. Fourth, I intend to eventually write a web interface (either in Haskell or in Python) so that you don’t have to download and install it. I originally intended to have the web interface up by last week, but that obviously didn’t happen. Taking into account the fact that it’ll take longer than I think it will, I’m guessing I’ll have it up by two weeks from now (so, a month). And finally, when/if I do the web interface, I’ll have it color the text according to which blog it’s from, or maybe even output xterm color codes if I don’t write the web interface.

Why I Love Currying
So I’ve been playing around with Haskell a lot lately and using it for various random stuff; I haven’t progressed to the point where it’s my go-to language for random programs (I still use Python for that), but I at least have an idea of how to use it. And there’s one feature of Haskell that I miss sorely when I write code in Python, or pretty much any other vaguely functional language: currying.
In Haskell, every function takes a single argument. A function of multiple arguments, such as map, which applies a function to every element in a list, actually only has one argument; for example, map can be interpreted either as taking a function and a list and returning a list, or as taking a function and returning a function that takes a list and returns a list. More formally, in Haskell, these two type declarations are equivalent:
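A sketch of the two declarations, reconstructed from the description above. The names mapFlat and mapNested are hypothetical; they exist only so that both signatures can sit in one file and be checked by the compiler.

```haskell
-- The function arrow associates to the right, so these two signatures
-- denote exactly the same type; both definitions are just 'map'.
mapFlat :: (a -> b) -> [a] -> [b]
mapFlat = map

mapNested :: (a -> b) -> ([a] -> [b])
mapNested = map
```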
This process, of taking a multi-argument function and converting it into a series of single-argument functions is known as currying, after the mathematician Haskell Curry (who, obviously, is also the source of the name Haskell); the process of partially applying arguments to a function in this way is known as ‘partial application’, but is also called currying. One of the most obvious examples of currying is in sections: the function (0 ==) is syntactic sugar for (==) 0, and returns whether its argument is equal to zero. Furthermore, we can also partially apply the predicate to filter, to make a function that filters its argument on a fixed predicate. So, these three examples are completely equivalent:
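A sketch of the three definitions, reconstructed from the discussion that follows. The numeric suffixes are hypothetical, added so that all three versions can coexist in one file, and the type is fixed to Int to keep the example simple.

```haskell
-- 1. Fully explicit: a lambda for the predicate, and the list named.
removeZeroes1 :: [Int] -> [Int]
removeZeroes1 xs = filter (\x -> x /= 0) xs

-- 2. The predicate written as a section.
removeZeroes2 :: [Int] -> [Int]
removeZeroes2 xs = filter (/= 0) xs

-- 3. The list argument dropped as well: filter is partially applied.
removeZeroes3 :: [Int] -> [Int]
removeZeroes3 = filter (/= 0)
```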
(where /= is Haskell’s not-equal operator). The first is the most explicitly-written version, using no currying at all. The second curries the predicate; (/= 0) x is the same as x /= 0. Finally, since removeZeroes applied to an argument is the same as applying filter (/= 0) to it, we might as well define the former as the latter. Or, to take another example, look at sortBy: it has type (a -> a -> Ordering) -> [a] -> [a], where Ordering is a datatype that can either be EQ, LT or GT for equal, less than, or greater than. So if you have some custom function you want to sort a list on, you can just say mySort = sortBy f and it will be the same as writing mySort xs = sortBy f xs, only cleaner and neater. Or in my Data.Nimber module (specifically lines 38, 39, and 43), many operations on Nimbers that’re required in order for me to call them ‘numbers’ are just the identity operation. So instead of saying abs x = x, I can just say abs = id.
Furthermore, without currying, you couldn’t have variadic functions; in order to work inside Haskell’s type system, the two types a -> b -> c and a -> (b -> c) have to be the same type. The full explanation involves typeclasses, and is (in my opinion) worth a read, because it’s a good explanation of a pretty horriblexcellent (it’s both at once, you see) type system hack.
As an aside, this also means that id :: a -> a, the identity function, is in a sense the same thing as ($) :: (a -> b) -> a -> b, which is function application. You can see this by substituting (b -> c) for a in the type of id, then removing parentheses:
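A sketch of the substitution, reconstructed from the description above, with the three signatures written as comments and the final one checked by the compiler. The name apply is hypothetical.

```haskell
-- id :: a -> a
-- id :: (b -> c) -> (b -> c)     -- substitute (b -> c) for a
-- id :: (b -> c) -> b -> c       -- drop the redundant parentheses

-- The compiler accepts id at the type of function application.
apply :: (b -> c) -> b -> c
apply = id
```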
So, in particular, f `id` x is the same as f $ x, which is just f x. Another way to think of this is that f `id` x = id f x = (id f) x = f x.
Variadic Functions in Haskell
Most modern languages have some kind of printf analogue: a function that takes a format string, and a series of things to be inserted into that string, and formats them all accordingly. At first glance, Haskell’s strong type system would seem to preclude this. There’s no built-in system for writing functions that take variable numbers of arguments, and it seems like it would be difficult to write one. The standard approach is to take a list instead, but this fundamentally doesn’t work for printf, since you’re going to be wanting to print Integers, Strings, and Floats. It’s possible to just pre-apply show to everything, but that’s not really a good idea, because you might want to show them in a different way than the built-in show does. You can use an extension called existential types to create a list of PrintfWrappers which wrap integers/floats/strings (more on that below), but that requires your users to manually do the wrapping, which is, once again, not a good idea. Haskell’s Text.Printf module takes a third approach. Look at the following lines:
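What follows is a simplified re-creation of the idea rather than the source of Text.Printf. The class names PrintfType, PrintfArg and IsChar match the ones discussed here, but the method names (spr, render), the helper fill, and the deliberately dumb handling of format directives (every %x is replaced by the next argument, rendered its own way) are assumptions made to keep the sketch short and self-contained.

```haskell
-- The only instance of IsChar is Char; it exists so that the [c]
-- instances below are legal Haskell 98.
class IsChar c where
  toChar   :: c -> Char
  fromChar :: Char -> c

instance IsChar Char where
  toChar   = id
  fromChar = id

-- Types that may be passed as arguments to printf.
class PrintfArg a where
  render :: a -> String

instance PrintfArg Integer where render = show
instance PrintfArg Double  where render = show
instance IsChar c => PrintfArg [c] where render = map toChar

-- Types that printf may return. 'spr' takes the format string and the
-- arguments rendered so far (most recent first).
class PrintfType r where
  spr :: String -> [String] -> r

instance IsChar c => PrintfType [c] where
  spr fmt args = map fromChar (fill fmt (reverse args))

instance PrintfType (IO a) where
  spr fmt args = putStr (fill fmt (reverse args)) >> return undefined

-- The variadic step: a function from an argument to a PrintfType
-- is itself a PrintfType.
instance (PrintfArg a, PrintfType r) => PrintfType (a -> r) where
  spr fmt args = \x -> spr fmt (render x : args)

printf :: PrintfType r => String -> r
printf fmt = spr fmt []

-- Replace each "%x" directive with the next rendered argument.
fill :: String -> [String] -> String
fill ('%' : '%' : rest) args       = '%' : fill rest args
fill ('%' : _ : rest)   (a : args) = a ++ fill rest args
fill (c : rest)         args       = c : fill rest args
fill []                 _          = []
```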
Here’s how to interpret this: PrintfType is the type of things that can be printed to. Printing to a String just gives you a string, much like sprintf in C or Perl, printing to an IO () will actually print it out (so you can use it like a normal printf in do blocks, a behavior which I personally find distasteful.). However, printf will return undefined when asked to return an IO r; the reason that you can nevertheless return one is that only declaring IO () as an instance of PrintfType is invalid according to Haskell 98.
PrintfArg, by comparison, is the class of types that are valid arguments to printf; its instances are basically the various WordN/IntN types, Integer, Float, Char, and (IsChar c) => [c]. The point of the last instance is that, while in Haskell 98 you can’t have a specific version of a polymorphic type (such as [Char]) be an instance of a typeclass, you can restrict it to types whose parameters are themselves instances of another typeclass; the only instance of IsChar is Char.
So now that we have that clarified, let’s suppose we want to call printf "%s %d %f" "foo" 42 3.1 and get a String back, passing it the format string, a String, an Integer, and a Float. This causes printf’s type to become String -> String -> Integer -> Float -> String.
Does this match the pattern (PrintfType r) => String -> r? Let’s go in reverse. String is an instance of PrintfType, and Float is an instance of PrintfArg, so Float -> String is an instance of PrintfType. Therefore, Integer -> (Float -> String) is an instance of PrintfType, and so is String -> (Integer -> (Float -> String)). Dropping parentheses, this becomes String -> Integer -> Float -> String, which is exactly the r left over after the format string. So the types all check out. If you pass an invalid type, then you’ll run into something that isn’t an instance of PrintfArg and so the types won’t check.
I mentioned above that if you use something called ‘existential types’, you can do something similar. The way it works is that you define a new type whose data constructor only requires that its argument be of a given typeclass. Look at the following example
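A sketch of such an example, reconstructed from the discussion that follows. The names Box, boxes and showBoxes come from the text; the exact definitions are an assumption, and here showBoxes returns the string rather than printing it.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A Box can hold a value of any type, as long as that type has a
-- Show instance; the type itself is forgotten once it is wrapped.
data Box = forall s. Show s => Box s

-- A heterogeneous list: a number, a string, and another number.
boxes :: [Box]
boxes = [Box 2, Box "f", Box 83]

-- The only thing we can do with the contents of a Box is show them.
showBoxes :: [Box] -> String
showBoxes = unwords . map (\(Box x) -> show x)
```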
When you run showBoxes boxes, you get 2 "f" 83, exactly as you’d expect. Note, however, that the function unbox (Box x) = x cannot be written; it would have to be of type (Show s) => Box -> s, which promises to return whatever Show type the caller asks for, and there’s just no real way to do that. So once you’ve wrapped something up in a Box, you can only get at it by showing it. From this, you can see how to pass a heterogeneous list to printf. The reason that this approach is suboptimal is that it would require Text.Printf to export a Printf data constructor which would wrap up everything to make it of the appropriate type, and that would be rather annoying, especially since it relies on show preserving enough information for you to format the number after reading it back in.
This pattern can obviously be extended to any other variadic, heterogeneous function, as long as you can define a suitable typeclass that its arguments must all be instances of. And that’s not really a restriction at all; if you can’t specify a behavior that the instances must have, then you don’t really know what you can do with the arguments, and so you can’t do anything at all!
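As a small illustration of extending the pattern, here is a sketch of a variadic sum that accepts any mix of Integral arguments. The names SumRes, sumOf, and sumAll are made up for this example and don’t come from any library.

```haskell
-- Plays the role of PrintfType: the possible "return types" of sumAll.
class SumRes r where
  sumOf :: Integer -> r

-- Base case: no more arguments, return the running total.
instance SumRes Integer where
  sumOf acc = acc

-- Recursive case: take one more Integral argument and keep going.
-- Integral plays the role of PrintfArg here.
instance (Integral a, SumRes r) => SumRes (a -> r) where
  sumOf acc = \x -> sumOf (acc + toInteger x)

sumAll :: SumRes r => r
sumAll = sumOf 0
```

With these definitions, sumAll (1 :: Int) (2 :: Integer) (3 :: Int) :: Integer evaluates to 6. As with printf, the annotations are what tell the type checker how many arguments there are and what they are.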
Data.MemoCombinators and You
Part of the beauty of Haskell is that it allows you to simply write recursive functions. But part of the problem with recursive functions is that they tend to have absolutely horrible big-O run times. The usual solution to this problem is to use what’s known as memoization, which is memorization without the ‘r’, since programmers have to have special names for everything. Memoization is usually implemented as an associative array (or a plain array in the common case where the function takes a single non-negative integer as an argument); the function attempts to look up the return value for its arguments in an associative array. If it finds it, it can return without doing expensive computation; if it doesn’t, then it performs the computation, stores the result in its array, and then returns. In Python, a memoized Fibonacci function might be written as follows:
# cache of already-computed results, keyed by n
fib_cache = {}

def fib(n):
    if n < 2: return n
    if n not in fib_cache: fib_cache[n] = fib(n-1) + fib(n-2)
    return fib_cache[n]
The speed savings gained by this are enormous; on my test machine, fib(35) takes 15 seconds to compute without memoization, whereas fib(1000) computes almost instantly with memoization. In terms of big-O running times, I believe that the memoized version takes $O(n)$ time, whereas the unmemoized version takes $O(\varphi^n)$ (where $\varphi$ is the golden ratio), which is interesting since the Fibonacci numbers themselves are $O(\varphi^n)$. In any case, the memoized version is clearly superior.
But how do you do this in a language such as Haskell? You can’t carry state between the various incarnations of the function, since that could potentially lead to the function’s values not solely depending on its arguments, violating referential transparency. You can’t carry the state around in a monad because then different calls to the function would each have separate caches, so you’d have to pull some kind of trick where the function returns itself and its value, then pass the function around and it would just be a huge mess. So instead what you do is you use Data.MemoCombinators, which is a package that lets you turn functions into other, memoized functions. So how do you use it? It’s not too hard, especially if you’re memoizing functions that only use builtin types. An example, straight from the Data.MemoCombinators page:
import qualified Data.MemoCombinators as Memo

fib = Memo.integral fib'
    where
    fib' 0 = 0
    fib' 1 = 1
    fib' x = fib (x-1) + fib (x-2)
There are two things to note here: first, the memoized version, fib, is generated from the non-memoized version, fib', by calling Memo.integral on it. This is how you create memoized versions of single-variable functions: you apply the appropriate combinator. Second, fib' calls fib inside it. This is very important: if fib' called fib', then you couldn’t save time within the fib function, only outside of it. With fib' calling fib, on the other hand, the first time you call fib 1000, not only will it return before the heat death of the universe, but you’ll also get fib 999, fib 998, etc. cached.
But what if your function to be memoized isn’t one of the standard types? That’s why there’s Memo.wrap. You just have to define two mappings: one from your type to some combination of MemoCombinator types, and one that goes from that combination back. An example will make it clear:
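The example is missing from this copy of the post, so what follows is a reconstruction from the description around it. The names memoString, toFoo, fromFoo, and memoFoo come from the text; the definition of Foo and the function fooSize are assumptions added so that the sketch is complete. It needs the data-memocombinators package installed.

```haskell
import qualified Data.MemoCombinators as Memo

-- Assumed shape for Foo: just a String and an Int bundled together.
data Foo = Foo String Int

-- A memoizer for Strings: a String is a list of Chars.
memoString = Memo.list Memo.char

-- The two directions of the mapping between Foo and (String, Int).
toFoo :: (String, Int) -> Foo
toFoo (s, n) = Foo s n

fromFoo :: Foo -> (String, Int)
fromFoo (Foo s n) = (s, n)

-- Wrap a memoizer for (String, Int) up as a memoizer for Foo.
memoFoo = Memo.wrap toFoo fromFoo (Memo.pair memoString Memo.integral)

-- Hypothetical function of a Foo, memoized with memoFoo.
fooSize :: Foo -> Int
fooSize = memoFoo go
  where
    go (Foo s n) = length s + n
```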
So as you can see, first you build up a memoString memoizer which can memoize functions of Strings; since a String is just a list of Chars, you can just apply Memo.list to Memo.char. Then you define toFoo and fromFoo, which convert between the abstract Foo type and a tuple of a String and an Int. Finally, you use Memo.wrap to ‘wrap’ the pair of a memoized String and a memoized Int (constructed using Memo.pair, naturally) up in an abstract memoFoo memoizer.
The other thing you can do with MemoCombinators is memoize functions of multiple variables. Take this sample of code from a project I’m working on:
(*) = Memo.memo2 memoNimber memoNimber (*!) where
    x *! (Nimber 1) = x
    (Nimber 1) *! x = x
    a *! b = mex $ liftM2 combine [0 .. pred a] [0 .. pred b]
      where
        mex xs = fromJust $ find (`notElem` xs) [0..]
        combine a' b' = a * b' + a' * b + a' * b'
The actual definition of *! isn’t important; I’m only including it for completeness. Nor are the definitions of toNimber and fromNimber, which are used to build memoNimber. What is important is Memo.memo2: you use it to generate a memoized function of two arguments. You just pass it memoizers for each of the arguments (since * takes two Nimbers, I pass it memoNimber twice) and the unmemoized version, and it gives you a memoized version.
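Since the Nimber snippet depends on definitions that aren’t shown, here is a smaller, self-contained use of Memo.memo2 on built-in types: binomial coefficients via Pascal’s rule. The function choose is made up for this example, and it assumes the data-memocombinators package is installed.

```haskell
import qualified Data.MemoCombinators as Memo

-- choose n k counts the k-element subsets of an n-element set.
-- One Memo.integral memoizer is passed for each of the two arguments.
choose :: Integer -> Integer -> Integer
choose = Memo.memo2 Memo.integral Memo.integral choose'
  where
    choose' _ 0 = 1
    choose' 0 _ = 0
    -- Recursive calls go through the memoized choose, not choose'.
    choose' n k = choose (n - 1) (k - 1) + choose (n - 1) k
```

For example, choose 30 15 is 155117520. Without memoization, the same recursion would recompute the overlapping entries of Pascal’s triangle many times over.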
As for how Data.MemoCombinators works, I can’t really explain that. I know it has to do with the fact that expressions in function definitions are cached, but beyond that my knowledge fails. Maybe if I ever learn it I’ll return to this and explain it.
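For what it’s worth, the general trick can be sketched without the library. This is not how Data.MemoCombinators is actually implemented (as far as I know it uses lazy tries rather than a list); it’s a deliberately naive version that only handles non-negative Ints, and memoNat is a made-up name. The idea is that the table of results is an ordinary lazy data structure, so each entry is computed at most once, the first time somebody looks at it.

```haskell
-- A memoizer for functions of non-negative Ints: lazily build the list
-- [f 0, f 1, f 2, ...] once, and answer each call by indexing into it.
memoNat :: (Int -> a) -> (Int -> a)
memoNat f = (table !!)
  where
    table = map f [0 ..]

-- fib is a top-level constant, so memoNat is applied only once and
-- every call shares the same table.
fib :: Int -> Integer
fib = memoNat fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    -- As before, the recursion must go through the memoized fib.
    fib' n = fib (n - 1) + fib (n - 2)
```

With this, fib 50 comes back as 12586269025 without any noticeable delay. List indexing is linear, so each lookup costs time proportional to n; a trie keyed on the bits of the argument would avoid that.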
Edit: After I wrote this I realized that Data.MemoTrie exists; while it has cleaner syntax for memoizing functions (the memoizer doesn’t need to know the types of the arguments), it has a disadvantage in that it’s not immediately obvious how to memoize the types it doesn’t give you. But if you’re just memoizing functions of Ints or something, go ahead and use MemoTrie.
Gnawwy 0.0.2 out!
And you said I couldn’t do it. Actually, so did I. I managed to motivate myself enough to get off of my ass and release version 0.0.2 of gnawwy, my little program I wrote to notify Linux users (specifically, Ubuntu users) of updates to Twitter and such via notify-osd. The big changes in v0.0.2 are e-mail account support, support for multiple accounts (Twitter and e-mail), and the movement of password information and such into a .gnawwyrc. In the next version I plan on adding some actual error checking and debugging information, so I plan on calling it the actual v0.1.0 release. Now that I’ve motivated myself to get to work on this, I should actually have that out within a week or so. Still to come at some unspecified time in the future, in no particular order of priority:
- Notification methods other than libnotify (perhaps even Windows support!)
- RSS feeds
- GUI configuration
- A tray icon to notify you of unread messages
- Actual comments in the code
- A shiny icon
