Amateur Topologist

Everything but topology.

A Haskell Newbie’s Guide to Text.JSON

JSON parsing is practically required for any modern language to be able to interface with web-based applications; most of them offer JSON as a reply format, and the alternative (usually XML) can be cumbersome to work with. But Haskell and its strong type system seem like they’d be extremely ill-suited to parsing JSON; an object’s values can be arbitrary JSON objects, including different types in the same object, and arrays can contain different types of objects. How do you deal with this? Well, if you’re going to be parsing JSON, then you have to have some kind of format that you’re expecting; you know that you’ll get, say, an array of objects each of which has a specific key that has an integer as a value, and that integer is all you care about. Fortunately, Text.JSON exists, and once you get your head around how to use it, it’s simple. (Then again, so are many things in Haskell, but that doesn’t help me understand Arrows any better!)

So, let’s look at an example: getting a user’s public timeline from Twitter, and turning it into a list of Status values. We can ignore the process of actually sending Twitter the request and pulling the JSON out of the response, and for the sake of brevity we can ignore a bunch of the irrelevant data that Twitter returns, such as whether the tweet is a reply, the source, etc. Here’s the result of asking for the last two tweets from the official Twitter account, with irrelevant detail stripped:

[
    {
        "user": { "screen_name": "twitter" },
        "text": "Read this guest post from @FiresideInt on his experience in Haiti http:\/\/t.co\/mbMU56R"
    },
    {
        "user": { "screen_name": "twitter" },
        "text": "Do you use Twitter for a business, school, community group or another local organization? Follow @TwitterBusiness for tips and useful info!"
    }
]

So, what we have here is a list of objects; each object has a user attribute whose screen_name is the actual username; the object’s text attribute then contains the actual text of the tweet. Let’s get to work.

Making the JSON value

It’s helpful to play around with whatever we’re trying to manipulate in ghci; so let’s load up the JSON into a String, json, and try to parse it:

Prelude Text.JSON> json <- readFile "/Users/phurst/json"
Prelude Text.JSON> decode json

<interactive>:1:0:
    Ambiguous type variable `a' in the constraint:
      `JSON a' arising from a use of `decode' at <interactive>:1:0-10
    Probable fix: add a type signature that fixes these type variable(s)

Why doesn’t this work? Well, decode has type (JSON a) => String -> Result a. So you can decode a string of JSON into anything in the JSON typeclass. Well, JSValue is in the JSON typeclass, so let’s try that:

Prelude Text.JSON> decode json :: Result JSValue
Ok (JSArray [JSObject (JSONObject {fromJSObject = [("user",JSObject (JSONObject {fromJSObject = [("screen_name",JSString (JSONString {fromJSString = "twitter"}))]})), (rest of line omitted)

Wordy. But we did get a successful parse (as indicated by the Ok); we then have a JSArray which contains the two JSObjects. But in order to get at the things inside the JSArray, we’d have to manually remove the constructor via (\(JSArray x) -> x) or something. And then if we didn’t get an array (because we got an error!), we would get an unfriendly “Non-exhaustive patterns in lambda” exception. So, what do we do? Well, we’re trying to get a list of values, so let’s ask decode to give us one:

Prelude Text.JSON> decode json :: Result [JSValue]
Ok [JSObject (JSONObject {fromJSObject = [("user",JSObject (JSONObject {fromJSObject = [("screen_name",JSString (JSONString {fromJSString = "twitter"}))]})), (rest omitted)

Awesome! We have a list of JSValues, each of them a JSObject. But again, we’re still stuck inside that JSObject ‘wrapper’. This annoyed me for a while, until I realized that JSObject is both a data constructor in the JSValue type, and its own type! So we can ‘ask’ decode to give us a list of JSObjects:

Prelude Text.JSON> let decoded = decode json :: Result [JSObject JSValue]
Prelude Text.JSON> decoded
Ok [JSONObject {fromJSObject = [("user",JSObject (JSONObject {fromJSObject = [("screen_name",JSString (JSONString {fromJSString = "twitter"}))]})), (rest omitted)
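
To see the three decodings side by side outside of ghci, here is a minimal sketch. The sample string and the names asValue, asValues, asObjects and asInt are made up for illustration; only decode and the types come from Text.JSON.

```haskell
import Text.JSON

-- Made-up sample: an array holding a single, cut-down tweet object.
sample :: String
sample = "[{\"user\": {\"screen_name\": \"twitter\"}, \"text\": \"hello\"}]"

-- The same input decoded at increasingly specific types.
asValue :: Result JSValue
asValue = decode sample            -- Ok (JSArray [...])

asValues :: Result [JSValue]
asValues = decode sample           -- Ok [JSObject ...]

asObjects :: Result [JSObject JSValue]
asObjects = decode sample          -- Ok [JSONObject {...}]

-- Asking for a type the input does not fit yields an Error value,
-- not a pattern-match exception.
asInt :: Result Int
asInt = decode sample
```

The point of the last definition is that a wrong guess about the shape stays inside Result, which is what makes the later monadic code safe.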

Now we’re in business. We have a list of JSObjects, which is as good as we’re going to get. Now we need to actually deal with getting data out of them. It’s a good idea to split the ‘parse an individual item’ logic off into its own function, which I’ll call makeStatus.

Writing makeStatus, dealing with nested objects

Now, we could call fromJSObject :: JSObject e -> [(String, e)], search through the pairs, and then deal with them manually. But that’d be messy, and we wouldn’t get error handling if for some reason Twitter mysteriously didn’t give us a “user” object. Instead, we should take advantage of the function valFromObj :: JSON a => String -> JSObject JSValue -> Result a, and the fact that Result is a monad. Together, these mean that we can write simple code:

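-- One tweet, reduced to the two fields we care about.
-- Show is derived so that ghci can print the parsed result.
data Status = Status { user :: String, text :: String } deriving (Show)

makeStatus :: JSObject JSValue -> Result Status
-- (!) is written as a function binding, with its arguments on the left;
-- under the monomorphism restriction, (!) = flip valFromObj could not
-- return both a JSObject and a String.
makeStatus tweet = let obj ! key = valFromObj key obj in do
    userObject <- tweet ! "user"
    user <- userObject ! "screen_name"
    text <- tweet ! "text"
    return Status {user = user, text = text}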

Here, I’ve defined (!) for brevity’s sake, then used the Monad instance of Result to ‘chain together’ my calls to (!). At the end, I wrap the user and text into a Status value, then ‘return’ it back into the Result monad. Let’s see it:

Prelude Text.JSON> let tweet = (\(Ok x) -> x) decoded !! 0 -- just to get a non-monadic JSObject for now
Prelude Text.JSON> makeStatus tweet
Ok (Status {user = "twitter", text = "Read this guest post from @FiresideInt on his experience in Haiti http://t.co/mbMU56R"})
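
The error handling mentioned earlier can be checked the same way. The following is a rough sketch, not the askitter code: broken and missingUser are hypothetical names, and the malformed tweet is invented. It restates makeStatus (with (!) in a where clause) so that the file compiles on its own.

```haskell
import Text.JSON

data Status = Status { user :: String, text :: String } deriving (Show)

makeStatus :: JSObject JSValue -> Result Status
makeStatus tweet = do
    userObject <- tweet ! "user"
    name <- userObject ! "screen_name"
    body <- tweet ! "text"
    return Status { user = name, text = body }
  where
    -- Flipped valFromObj, so lookups read as object ! key.
    obj ! key = valFromObj key obj

-- Hypothetical malformed tweet: it has a "text" key but no "user" key.
broken :: JSObject JSValue
broken = toJSObject [("text", showJSON "no user here")]

-- The first failing lookup short-circuits the rest of the do block,
-- so this is an Error value rather than a crash.
missingUser :: Result Status
missingUser = makeStatus broken
```

Evaluating missingUser in ghci should print an Error describing the failed lookup; the later lookups are never attempted.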

We’ve successfully parsed a status update into its components! If we cared, we could pull more information out of the original object, such as real name, avatar URLs, time of posting, etc. But first, we have a more interesting problem: how do we join these two together?

Combining decoding and parsing

Look at the type for our decoding function and for our parse function:

Prelude Text.JSON> :t \json -> decode json :: Result [JSObject JSValue]
\json -> decode json :: Result [JSObject JSValue]
  :: String -> Result [JSObject JSValue]
Prelude Text.JSON> :t makeStatus
makeStatus :: JSObject JSValue -> Result Status

Clearly, we’d like it if we could put those in one function with little effort. So, let’s abstract out the specifics: we have functions f :: a -> m [b] and g :: b -> m c. We want h :: a -> m [c]. It’s obvious we want to map g, but a normal map isn’t right, since we’d get a function of type [b] -> [m c]; we want the monad to be outside the list! So we use the monadic map, mapM g :: [b] -> m [c].
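
The difference between map and mapM is easy to see with a small stand-in for g. This is only an illustration: checkPositive, eachChecked and allChecked are invented names, and Int stands in for the tweet types.

```haskell
import Text.JSON (Result(..))

-- Stand-in for g :: b -> m c, with m = Result: reject negative numbers.
checkPositive :: Int -> Result Int
checkPositive n
    | n >= 0    = Ok n
    | otherwise = Error ("negative: " ++ show n)

-- Plain map leaves a separate Result around every element.
eachChecked :: [Int] -> [Result Int]
eachChecked = map checkPositive

-- mapM puts a single Result around the whole list and stops at the
-- first Error it meets.
allChecked :: [Int] -> Result [Int]
allChecked = mapM checkPositive
```

Here allChecked [1, 2, 3] is Ok [1,2,3], while allChecked [1, -2, -3] is Error "negative: -2": one bad element fails the whole list, which is the behaviour we want for a timeline.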

Now, we have our two functions, and we could use do-block syntax to combine them:

parseTimeline json = do
    decoded <- decode json :: Result [JSObject JSValue]
    mapM makeStatus decoded

But do-block syntax isn’t terribly ‘Haskell-y’, and there’s only two lines of it. Surely there’s a way to combine them! And if we use Hoogle and search for (a -> m b') -> (b' -> m c') -> (a -> m c') (using primes to represent lists), we find that (>=>) in Control.Monad does exactly what we want! So we can rewrite it as a one-liner:

import Control.Monad ((>=>))

parseTimeline :: String -> Result [Status]
parseTimeline = decode >=> mapM makeStatus

Two things to notice: first, the explicit type annotation on decode isn’t necessary. In fact, it wasn’t necessary in the above version either. This generally happens once you’ve actually written the processing part of your JSON handler; the type signature of the processor forces the types in the parser! Second, the function is now pointfree; there’s no point in including the argument, so we might as well omit it.
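
Put together as one file, the pieces so far might look like the sketch below. The sampleTimeline string is invented test data, and (!) is defined in a where clause; everything else follows the definitions above.

```haskell
import Control.Monad ((>=>))
import Text.JSON

data Status = Status { user :: String, text :: String } deriving (Eq, Show)

makeStatus :: JSObject JSValue -> Result Status
makeStatus tweet = do
    userObject <- tweet ! "user"
    name <- userObject ! "screen_name"
    body <- tweet ! "text"
    return Status { user = name, text = body }
  where
    obj ! key = valFromObj key obj

-- No annotation on decode: the type of makeStatus forces it to
-- produce a [JSObject JSValue].
parseTimeline :: String -> Result [Status]
parseTimeline = decode >=> mapM makeStatus

-- Invented two-tweet timeline for trying things out.
sampleTimeline :: String
sampleTimeline =
    "[{\"user\": {\"screen_name\": \"twitter\"}, \"text\": \"first\"},"
    ++ " {\"user\": {\"screen_name\": \"twitter\"}, \"text\": \"second\"}]"
```

Loading this into ghci and evaluating parseTimeline sampleTimeline should give an Ok wrapping two Status values, in the same order as the input array.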

Errors

Now, there’s one problem here that I haven’t addressed. While we do get error handling for ‘free’, since monadic handling of Result values will pass through errors, the errors are typically unhelpful; they only show what failed to parse. And if we failed to parse the initial array, for example, that probably means that Twitter gave us an error message instead, and we’d like to know what it says! Finally, our output is stuck in the Result monad until we get it out.

The solution here is to write an error handler combinator; it takes a processor and a JSON string, and tries to process it; if the result pattern-matches against Ok x, then we parsed successfully; if it doesn’t, then we parse it again looking for the error message and handle that according to however your program deals with errors.

For my part, I kind of cheat; I don’t use the JSON in the error combinator, I use the raw HTTP response, and if the parse fails, I throw an exception according to the status code returned. But I eventually do plan to actually grab the error message, which will simplify the control flow of the library and give me more specifics on what went wrong.
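
A combinator along those lines could be sketched as follows. This is not what askitter does: the names errorMessage and handleErrors are invented, and the assumption that an error reply looks like {"error": "some message"} is mine, so check it against what the service really sends.

```haskell
import Text.JSON

-- ASSUMPTION: an error reply is an object with an "error" key holding
-- the message. This shape is hypothetical.
errorMessage :: String -> Result String
errorMessage body = do
    obj <- decode body
    valFromObj "error" obj

-- Run the processor; if it fails, parse the body a second time looking
-- for an error message, and fall back to the parse error otherwise.
handleErrors :: (String -> Result a) -> String -> Either String a
handleErrors process body =
    case process body of
        Ok x           -> Right x
        Error parseErr ->
            case errorMessage body of
                Ok msg  -> Left ("service error: " ++ msg)
                Error _ -> Left ("parse error: " ++ parseErr)
```

With this in place, handleErrors parseTimeline would have type String -> Either String [Status], which also gets the result out of the Result monad. Either is just one choice here; throwing an exception in the Left cases would fit the same structure.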


This entire post was inspired by my experiences writing askitter, a Haskell Twitter library using OAuth for authentication. Most of the ‘final’ code in here is copied straight from there, aside from various wrapping/unwrapping functions. It’s been very useful for learning lots of things; it’s been suggested that I ‘hide’ the fact that Twitter only gives you chunks of 20 tweets at a time by using enumerators; when/if I get around to that, I’ll write an explanation.

7 Responses

  1. Anon_The_Third

     /  November 5, 2010

    Through all of this, my only thought was “JSON! JSON! JSON!”. I am terrible at nerdery. http://www.youtube.com/watch?v=0cgOti7gLus&feature=related

  2. Anonymous

     /  November 7, 2010

    PROTIP: If you make Status an instance of the JSON type class then you can decode Statuses directly:

    instance JSON Status where
        -- readJSON receives a JSValue, so match on the JSObject constructor first.
        readJSON (JSObject tweet) = let obj ! key = valFromObj key obj in do
            userObject <- tweet ! "user"
            user <- userObject ! "screen_name"
            text <- tweet ! "text"
            return Status {user = user, text = text}
        readJSON _ = Error "Status: expected a JSON object"

        -- Left undefined because I'm lazy.
        showJSON = undefined

    -- Now we can decode our JSON data directly into a list of Statuses.
    let statuses = decode json :: Result [Status]

    Alternatively, using Text.JSON.Generic we can get rid of the boilerplate parsing code in readJSON altogether:

    {-# LANGUAGE DeriveDataTypeable #-}

    import Text.JSON
    import Text.JSON.Generic

    data User = User {
        screen_name :: String
    } deriving (Eq, Show, Data, Typeable)

    data Status = Status {
        user :: User,
        text :: String
    } deriving (Eq, Show, Data, Typeable)

    -- Now we can convert our JSON data to our Haskell datatype without having to do any of the parsing ourselves.
    let statuses = decodeJSON json :: [Status]

  3. Anonymous

     /  November 7, 2010

    The indentation in my previous comment got mangled. Here’s a more readable version: http://hpaste.org/41263/parsing_json_with_textjson

    • Yeah, indentation tends to get mangled in WordPress. If you use the HTML editor and switch to visual, then back, it removes all your line-initial indentation, even inside <code> blocks.

      Anyway, that’s really neat! Goes to show the power of deriving Typeable and generics.

      • The deriving via Typeable is indeed neat. I’ve found, though, that in a big application, you generally want to implement your own instances of JSON yourself, because you start making concessions on the representation of the data type to fit how it’s read in JSON. For example, with deriving Typeable,Data JSON instances, you might have some fields that you don’t want to show to the outside world, or that the browser just doesn’t care about — but they have to fill those fields in for it to parse, even if you use “Maybe” (then you have to provide Just foo in the JSON). My colleague warned me about this and I took it lightly and then learned the hard way. It’s also hard, for me, to implement a custom instance of Data and Typeable, whereas JSON is trivial. Anyhoo, just some wise words that I didn’t heed.

  4. Oscar

     /  November 8, 2010

    Shameless plug: You could also consider text-json-qq (http://hackage.haskell.org/packages/archive/text-json-qq/0.2.0/doc/html/Text-JSON-QQ.html) if you are not interested in automatic serialization or writing the json data structures by hand (but decodeJSON is probably the neatest alternative).

  5. Great topic and great comment by Anonymous (a different one!)
