JSON parsing is practically a requirement for any modern language that needs to interface with web-based applications; most of them offer JSON as a reply format, and the alternative (usually XML) can be cumbersome to work with. But Haskell and its strong type system seem like they’d be extremely ill-suited to parsing JSON: an object’s values can be arbitrary JSON values, including different types in the same object, and arrays can mix values of different types. How do you deal with this? Well, if you’re going to be parsing JSON, then you have to have some kind of format that you’re expecting; you know that you’ll get, say, an array of objects, each of which has a specific key with an integer value, and that integer is all you care about. Fortunately, Text.JSON exists, and once you get your head around how to use it, it’s simple. (Then again, so are many things in Haskell, but that doesn’t help me understand Arrows any better!)
So, let’s look at an example: getting a user’s public timeline from Twitter, and turning it into an array of Status values. We can ignore the process of actually sending Twitter the request and pulling the JSON out of the response, and for the sake of brevity we can ignore a bunch of the irrelevant data that Twitter returns, such as whether the tweet is a reply, the source, etc. Here’s the result of asking for the last 2 tweets from the official Twitter account, with irrelevant detail stripped:
[
{
"user": { "screen_name": "twitter" },
"text": "Read this guest post from @FiresideInt on his experience in Haiti http:\/\/t.co\/mbMU56R"
},
{
"user": { "screen_name": "twitter" },
"text": "Do you use Twitter for a business, school, community group or another local organization? Follow @TwitterBusiness for tips and useful info!"
}
]
So, what we have here is a list of objects; each object has a user attribute whose screen_name is the actual username; the object’s text attribute then contains the actual text of the tweet. Let’s get to work.
Making the JSON value
It’s helpful to play around in ghci with whatever we’re trying to manipulate, so let’s load the JSON into a String, json, and try to parse it:
Prelude Text.JSON> json <- readFile "/Users/phurst/json"
Prelude Text.JSON> decode json

<interactive>:1:0:
    Ambiguous type variable `a' in the constraint:
      `JSON a' arising from a use of `decode' at <interactive>:1:0-10
    Probable fix: add a type signature that fixes these type variable(s)
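Why doesn’t this work? Well, decode has type (JSON a) => String -> Result a, so you can decode a string of JSON into anything in the JSON typeclass. Nothing in the expression decode json says which of those types we want, so GHC can’t pick an instance and reports the type variable as ambiguous. JSValue is in the JSON typeclass, so let’s try that: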
Prelude Text.JSON> decode json :: Result JSValue
Ok (JSArray [JSObject (JSONObject {fromJSObject = [("user",JSObject (JSONObject {fromJSObject = [("screen_name",JSString (JSONString {fromJSString = "twitter"}))]})), (rest of line omitted)
Wordy. But we did get a successful parse (as indicated by the Ok): a JSArray containing the two tweet objects. To get at the things inside the JSArray, though, we’d have to get past the Ok and then manually remove the constructor via (\(JSArray x) -> x) or something similar. And if we didn’t get an array (because we got an error!), we would get an unfriendly “Non-exhaustive patterns in lambda” exception. So, what do we do? Well, we’re trying to get a list of values, so let’s ask decode to give us one:
Prelude Text.JSON> decode json :: Result [JSValue]
Ok [JSObject (JSONObject {fromJSObject = [("user",JSObject (JSONObject {fromJSObject = [("screen_name",JSString (JSONString {fromJSString = "twitter"}))]})), (rest omitted)
Awesome! We have a list of JSObject values. But again, we’re still stuck inside that JSObject ‘wrapper’. This annoyed me for a while, until I realized that JSObject is both a data constructor in the JSValue type and a type of its own (JSObject e, whose constructor is JSONObject)! So we can ‘ask’ decode to give us a list of JSObjects:
Prelude Text.JSON> let decoded = decode json :: Result [JSObject JSValue]
Prelude Text.JSON> decoded
Ok [JSONObject {fromJSObject = [("user",JSObject (JSONObject {fromJSObject = [("screen_name",JSString (JSONString {fromJSString = "twitter"}))]})), (rest omitted)
Now we’re in business. We have a list of JSObjects, which is as good as we’re going to get. Now we need to actually deal with getting data out of them. It’s a good idea to split the ‘parse an individual item’ logic off into its own function, which I’ll call makeStatus.
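Here is a quick sketch of the annotation trick used throughout this section. It decodes one small, made-up JSON string at four different types, and only the type signature changes between them. That signature is what tells decode which JSON instance to use. The sketch assumes the json package’s Text.JSON module, and the names (sampleJson, asValue and so on) are purely illustrative.

```haskell
import Text.JSON

-- Illustrative input: a compact array of two small objects.
sampleJson :: String
sampleJson = "[{\"n\":1},{\"n\":2}]"

-- One JSArray constructor wrapping everything.
asValue :: Result JSValue
asValue = decode sampleJson

-- The list instance strips the JSArray wrapper for us.
asValues :: Result [JSValue]
asValues = decode sampleJson

-- JSObject is also a type, so each element is unwrapped as well.
asObjects :: Result [JSObject JSValue]
asObjects = decode sampleJson

-- Wrong shape for the data: the result is an Error value, not a crash.
asStrings :: Result [String]
asStrings = decode sampleJson
```

The last case shows the point of asking for a specific type: data that does not fit comes back as an Error inside Result, rather than as a “Non-exhaustive patterns” exception.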
Writing makeStatus, dealing with nested objects
Now, we could call fromJSObject :: JSObject e -> [(String, e)], search through the pairs, and then deal with them manually. But that’d be messy, and we wouldn’t get error handling if for some reason Twitter mysteriously didn’t give us a “user” object. Instead, we should take advantage of the function valFromObj :: JSON a => String -> JSObject JSValue -> Result a, and the fact that Result is a monad. Together, these mean that we can write simple code:
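data Status = Status { user :: String, text :: String }
    deriving (Show)

makeStatus :: JSObject JSValue -> Result Status
makeStatus tweet =
    -- (!) needs its own signature: without one, the monomorphism restriction
    -- would pin it to a single result type, but we use it at two
    -- (JSObject JSValue for "user", String for the other keys).
    let (!) :: JSON a => JSObject JSValue -> String -> Result a
        (!) = flip valFromObj
    in do
        userObject <- tweet ! "user"               -- the nested user object
        user       <- userObject ! "screen_name"
        text       <- tweet ! "text"
        return Status { user = user, text = text }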
Here, I’ve defined (!) for brevity’s sake, then used the Monad instance of Result to ‘chain together’ my calls to (!). If any lookup fails, the remaining steps are skipped and the whole block evaluates to that Error. At the end, I wrap the user and text into a Status value, then ‘return’ it back into the Result monad. Let’s see it:
Prelude Text.JSON> let tweet = (\(Ok x) -> x) decoded !! 0 -- just to get a non-monadic JSObject for now
Prelude Text.JSON> makeStatus tweet
Ok (Status {user = "twitter", text = "Read this guest post from @FiresideInt on his experience in Haiti http://t.co/mbMU56R"})
We’ve successfully parsed a status update into its components! If we cared, we could pull more information out of the original object, such as real name, avatar URLs, time of posting, etc. But first, we have a more interesting problem: how do we join decoding and parsing together?
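If we did want more, an extended parser might look like the sketch below. It assumes that the user object carries a "name" field (the real name) and that the status carries a "created_at" field, as Twitter’s status JSON did at the time. DetailedStatus and makeDetailedStatus are hypothetical names. The first lookup stays monadic because the later lookups need the nested user object. The rest use Result’s Applicative instance, and a missing key still short-circuits to an Error.

```haskell
import Text.JSON

-- Hypothetical extended record; "name" and "created_at" are assumed
-- field names, not taken from the post.
data DetailedStatus = DetailedStatus
  { screenName :: String
  , realName   :: String
  , tweetText  :: String
  , createdAt  :: String
  } deriving (Show, Eq)

makeDetailedStatus :: JSObject JSValue -> Result DetailedStatus
makeDetailedStatus tweet = do
    -- Monadic step: the lookups below need the nested user object.
    userObject <- tweet ! "user"
    -- Applicative steps: independent lookups; any missing key gives an Error.
    DetailedStatus
      <$> userObject ! "screen_name"
      <*> userObject ! "name"
      <*> tweet ! "text"
      <*> tweet ! "created_at"
  where
    -- flip valFromObj with a signature, so it stays polymorphic in its result.
    (!) :: JSON a => JSObject JSValue -> String -> Result a
    (!) = flip valFromObj
```

Given one tweet’s JSON text, decode tweetJson >>= makeDetailedStatus (tweetJson being a hypothetical String) gives Ok with all four fields, or the Error from the first lookup that failed.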
Combining decoding and parsing
Look at the type for our decoding function and for our parse function:
Prelude Text.JSON> :t \json -> decode json :: Result [JSObject JSValue]
\json -> decode json :: Result [JSObject JSValue]
  :: String -> Result [JSObject JSValue]
Prelude Text.JSON> :t makeStatus
makeStatus :: JSObject JSValue -> Result Status
Clearly, we’d like it if we could put those in one function with little effort. So, let’s abstract out the specifics: we have functions f :: a -> m [b] and g :: b -> m c. We want h :: a -> m [c]. It’s obvious we want to map g, but a normal map isn’t right, since we’d get a function of type [b] -> [m c]; we want the monad to be outside the list! So we use the monadic map, mapM g :: [b] -> m [c].
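Here is a minimal sketch of that difference. It uses a stand-in item parser, the hypothetical parseItem, which just reads an Int, in place of makeStatus. With map, each element keeps its own Result inside the list. With mapM, the Result moves outside the list, and the first Error stops the whole computation.

```haskell
import Text.JSON

-- Hypothetical per-item parser standing in for makeStatus: read one JSValue as an Int.
parseItem :: JSValue -> Result Int
parseItem = readJSON

-- Plain map: one Result per element, so the Result is inside the list.
mapped :: [JSValue] -> [Result Int]
mapped = map parseItem

-- mapM: the Result is outside the list, and the first Error wins.
traversed :: [JSValue] -> Result [Int]
traversed = mapM parseItem
```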
Now, we have our two functions, and we could use do-block syntax to combine them:
parseTimeline json = do
    decoded <- decode json :: Result [JSObject JSValue]
    mapM makeStatus decoded
But do-block syntax isn’t terribly ‘Haskell-y’, and there are only two lines of it. Surely there’s a way to combine them! If we search Hoogle for (a -> m b') -> (b' -> m c') -> (a -> m c') (using primes to represent lists), we find that (>=>) in Control.Monad does exactly what we want. So we can rewrite it as a one-liner:
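import Control.Monad ((>=>))
import Text.JSON

-- decode, then run makeStatus over every element of the decoded list
parseTimeline :: String -> Result [Status]
parseTimeline = decode >=> mapM makeStatus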
Two things to notice. First, the explicit type annotation on decode isn’t necessary; in fact, it wasn’t necessary in the do-block version either. This generally happens once you’ve actually written the processing part of your JSON handler: the type signature of the processor forces the types in the parser! Here, makeStatus takes a JSObject JSValue, so decode must produce a Result [JSObject JSValue]. Second, the function is now pointfree; there’s no point in including the argument, so we might as well omit it.
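To check that (>=>) really is the do-block in disguise (Control.Monad defines it so that (f >=> g) x = f x >>= g), here is a sketch with two hypothetical functions of the same shapes as decode and makeStatus. It uses Maybe and readMaybe instead of Result so that it needs nothing beyond base.

```haskell
import Control.Monad ((>=>))
import Text.Read (readMaybe)

-- Hypothetical stand-in for decode: String -> m [b], here in Maybe.
parseList :: String -> Maybe [Int]
parseList = readMaybe

-- Hypothetical stand-in for makeStatus: b -> m c, rejecting non-positive numbers.
checkPositive :: Int -> Maybe Int
checkPositive n = if n > 0 then Just n else Nothing

-- The do-block version, shaped like the first parseTimeline.
viaDo :: String -> Maybe [Int]
viaDo s = do
  xs <- parseList s
  mapM checkPositive xs

-- The (>=>) version, shaped like the pointfree parseTimeline.
viaFish :: String -> Maybe [Int]
viaFish = parseList >=> mapM checkPositive
```

The two functions agree on every input; viaFish is viaDo with the intermediate name xs removed.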
Errors
Now, there are a few problems here that I haven’t addressed. While we do get error handling for ‘free’, since monadic handling of Result values passes errors through, the errors are typically unhelpful; they only show what failed to parse. If we failed to parse the initial array, for example, that probably means that Twitter gave us an error message instead, and we’d like to know what it says! Finally, our output is stuck in the Result monad until we get it out.
The solution here is to write an error-handling combinator. It takes a processor and a JSON string and tries to process it. If the result matches Ok x, we parsed successfully; if it doesn’t, we parse the string again looking for the error message and handle that however your program deals with errors.
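Here is a minimal sketch of such a combinator. It assumes that the service reports failures as a JSON object with a string "error" field. That format is an assumption, so adjust it to whatever your responses actually contain. The sketch also returns an Either String instead of throwing. withErrorMessage is a hypothetical name and is not part of Text.JSON or askitter.

```haskell
import Text.JSON

-- Hypothetical combinator: run the processor on the raw JSON text; if that
-- fails, re-parse the same text as an (assumed) {"error": "..."} object.
withErrorMessage :: (String -> Result a) -> String -> Either String a
withErrorMessage process raw =
  case process raw of
    Ok x -> Right x
    Error parseErr ->
      -- decode is used at JSObject JSValue here, because valFromObj needs one.
      case decode raw >>= valFromObj "error" of
        Ok apiMsg -> Left ("API error: " ++ apiMsg)
        Error _   -> Left ("could not parse response: " ++ parseErr)
```

With the parser from earlier, withErrorMessage parseTimeline response would give Right with the statuses. Otherwise it gives Left carrying either the service’s own message or the original parse error.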
For my part, I kind of cheat: I don’t use the JSON in the error combinator, I use the raw HTML response, and if the parse fails, I throw an exception according to the access code returned. But I do eventually plan to actually grab the error message, which will simplify the control flow of the library and give me more specifics on what went wrong.
This entire post was inspired by my experiences writing askitter, a Haskell Twitter library using OAuth for authentication. Most of the ‘final’ code in here is copied straight from there, aside from various wrapping/unwrapping functions. It’s been very useful for learning lots of things; it’s been suggested that I ‘hide’ the fact that Twitter only gives you chunks of 20 tweets at a time by using enumerators. When/if I get around to that, I’ll write an explanation.