Converting XML to JSON with a few nice touches

During my recent outings in heavyweight programming, one of the things we needed to do was convert a large XML structure from the server into a JSON object in the browser, to make it easy to manipulate and inspect.

Also, the XML from the server was not the nice kind. The tag names were consistent, but the content was wildly inconsistent. For example, all of the following were received:

    <!-- different variations of a particular tag -->
    <BgSize>100,23</BgSize>
    <BgSize>0,0</BgSize>
    <BgSize>,</BgSize>

Ideally, in this case, we wanted to parse and validate the node (in all its variations) and convert it to an X,Y pair only if it contained valid data. Also, as you might expect, many of these were common tags that showed up in various entities throughout the XML, so we wanted all these rules applied early and in one central place rather than dealing with them at disparate places further downstream.
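
To make that rule concrete, here is a minimal sketch in TypeScript (the post's own code is CoffeeScript, which compiles to the same JavaScript). `parsePair` and `Point` are hypothetical names, not part of the original code. Like the `__parseString` method further down, it treats `0,0` as a valid pair and rejects anything that is not two comma-separated runs of digits, such as `,`.

```typescript
// Hypothetical helper: parse "X,Y" content such as <BgSize>100,23</BgSize>
// into a point, or return null when the content is not a valid pair.
interface Point {
  x: number;
  y: number;
}

function parsePair(text: string): Point | null {
  // Accept exactly two runs of digits separated by one comma.
  const match = /^(\d+),(\d+)$/.exec(text);
  if (match === null) {
    return null; // e.g. "," or any other malformed content
  }
  return { x: Number(match[1]), y: Number(match[2]) };
}

for (const sample of ["100,23", "0,0", ","]) {
  console.log(sample, "->", parsePair(sample));
}
```

Returning `null` instead of throwing lets the caller simply drop invalid nodes, which is the same approach the validators later in this post take.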

The other reason was that many nodes had structured data crammed into a single tag, which we wanted parsed into a JavaScript object so that we could manipulate it easily:

    <!-- xml data with structured content -->
    <!-- font, size, color, bold, italic -->
    <Font>Arial;Lucida,14,0x0044,True,False</Font>
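
The code below hands strings like this to a `Utils.parseFontSpec` helper that the post does not show. What follows is only a guess at what such a helper might look like, written as a TypeScript sketch. The `FontSpec` field names, reading `Arial;Lucida` as a `;`-separated list of font families, and converting the color to a number are all assumptions based on the sample above.

```typescript
// Hypothetical parser for <Font> content of the form
// "family[;family...],size,color,bold,italic",
// e.g. "Arial;Lucida,14,0x0044,True,False".
interface FontSpec {
  families: string[];   // "Arial;Lucida" -> ["Arial", "Lucida"]
  size: number;         // size as written in the XML
  color: number | null; // "0x0044" -> 68; null when empty or unparseable
  bold: boolean;
  italic: boolean;
}

function parseFontSpec(spec: string): FontSpec {
  // Missing trailing fields (e.g. no bold/italic flags) default to "".
  const [families = "", size = "", color = "", bold = "", italic = ""] =
    spec.split(",");
  // parseInt with radix 16 accepts an optional "0x" prefix.
  const colorValue = parseInt(color, 16);
  return {
    families: families.split(";"),
    size: parseInt(size, 10),
    color: Number.isNaN(colorValue) ? null : colorValue,
    bold: /^true$/i.test(bold),
    italic: /^true$/i.test(italic),
  };
}

console.log(parseFontSpec("Arial;Lucida,14,0x0044,True,False"));
```

Turning the color into a number is a choice. If the consuming code expects the hex string, keep the raw field instead.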

That prompted a search for the best way to convert XML to JSON, and of course Stack Overflow had a question on it. The article linked from the answer makes for very interesting reading on all the different cases that have to be handled. The associated script at http://goessner.net/download/prj/jsonxml/ is the solution I picked. There's not much going on below other than using its xml2json function to convert the XML into a raw JSON object.

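    # Parse the XML string into a DOM, let Goessner's xml2json turn it into a
    # JSON string (second argument is the indent; "" gives compact output),
    # parse that into a plain object, then apply our conversions/validators.
    @parseXML2Json: (xmlstr) ->
        log xmlstr
        json = JSON.parse xml2json($.parseXML(xmlstr), "")
        destObj = Utils.__parseTypesInJson(json)
        log "raw and parsed objects", json, destObj
        return destObj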

Now to the more interesting part: once the XML is converted to JSON, we need to work our magic on top of it by applying validations and conversions. This is where the Utils.__parseTypesInJson method comes in.

What we're doing here is walking the JSON object recursively. At each step we keep track of the path we have descended through in the XML, so that we can apply validations or conversions based on that path. We also need to check which type of JSON value we are dealing with: undefined, null, string, array or object.

If it's a string, we delegate to the __parseString function, which converts the string into a number, boolean, X,Y pair or font object where appropriate.

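    # Recursively walk the JSON produced by xml2json, tracking the dotted path
    # (e.g. ".Root.Item.BgSize") so conversions and validators can be chosen by path.
    @__parseTypesInJson: (obj, path = "") ->
        if typeof obj is "undefined"
            return undefined
        else if obj is null
            return null
        else if typeof obj is "string"
            # Convert the raw text, then let the first validator whose regex
            # matches the path inspect or replace the result.
            newObj = Utils.__parseString(obj, path)
            validator = _.find Utils.CUSTOM_VALIDATORS, (v) -> v.regex.test path
            return validator.fn(newObj) if validator?
            return newObj
        else if Array.isArray(obj)
            # Repeated tags become arrays; every element shares the parent path.
            # Drop elements that a validator rejected (null or undefined).
            destObj = (Utils.__parseTypesInJson(o, path) for o in obj)
            return _.reject destObj, (o) -> not o?
        else if typeof obj is "object"
            destObj = {}
            for own k, child of obj
                destObj[k] = Utils.__parseTypesInJson(child, "#{path}.#{k}")
            # Pass the converted object to the validator; passing the raw obj
            # would throw away every conversion made on its children.
            validator = _.find Utils.CUSTOM_VALIDATORS, (v) -> v.regex.test path
            return validator.fn(destObj) if validator?
            return destObj
        else
            return obj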
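
To see what the path bookkeeping produces, here is a small TypeScript sketch that walks an object the same way and records the path of every leaf value. `collectPaths` is a hypothetical helper, and the sample input is hand-written in the shape xml2json plausibly produces (repeated tags as arrays, attributes as `@`-prefixed keys). It is not real output from the library.

```typescript
// Sketch of the path bookkeeping in __parseTypesInJson: object keys extend
// the path with ".key", while array elements reuse the array's own path.
type Json = null | string | number | boolean | Json[] | { [key: string]: Json };

function collectPaths(value: Json, path = "", out: string[] = []): string[] {
  if (Array.isArray(value)) {
    // Repeated tags: every element shares the parent path.
    for (const item of value) {
      collectPaths(item, path, out);
    }
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      collectPaths(value[key], `${path}.${key}`, out);
    }
  } else {
    out.push(path); // leaf value (string, number, boolean or null)
  }
  return out;
}

// Hand-written, illustrative input.
const sample: Json = {
  Root: {
    "@id": "7",
    Item: [{ BgSize: "100,23" }, { BgSize: "," }],
  },
};
console.log(collectPaths(sample));
```

These are the strings that validator regexes such as `/choice$/` are tested against. Both `BgSize` values share one path because array elements inherit their parent's path, so a single rule covers every occurrence of a repeated tag.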

At each step, once the value has been processed, we check whether a custom validator is defined for its path in the CUSTOM_VALIDATORS array. Each validator is a regex plus a callback: if the regex matches the path, the callback is passed the value, which it may modify before returning it. Returning null rejects the value, and null entries are dropped from arrays.

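    # Each validator pairs a path regex with a callback; __parseTypesInJson
    # uses the first one whose regex matches (_.find).
    @CUSTOM_VALIDATORS = [
        # choice: keep a <choice> node only if it has text content.
        {
            regex: /choice$/
            fn: (obj) ->
                # obj may be null (e.g. "," is parsed to null), so soak the lookup
                return obj if obj?["#text"]?
                log "returning null"
                return null
        }
    ]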
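
The validator mechanism itself can be sketched in TypeScript as below. It mirrors the `choice` rule above with a null-safe lookup. `Validator`, `applyValidator` and `validateAll` are hypothetical names, and dropping rejected entries corresponds to the `_.reject` step in the array branch of `__parseTypesInJson`.

```typescript
// Sketch of the CUSTOM_VALIDATORS idea: the first validator whose regex
// matches the current path may inspect and replace the value.
interface Validator {
  regex: RegExp;
  fn: (value: unknown) => unknown; // return null to reject the value
}

const validators: Validator[] = [
  {
    // Like the post's <choice> rule: keep the node only if it has "#text".
    regex: /choice$/,
    fn: (value) => {
      const text =
        value !== null && typeof value === "object"
          ? (value as Record<string, unknown>)["#text"]
          : undefined;
      return text !== null && text !== undefined ? value : null;
    },
  },
];

// Apply the first matching validator, or return the value unchanged.
function applyValidator(value: unknown, path: string): unknown {
  const validator = validators.find((v) => v.regex.test(path));
  return validator ? validator.fn(value) : value;
}

// Repeated tags arrive as arrays; drop the entries a validator rejected.
function validateAll(items: unknown[], path: string): unknown[] {
  return items
    .map((item) => applyValidator(item, path))
    .filter((item) => item !== null && item !== undefined);
}

const choices = [{ "#text": "A", "@id": "1" }, { "@id": "2" }];
console.log(validateAll(choices, ".Root.choice"));
```

Because the lookup takes the first match, as `_.find` does, more specific patterns should be listed before broader ones.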

Finally, the __parseString method, for completeness. You can tweak this to your taste; there's nothing complicated going on in it.

    # Convert a raw text value into a richer type based on its shape. Paths
    # matching SKIP_STRING_PARSING_REGEXES are always left as plain strings.
    @__parseString: (str, path) ->
        return str unless str                       # null, undefined or ""
        if _.some(Utils.SKIP_STRING_PARSING_REGEXES, (r) -> r.test path)
            log "Skipping string parsing for:", path, str
            return str
        if /^\d+$/.test str
            return parseInt(str, 10)                # "14" -> 14
        else if /^\d+,\d+$/.test str
            [first, second] = str.split(",")        # "100,23" -> {x: 100, y: 23}
            return {x: parseInt(first, 10), y: parseInt(second, 10)}
        else if str is ","
            return null                             # empty X,Y pair
        else if /^true$/i.test str
            return true
        else if /^false$/i.test str
            return false
        else if /^[^,]+,\d+,(0x[0-9a-f]{0,6})?,((True|False),(True|False))?$/i.test str
            # family[;family...],size,[hex color],[bold,italic]
            log "Matched font: ", str
            return Utils.parseFontSpec(str)
        else
            return str
