For the past year or so, I’ve been very lucky. All the data I’ve had to deal with has been packaged in JSON, not XML. And what a glorious year it’s been. Instead of writing complex, single-use XML-parsing code, I’ve had the joy of using Stig Brautaset’s excellent JSON-framework to parse JSON. The framework is dead simple to use. Have some JSON? Throw it at the framework, and voilà! You get back an NSDictionary or an NSArray. Just one line of code, and you’re done. Simple, elegant, and the complete opposite of parsing XML.

What’s the problem with parsing XML? Well, first you have to set up your NSXMLParser. Then, make sure you’re set as the delegate. Then, implement the necessary delegate methods (there are 14 relevant NSXMLParserDelegate methods, so choose wisely!). Then, initialize your NSMutableString to record the strings from the text nodes. Then, initialize your model objects from the XML as elements are pushed and popped in the didStart and didEnd methods. And don’t forget to update your objects with data in the attributes dictionary. And so on, and so on.

XML files, especially small XML files, should be as easy to parse as JSON.

Last week, when faced with the depressing task of writing my first XML parser this year, I wrote a general-purpose XML-to-NSDictionary parser instead. Throw some XML at it, and it spits out an NSDictionary.

How does it work? Here are the key ideas:

  1. XML elements map to key names in the dictionary
  2. Each element corresponds to a child dictionary
  3. Attribute key-value pairs are added to the element’s child dictionary
  4. Strings from text nodes are assigned to the child dictionary’s “text” key
  5. If an element name is encountered multiple times, the value of the element is set to an array of children dictionaries

This conversion is not without its flaws, but it should work pretty well for most XML files.

The Code

The parser consists of a single class, XMLReader. You can pass it either an XML string or an NSData object containing XML, and it will return the NSDictionary version of the XML. If the XML is malformed or the parser fails for any other reason, the NSError pointer you pass in will be populated with an NSError object.

Here’s the header file:

//
// XMLReader.h
//

#import <Foundation/Foundation.h>

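// Declares conformance to NSXMLParserDelegate so that assigning the reader
// as the parser's delegate compiles without a warning on SDKs that define
// the formal protocol.
@interface XMLReader : NSObject <NSXMLParserDelegate>
{
    NSMutableArray *dictionaryStack;
    NSMutableString *textInProgress;
    NSError **errorPointer;
}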

+ (NSDictionary *)dictionaryForXMLData:(NSData *)data error:(NSError **)errorPointer;
+ (NSDictionary *)dictionaryForXMLString:(NSString *)string error:(NSError **)errorPointer;

@end

And the implementation:

//
// XMLReader.m
//

#import "XMLReader.h"

NSString *const kXMLReaderTextNodeKey = @"text";

@interface XMLReader (Internal)

- (id)initWithError:(NSError **)error;
- (NSDictionary *)objectWithData:(NSData *)data;

@end

@implementation XMLReader

#pragma mark -
#pragma mark Public methods

+ (NSDictionary *)dictionaryForXMLData:(NSData *)data error:(NSError **)error
{
    XMLReader *reader = [[XMLReader alloc] initWithError:error];
    NSDictionary *rootDictionary = [reader objectWithData:data];
    [reader release];
    return rootDictionary;
}

+ (NSDictionary *)dictionaryForXMLString:(NSString *)string error:(NSError **)error
{
    NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
    return [XMLReader dictionaryForXMLData:data error:error];
}

#pragma mark -
#pragma mark Parsing

- (id)initWithError:(NSError **)error
{
    if (self = [super init])
    {
        errorPointer = error;
    }
    return self;
}

- (void)dealloc
{
    [dictionaryStack release];
    [textInProgress release];
    [super dealloc];
}

- (NSDictionary *)objectWithData:(NSData *)data
{
    // Clear out any old data
    [dictionaryStack release];
    [textInProgress release];

    dictionaryStack = [[NSMutableArray alloc] init];
    textInProgress = [[NSMutableString alloc] init];

    // Initialize the stack with a fresh dictionary
    [dictionaryStack addObject:[NSMutableDictionary dictionary]];

    // Parse the XML
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = self;
    BOOL success = [parser parse];

    // The parser is no longer needed, so release it to avoid a leak
    [parser release];

    // Return the stack’s root dictionary on success
    if (success)
    {
        NSDictionary *resultDict = [dictionaryStack objectAtIndex:0];
        return resultDict;
    }

    return nil;
}

#pragma mark -
#pragma mark NSXMLParserDelegate methods

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    // Get the dictionary for the current level in the stack
    NSMutableDictionary *parentDict = [dictionaryStack lastObject];

    // Create the child dictionary for the new element, and initialize it with the attributes
    NSMutableDictionary *childDict = [NSMutableDictionary dictionary];
    [childDict addEntriesFromDictionary:attributeDict];

    // If there’s already an item for this key, it means we need to create an array
    id existingValue = [parentDict objectForKey:elementName];
    if (existingValue)
    {
        NSMutableArray *array = nil;
        if ([existingValue isKindOfClass:[NSMutableArray class]])
        {
            // The array exists, so use it
            array = (NSMutableArray *) existingValue;
        }
        else
        {
            // Create an array if it doesn’t exist
            array = [NSMutableArray array];
            [array addObject:existingValue];

            // Replace the child dictionary with an array of children dictionaries
            [parentDict setObject:array forKey:elementName];
        }

        // Add the new child dictionary to the array
        [array addObject:childDict];
    }
    else
    {
        // No existing value, so update the dictionary
        [parentDict setObject:childDict forKey:elementName];
    }

    // Update the stack
    [dictionaryStack addObject:childDict];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    // Update the parent dict with text info
    NSMutableDictionary *dictInProgress = [dictionaryStack lastObject];

    // Set the text property
    if ([textInProgress length] > 0)
    {
        // Get rid of leading + trailing whitespace
        NSString *trimmedText = [textInProgress stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

        // Only store the text if something other than whitespace remains
        if ([trimmedText length] > 0)
        {
            [dictInProgress setObject:trimmedText forKey:kXMLReaderTextNodeKey];
        }

        // Reset the text
        [textInProgress release];
        textInProgress = [[NSMutableString alloc] init];
    }

    // Pop the current dict
    [dictionaryStack removeLastObject];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // Build the text value
    [textInProgress appendString:string];
}

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError
{
    // Set the error pointer to the parser’s error object.
    // The caller may have passed NULL, so check before dereferencing.
    if (errorPointer)
    {
        *errorPointer = parseError;
    }
}

@end

The code works by keeping a stack of dictionaries, one for each level of the XML file. Each time a new tag is encountered, a child dictionary is pushed onto the stack. Each time a tag is closed, the dictionary is popped off the stack.

Arrays of elements are detected when the same key appears twice in the dictionary. For instance, if the XML is “<book><page>1</page><page>2</page></book>”, the first time the “page” element is encountered, a child dictionary will be set as the value for the “page” key. The next time the “page” element is encountered, we detect that there’s already a value for the “page” key, so we put both pages in an array and set the value of the “page” key to the array.

Note: One side effect of detecting arrays in this fashion is that the value for a key (say “page” in the example above) can sometimes be an NSArray and other times an NSDictionary. For example, if the book contains a single page, “page” will be set to an NSDictionary. If the book contains two or more pages, “page” will be set to an NSArray. You will need to account for this when reading from the dictionary produced by dictionaryForXMLString:error: and dictionaryForXMLData:error:.
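One way to cope with the array-or-dictionary ambiguity is to normalize every value to an array before iterating over it. The sketch below is not part of XMLReader; XMLArrayFromValue and PageTextsForBook are hypothetical helper names made up for illustration, and the “text” key assumes the default kXMLReaderTextNodeKey.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of XMLReader): always returns an NSArray,
// whether the parsed value is missing, a single dictionary, or an array.
static NSArray *XMLArrayFromValue(id value)
{
    if (value == nil)
    {
        return [NSArray array];
    }
    if ([value isKindOfClass:[NSArray class]])
    {
        return value;
    }
    return [NSArray arrayWithObject:value];
}

// Hypothetical example: collect the text of every "page" child of a parsed
// "book" dictionary, regardless of how many pages the book contains.
static NSArray *PageTextsForBook(NSDictionary *book)
{
    NSMutableArray *texts = [NSMutableArray array];
    for (NSDictionary *page in XMLArrayFromValue([book objectForKey:@"page"]))
    {
        // Assumes the default text-node key, "text"
        NSString *text = [page objectForKey:@"text"];
        if (text != nil)
        {
            [texts addObject:text];
        }
    }
    return texts;
}
```

With a helper like this, the calling code can loop over pages the same way whether the book has zero, one, or many of them.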

When the parser comes across a text node in the XML, it inserts a key into the dictionary named “text” and sets the value to the parsed string.

Note: Make sure the XML you’re parsing doesn’t contain a field named “text”! You can change it to a non-conflicting name by editing the kXMLReaderTextNodeKey constant at the top of XMLReader.m.

Using the Code

The conversion is best illustrated by an example. The snippet below defines an XML string that is converted into a dictionary using XMLReader:

//
// XML string from http://labs.adobe.com/technologies/spry/samples/data_region/NestedXMLDataSample.html
//
NSString *testXMLString = @"<items><item id=\"0001\" type=\"donut\"><name>Cake</name><ppu>0.55</ppu><batters><batter id=\"1001\">Regular</batter><batter id=\"1002\">Chocolate</batter><batter id=\"1003\">Blueberry</batter></batters><topping id=\"5001\">None</topping><topping id=\"5002\">Glazed</topping><topping id=\"5005\">Sugar</topping></item></items>";

// Parse the XML into a dictionary
NSError *parseError = nil;
NSDictionary *xmlDictionary = [XMLReader dictionaryForXMLString:testXMLString error:&parseError];

// Print the dictionary
NSLog(@"%@", xmlDictionary);

//
// testXMLString =
//    <items>
//        <item id="0001" type="donut">
//            <name>Cake</name>
//            <ppu>0.55</ppu>
//            <batters>
//                <batter id="1001">Regular</batter>
//                <batter id="1002">Chocolate</batter>
//                <batter id="1003">Blueberry</batter>
//            </batters>
//            <topping id="5001">None</topping>
//            <topping id="5002">Glazed</topping>
//            <topping id="5005">Sugar</topping>
//        </item>
//    </items>
//
// is converted into
//
// xmlDictionary = {
//    items = {
//        item = {
//            id = 0001;
//            type = donut;
//            name = {
//                text = Cake;
//            };
//            ppu = {
//                text = 0.55;
//            };
//            batters = {
//                batter = (
//                    {
//                        id = 1001;
//                        text = Regular;
//                    },
//                    {
//                        id = 1002;
//                        text = Chocolate;
//                    },
//                    {
//                        id = 1003;
//                        text = Blueberry;
//                    }
//                );
//            };
//            topping = (
//                {
//                    id = 5001;
//                    text = None;
//                },
//                {
//                    id = 5002;
//                    text = Glazed;
//                },
//                {
//                    id = 5005;
//                    text = Sugar;
//                }
//            );
//        };
//     };
// }
//

The mapping between XML and NSDictionary is shown in the comments above. Notice how the “batter” and “topping” keys are set to arrays since there were multiple “batter” and “topping” elements in the XML. Also, note how an element’s attributes are available in that element’s child dictionary. For instance, the “id” and “type” attributes of the “item” element are keys in the “item” dictionary.
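Once you have the dictionary, key-value coding gives you a compact way to reach into it. The sketch below is only an illustration. To stay runnable without XMLReader, it builds a trimmed-down copy of the dictionary by hand. SampleDonutDictionary, DonutName, and ToppingNames are hypothetical names, and the literal syntax assumes a compiler with Objective-C literal support.

```objc
#import <Foundation/Foundation.h>

// Hand-built, trimmed-down stand-in for the dictionary XMLReader produces
// from the donut XML. In real use this would be the parser's output.
static NSDictionary *SampleDonutDictionary(void)
{
    NSArray *toppings = @[
        @{ @"id": @"5001", @"text": @"None" },
        @{ @"id": @"5002", @"text": @"Glazed" },
        @{ @"id": @"5005", @"text": @"Sugar" }
    ];
    NSDictionary *item = @{
        @"id": @"0001",
        @"type": @"donut",
        @"name": @{ @"text": @"Cake" },
        @"topping": toppings
    };
    return @{ @"items": @{ @"item": item } };
}

// Walks items -> item -> name -> text in a single call.
static NSString *DonutName(NSDictionary *xmlDictionary)
{
    return [xmlDictionary valueForKeyPath:@"items.item.name.text"];
}

// "topping" is an array here, and sending valueForKey: to an NSArray
// collects the value for that key from every element of the array.
static NSArray *ToppingNames(NSDictionary *xmlDictionary)
{
    return [xmlDictionary valueForKeyPath:@"items.item.topping.text"];
}
```

Keep in mind that the key path for toppings only yields an array because the sample has several “topping” elements. With a single topping, the same key path would return a lone NSString instead.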

Download

You can download the XMLReader files here:

XMLReader.zip

Done and Done

XMLReader isn’t perfect by any stretch of the imagination, but hopefully it helps save you some pain and misery next time you need to parse an XML file in Objective-C.