MemoryManagementThreadSafety

robbiehanson edited this page Nov 3, 2011 · 1 revision

Memory Management and Thread-Safety Information

Introduction

KissXML is designed to be read-access thread-safe. It is NOT write-access thread-safe as this would require significant overhead. (Every thread would have to acquire a mutex or go through the same dispatch_queue, etc.)

What exactly does read-access thread-safe mean? It means that multiple threads can safely read from the same xml structure, so long as none of them attempt to alter the xml structure (add/remove nodes, change attributes, etc).

This read-access thread-safety includes parsed xml structures as well as xml structures created by you. Let's walk through a few examples to get a deeper understanding of the benefits of read-access thread-safety.

Example #1 - Parallel processing of children

DDXMLElement *root = [[[DDXMLElement alloc] initWithXMLString:str error:nil] autorelease];
NSArray *children = [root children];

dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply([children count], q, ^(size_t i) {
    DDXMLElement *child = [children objectAtIndex:i];
    // <process child> (read-only access, safe from any thread)
});

Example #2 - Sub-element processing

DDXMLElement *root = [[DDXMLElement alloc] initWithXMLString:str error:nil];
DDXMLElement *child = [root elementForName:@"starbucks"];

dispatch_async(queue, ^{ // <--- Async operation, might occur after release statement below
    // <process child> (read-only access)
});

[root release]; // Not a problem for KissXML. It just works.

Three Important Concepts

1. KissXML provides a light-weight wrapper around libxml

The parsing, creation, storage, etc. of the xml tree is all done via libxml. This is a fast low-level C library that's been around for ages, and comes pre-installed on Mac OS X and iOS. KissXML provides an easy-to-use Objective-C library atop libxml.

So a DDXMLNode, DDXMLElement, or DDXMLDocument is simply an Objective-C object with a pointer to the underlying libxml C structure. The only time you need to be aware of this is when testing for equality. In order to maximize speed and provide read-access thread-safety, the library may create multiple DDXML wrapper objects that point to the same underlying xml node.

So don't assume you can test for equality with "==". Instead use the isEqual: method (as you should generally do with objects anyway).
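As a small illustration of this point, the sketch below fetches the same child twice and compares the two results both ways. It assumes manual reference counting (as in the other examples on this page) and that DDXML.h is on the include path. The helper name LatteWrappersAreEqual is made up for this example and is not part of KissXML.

```objc
#import <Foundation/Foundation.h>
#import "DDXML.h"

// Hypothetical helper, not part of KissXML.
// Returns YES if two separately fetched wrappers for <latte/> compare equal via isEqual:.
static BOOL LatteWrappersAreEqual(void)
{
    NSString *str = @"<starbucks><latte/></starbucks>";
    DDXMLElement *root = [[[DDXMLElement alloc] initWithXMLString:str error:nil] autorelease];

    // Two lookups of the same underlying libxml node.
    DDXMLNode *first  = [root childAtIndex:0];
    DDXMLNode *second = [root childAtIndex:0];

    // Pointer comparison: may be NO, since each lookup may hand back a different wrapper object.
    NSLog(@"same wrapper object: %d", (first == second));

    // isEqual: is the comparison to rely on.
    return [first isEqual:second];
}
```

The pointer comparison is only logged, never relied upon, because its result depends on how the library hands out wrapper objects.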

2. The XML APIs are designed to allow traversal up & down the xml tree hierarchy

Because any node can be used to reach its parent, siblings, and children, the tree hierarchy has an implicit impact on memory management.

<starbucks>
   <latte/>
</starbucks>

Imagine you have a DDXMLNode corresponding to the starbucks node, and you have a DDXMLNode corresponding to the latte node. Now imagine you release the starbucks node, but you retain a reference to the latte node. What happens?

Well the latte node is a part of the xml tree hierarchy. So if the latte node is still around, the xml tree hierarchy must stick around as well. So even though the DDXMLNode (wrapper object) corresponding to the starbucks node may get deallocated, the underlying xml tree structure won't be freed until the latte node gets deallocated.

This provides thread-safety when reading and processing a tree. It also means the xml library follows the standard Objective-C memory management model. If you traverse a tree and fork off asynchronous tasks to process sub-nodes, the tree will remain properly in place until all your asynchronous tasks have completed. In other words, it just works.

However, if you parse a huge document into memory and retain a single node from the giant xml tree, the entire tree stays in memory for as long as that node is around. In this situation, copy or detach the node if you want to keep it. Or just extract the info you need from it.
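A minimal sketch of the "copy" option, assuming manual reference counting and an xml string whose root has at least one child. The helper name IndependentCopyOfFirstChild is hypothetical. The idea is that the returned copy stands on its own, so releasing the root lets the large tree go away once no other wrappers reference it.

```objc
#import <Foundation/Foundation.h>
#import "DDXML.h"

// Hypothetical helper, not part of KissXML.
// Returns a standalone copy of the root's first child so the caller
// does not keep the whole parsed tree alive.
static DDXMLNode *IndependentCopyOfFirstChild(NSString *xmlString)
{
    DDXMLElement *root = [[DDXMLElement alloc] initWithXMLString:xmlString error:nil];

    // This wrapper still belongs to the big tree.
    DDXMLNode *child = [root childAtIndex:0];

    // The copy has no parent and does not reference the original tree.
    DDXMLNode *keeper = [[child copy] autorelease];

    [root release];
    return keeper;
}
```

Calling [child detach] instead would unhook the original node from its parent rather than duplicating it. If all you need is a name or a string value, copying that NSString out is cheaper still.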

3. KissXML is read-access thread-safe, but write-access thread-unsafe (designed for speed)

<starbucks>
   <latte/>
</starbucks>

Imagine you have a DDXMLNode corresponding to the starbucks node, and you have a DDXMLNode corresponding to the latte node. What happens if you invoke [starbucks removeChildAtIndex:0]?

Well the underlying xml tree will remove the latte node, and release the associated memory. And what if you still have a reference to the DDXMLNode that corresponds to the latte node? Well the short answer is that you shouldn't use it.

This is pretty obvious when you think about it from the context of this simple example. But in the real world, you might have multiple threads running in parallel, and you might accidentally modify a node while another thread is processing it.

To completely fix this problem, and provide write-access thread-safety, would require extensive overhead. This overhead is completely unwanted in the majority of cases. Most XML usage patterns are heavily read-only. And in the case of xml creation or modification, it is generally done on the same thread. Thus the KissXML library is write-access thread-unsafe, but provides speedier performance.

However, when such a bug does creep in, it produces horrible side-effects. Luckily the library comes with tools to help track these problems down.
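If your own code really does need to modify a tree from several threads, one application-level option is to route every access to that tree through a single serial dispatch queue. The sketch below is an assumption about how you might structure this, not something KissXML provides: the MenuStore class, its method names, and the queue label are all hypothetical. It assumes manual reference counting.

```objc
#import <Foundation/Foundation.h>
#import "DDXML.h"

// Hypothetical wrapper class, not part of KissXML.
// All reads and writes of the tree go through one serial queue.
@interface MenuStore : NSObject
{
    DDXMLElement *menu;
    dispatch_queue_t xmlQueue;
}
- (id)initWithXMLString:(NSString *)str;
- (void)addDrinkNamed:(NSString *)name;
- (NSUInteger)drinkCount;
@end

@implementation MenuStore

- (id)initWithXMLString:(NSString *)str
{
    if ((self = [super init]))
    {
        menu = [[DDXMLElement alloc] initWithXMLString:str error:nil];
        xmlQueue = dispatch_queue_create("MenuStore.xml", DISPATCH_QUEUE_SERIAL);
    }
    return self;
}

- (void)dealloc
{
    dispatch_release(xmlQueue);
    [menu release];
    [super dealloc];
}

- (void)addDrinkNamed:(NSString *)name
{
    // Write: runs exclusively on the serial queue.
    dispatch_sync(xmlQueue, ^{
        [menu addChild:[DDXMLElement elementWithName:name]];
    });
}

- (NSUInteger)drinkCount
{
    // Read: also serialized, because writes may be in flight.
    __block NSUInteger count = 0;
    dispatch_sync(xmlQueue, ^{
        count = [menu childCount];
    });
    return count;
}

@end
```

This gives up parallel reads for that one tree, which is exactly the overhead the library avoids by default, so it only makes sense for trees that are actually shared and mutated.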

Debugging Memory Issues

Each DDXML object is simply an Objective-C object with a pointer to the underlying libxml node. If the memory rules listed above are violated then the underlying libxml pointer becomes a dangling pointer. These are difficult bugs to track down because one doesn't know when that memory address will get reused.

- (void)stupidMethod
{
    NSXMLElement *blogPosts = [self.xmlDocument rootElement];
    NSXMLElement *mostRecentBlogPost = [[blogPosts elementsForName:@"post"] lastObject];

    dispatch_async(concurrentQueue, ^{ // <- Async operation, might occur after code below
        NSString *post = [self contentOfBlogPost:mostRecentBlogPost];
        NSLog(@"post: %@", post);
    });
    
    [blogPosts setChildren:nil]; // Delete all blog posts
}

What will happen when the problematic code above executes? That depends on a number of things. It might:

  • Work absolutely fine
  • Print out some random junk value
  • Crash

Or even worse, what if a similar situation occurred, but instead of reading data from the xml element (mostRecentBlogPost), you attempted to edit it (setStringValue, addChild, etc.)? It might:

  • Work absolutely fine
  • Corrupt the heap
  • Crash

And heap corruption is an absolute &%$#@ to track down. Instead of a helpful exception or crash, the code goes off and alters some important variable in a random object or structure somewhere else in memory. And this won't cause any problems until later, when that object or structure produces unexpected results or a crash.

If you suspect there's a bug somewhere in your xml handling code, there is a debugging macro you can enable to help you track down these types of problems. The macro is defined in DDXML.h:
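One way to avoid the situation in stupidMethod is to give the asynchronous block its own copy of the element before the tree is modified. The sketch below is a hypothetical free-function variant: the name LogMostRecentPostThenClear is made up, the root element and queue are passed in as parameters, and stringValue stands in for the contentOfBlogPost: call. It assumes manual reference counting.

```objc
#import <Foundation/Foundation.h>
#import "DDXML.h"

// Hypothetical variant of stupidMethod, not part of KissXML.
static void LogMostRecentPostThenClear(DDXMLElement *blogPosts, dispatch_queue_t queue)
{
    DDXMLElement *mostRecentBlogPost = [[blogPosts elementsForName:@"post"] lastObject];

    // Independent copy: unaffected by later changes to blogPosts.
    DDXMLElement *snapshot = [[mostRecentBlogPost copy] autorelease];

    // The block retains snapshot, so it stays valid until the block has run.
    dispatch_async(queue, ^{
        NSLog(@"post: %@", [snapshot stringValue]);
    });

    // Safe now: the block never touches the nodes being removed.
    [blogPosts setChildren:nil];
}
```

Extracting the string before dispatching, and capturing only the NSString in the block, would work just as well when the element itself is not needed.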

// Set to 1 to enable
// Set to 0 to disable (this is the default)
// 
#define DDXML_DEBUG_MEMORY_ISSUES 1

This macro will enable extra code that keeps track of DDXML objects, and marks any that become dangling pointers when the underlying libxml structures are freed. This way, if you attempt to read information from the DDXML object, or you attempt to edit the structure, the code will immediately throw a helpful exception. When the exception is thrown you can then backtrack and discover where the problem is occurring.

- (void)stupidMethod
{
    NSXMLElement *blogPosts = [self.xmlDocument rootElement];
    NSXMLElement *mostRecentBlogPost = [[blogPosts elementsForName:@"post"] lastObject];

    dispatch_async(concurrentQueue, ^{
        NSString *post = [self contentOfBlogPost:mostRecentBlogPost]; // <-- Throws an exception! :)
        NSLog(@"post: %@", post);
    });
    
    [blogPosts setChildren:nil];
}

Keep in mind that the DDXML_DEBUG_MEMORY_ISSUES option is for debugging only. (Duh, right?) You should disable it for production code as it adds significant overhead and slows down the library.
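To reduce the chance of shipping with the option left on, you could tie the define to your build configuration. The fragment below is a sketch that would replace the single #define in DDXML.h. It assumes your Debug configuration defines a DEBUG preprocessor symbol, which is a project setting and not something KissXML sets up for you.

```objc
// In DDXML.h, in place of the existing #define.
// Assumption: DEBUG is defined only in debug builds of your project.
#ifdef DEBUG
  #define DDXML_DEBUG_MEMORY_ISSUES 1   // extra tracking, helpful exceptions
#else
  #define DDXML_DEBUG_MEMORY_ISSUES 0   // default: no tracking overhead
#endif
```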