Climbmania: Difference between SAX and DOM?

In my limited interview experience, this question has come up surprisingly often.

Here is the W3C's official answer, reproduced below:

What is the relationship between the DOM and SAX?

DOM and SAX are currently the two most popular APIs for manipulating XML documents. They differ significantly in provenance, in scope, and in programming style. They are not in direct competition with each other; each has strengths and weaknesses.

Provenance: Unlike the DOM, SAX ("Simple API for XML") is not being developed by the World Wide Web Consortium. Instead, it was developed by an informal group of participants of the XML-DEV mailing list. SAX 1 has been fairly widely supported by providers of XML processing software. SAX 2, now being developed, is not yet widely supported, and at this writing diverges significantly from SAX 1, though it includes the SAX 1 APIs for backward-compatibility purposes.

Scope: SAX was originally designed specifically as an API for XML parsers. As such, it includes functions which won't be supported in the DOM until DOM Level 3's Load/Save module is released. On the other hand, SAX 1 discarded some information (such as comments) that the DOM retains. As with the DOM, later versions of SAX are working to improve their coverage of the XML Information Set.
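To make the point about comments concrete, here is a minimal sketch using Python's standard-library xml.sax and xml.dom.minidom (a swapped-in environment; the FAQ names no language). It assumes the default expat-based SAX reader, which accepts SAX2's optional lexical-handler property. The DOM keeps the comment as an ordinary node, while the SAX parser reports it only because a lexical handler has been registered. The sample document and the function names are hypothetical.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import ContentHandler, property_lexical_handler

# Hypothetical sample document containing a comment.
DOC = "<config><!-- tuned for testing --><timeout>30</timeout></config>"


class CommentCollector(ContentHandler):
    """ContentHandler that also provides the SAX2 lexical-handler callbacks.

    Plain ContentHandler events never report comments; the optional
    lexical-handler property is how SAX2 exposes them.
    """

    def __init__(self):
        super().__init__()
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    # The remaining lexical callbacks belong to the interface but are unused here.
    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


def comments_via_sax(text):
    parser = xml.sax.make_parser()
    handler = CommentCollector()
    parser.setContentHandler(handler)
    # Without this property the parser skips the comment silently.
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return handler.comments


def comments_via_dom(text):
    doc = minidom.parseString(text)
    # The comment survives as a node in the tree.
    return [n.data for n in doc.documentElement.childNodes
            if n.nodeType == n.COMMENT_NODE]
```

Dropping the setProperty call makes comments_via_sax return an empty list, which mirrors the SAX 1 behavior described above.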

Style: The most important difference between SAX and DOM is that SAX presents the document as a serialized "event stream" (a sequence of calls to a handler function as each chunk of XML syntax is recognized) rather than the DOM's tree. A major disadvantage of this approach is that SAX does not support random-access manipulation of the document -- you see the tokens once, in document order, and that's it. If you might want to refer back to anything you saw earlier, it's your code's responsibility to store that information so you can retrieve it. But this requirement that you invent your own document model means you can decide to discard information that will not be needed, which can result in reduced memory overhead versus retaining a complete model such as the DOM.
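The event-stream versus tree contrast can be sketched with Python's standard library (assumed here; any SAX/DOM pair would do). The SAX handler sees each element once, in document order, and keeps only the titles it was written to collect. The DOM version holds the whole tree, so after a single parse it can answer lookups in any order, including going back to earlier elements. The sample library document and all class and function names are hypothetical.

```python
from xml.dom import minidom
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# Hypothetical sample document.
DOC = (b"<library>"
       b"<book id='a'><title>Alpha</title></book>"
       b"<book id='b'><title>Beta</title></book>"
       b"<book id='c'><title>Gamma</title></book>"
       b"</library>")


class TitleHandler(ContentHandler):
    """Single pass in document order; keeps the titles, discards everything else."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # a list only while we are inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._buf = None


def titles_with_sax(data):
    handler = TitleHandler()
    parseString(data, handler)
    return handler.titles


def title_for_id(doc, book_id):
    """The whole tree stays in memory, so any node can be revisited at any time."""
    for book in doc.getElementsByTagName("book"):
        if book.getAttribute("id") == book_id:
            return book.getElementsByTagName("title")[0].firstChild.data
    return None
```

For example, after doc = minidom.parseString(DOC), calling title_for_id(doc, "c") and then title_for_id(doc, "a") needs no second parse. The chunk buffering in TitleHandler is a small instance of the "invent your own document model" cost mentioned above.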

However, depending on the task you're trying to perform, SAX may not always have a storage-size advantage. And one should remember that DOM implementations vary in their memory requirements, just as they do in code size and performance. Some are more compact than others, and some do not keep the whole document in main memory at once.
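As an illustration of the storage caveat, suppose the task is to report items in reverse document order (a hypothetical task). A SAX handler cannot emit anything until the last item has arrived, so it must buffer every item itself, and its memory use grows with the document much as a tree's would. Python's xml.sax is assumed; the sample document and names are made up.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# Hypothetical input: emit the <item> values in reverse document order.
DOC = b"<items><item>1</item><item>2</item><item>3</item></items>"


class ReverseHandler(ContentHandler):
    """Must hold every item until the end of the document before it can answer."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buf = None  # a list only while we are inside <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "item":
            # Nothing can be output yet: a later item must come first.
            self.items.append("".join(self._buf))
            self._buf = None

    def endDocument(self):
        # Only now has everything been seen.
        self.items.reverse()


def reverse_items(data):
    handler = ReverseHandler()
    parseString(data, handler)
    return handler.items
```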

So how should you decide between SAX events and DOM trees? If you intend to allow other code, such as utility routines, middleware, applications, and scripts within the document, to explore and possibly alter the document's contents, the DOM is almost certainly the way to go; it provides a W3C-standardized, complete, and editable view of the document's contents. Conversely, if your task processes the document on a straight-line flow-through basis, without permitting users to write "scripts" against it and without needing much contextual information at each stage -- for example, if you're parsing the XML document directly into a database for storage -- SAX may provide a more direct interface to the parser. Between these extremes, it's a judgement call; you have to think about how much trouble it will be to implement your own document model versus using the DOM, and about how you expect your application to grow in the future.
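A straight-line flow-through load of the kind described above might look like this: each record is written to the database as soon as its start tag arrives, and no document model is kept. This sketch assumes Python's xml.sax and an in-memory sqlite3 database; the <order> element, its attributes, and the table schema are hypothetical.

```python
import sqlite3
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# Hypothetical feed of records.
DOC = b"""<orders>
  <order id="1" amount="9.50"/>
  <order id="2" amount="20.00"/>
  <order id="3" amount="4.25"/>
</orders>"""


class OrderLoader(ContentHandler):
    """Inserts each <order> as it is seen; nothing else is retained."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn

    def startElement(self, name, attrs):
        # All the data for a row arrives with the start tag's attributes.
        if name == "order":
            self.conn.execute(
                "INSERT INTO orders (id, amount) VALUES (?, ?)",
                (int(attrs["id"]), float(attrs["amount"])),
            )


def load_orders(data):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    parseString(data, OrderLoader(conn))
    conn.commit()
    return conn
```

For example, load_orders(DOC).execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone() gives (3, 33.75).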

In fact, it is possible to combine SAX and DOM within a single system. Many parsers can produce both SAX and DOM output, and some have borrowed SAX's parser-control calls as a stopgap while they wait for the DOM 3 Load/Save API to be defined. Despite the slight "impedance mismatch" between them, it is not uncommon to use a SAX stream as input to a DOM builder, or to use a DOM's contents to generate SAX events; code for both these operations is widely available. There are also some new APIs being developed which combine SAX and DOM in interesting ways, for example returning DOM nodes from SAX operations.
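Both directions of the combination can be sketched with Python's standard library (assumed here). xml.dom.pulldom runs a SAX parser underneath and hands back DOM nodes, building a full subtree only for the elements you expand, which is one example of returning DOM nodes from SAX operations. In the other direction, the hypothetical dom_to_sax helper walks a minidom tree and replays it as ContentHandler events, here fed to xml.sax.saxutils.XMLGenerator to serialize it again. The sample log document and helper names are made up, and the walker handles only elements and text.

```python
import io
from xml.dom import minidom, pulldom
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical sample document.
DOC = "<log><entry level='info'>started</entry><entry level='error'>disk full</entry></log>"


def error_entries(text):
    """SAX events in, DOM nodes out: expand only the elements of interest."""
    events = pulldom.parseString(text)
    found = []
    for event, node in events:
        if (event == pulldom.START_ELEMENT and node.tagName == "entry"
                and node.getAttribute("level") == "error"):
            events.expandNode(node)  # build the DOM subtree for this element only
            found.append(node.toxml())
    return found


def dom_to_sax(node, handler):
    """Replay a DOM tree as SAX ContentHandler events (elements and text only)."""
    if node.nodeType == node.DOCUMENT_NODE:
        handler.startDocument()
        dom_to_sax(node.documentElement, handler)
        handler.endDocument()
    elif node.nodeType == node.ELEMENT_NODE:
        attrs = AttributesImpl(dict(node.attributes.items()))
        handler.startElement(node.tagName, attrs)
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType == node.TEXT_NODE:
        handler.characters(node.data)
```

Usage: out = io.StringIO(); dom_to_sax(minidom.parseString(DOC), XMLGenerator(out)) writes the document back out through SAX events, preceded by an XML declaration.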

Tuesday, March 10, 2009
