Re: DOM tree in memory

>Can anyone give me an idea of how much bigger, in percentage terms, a
>DOM-XML-tree tends to be compared with the flat XML file it comes from?

I don't think that question, as you've asked it, can be answered.

The DOM itself is just an API. It does place some requirements on what data
has to be delivered, but says nothing about how that data is represented or
where it's stored. The memory-use question depends on the particular DOM
implementation you're working with, the particular document you're working
with, and how well the former is tuned for the needs of the latter. Memory
use is traded off against other issues such as performance, and each DOM
implementation picks its own balance point.

As an extreme example, a DOM may not keep the document in memory at all --
it may swap all the data out to a database file, and only load the portions
you're actively using into memory; if it's sufficiently clever about
organizing and caching that information, some applications may never notice
the difference... while others, which don't visit the nodes in a pattern
where the cache helps them, will probably be severely impacted.
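
To make the caching point concrete, here is a minimal sketch in Java. It
is not taken from any real DOM; PagedNodeCache, its backingStore function,
and the idea of nodes keyed by an int id are all assumptions made for
illustration. It keeps a small least-recently-used cache in front of a
slow store and counts how often it has to go back to that store.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical LRU cache sitting between a DOM facade and a slow node store. */
final class PagedNodeCache {
    private final IntFunction<String> backingStore; // stands in for the database file
    private final LinkedHashMap<Integer, String> cache;
    private int loads = 0;                          // number of trips to the store

    PagedNodeCache(final int maxEntries, IntFunction<String> backingStore) {
        this.backingStore = backingStore;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the node data, loading it from the store on a cache miss. */
    String get(int nodeId) {
        String value = cache.get(nodeId);
        if (value == null) {
            value = backingStore.apply(nodeId);
            loads++;
            cache.put(nodeId, value);
        }
        return value;
    }

    int loads() {
        return loads;
    }
}
```

With room for two nodes, an application that bounces between nodes 0 and 1
goes to the store twice and never again. One that cycles through nodes 0,
1, 2 misses on every single access, which is the "severely impacted" case.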

The language you're working in also imposes some overhead. If you're
working in Java, there's a certain minimum cost for an Object -- so you
have to think about whether DOM nodes are stored as objects, and if not how
you make them look like objects for the purposes of the DOM API.
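
One way to do that, sketched here in Java under my own assumptions, is to
keep the node data in parallel int arrays and hand out small throwaway
handle objects only when the caller asks for a node. NodeTable and
NodeHandle are hypothetical names, and the handle imitates only two
DOM-style accessors; a real implementation would have to cover the whole
org.w3c.dom.Node interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical store: one array slot per node instead of one Object per node. */
final class NodeTable {
    private int[] nameId = new int[16]; // index into the names list
    private int[] parent = new int[16]; // parent node index, -1 for the root
    private int size = 0;
    private final List<String> names = new ArrayList<>();

    /** Adds a node and returns its index. */
    int addNode(String name, int parentIndex) {
        if (size == nameId.length) {
            nameId = Arrays.copyOf(nameId, size * 2);
            parent = Arrays.copyOf(parent, size * 2);
        }
        int id = names.indexOf(name); // linear scan; fine for a sketch
        if (id < 0) {
            names.add(name);
            id = names.size() - 1;
        }
        nameId[size] = id;
        parent[size] = parentIndex;
        return size++;
    }

    String nameOf(int node) {
        return names.get(nameId[node]);
    }

    int parentOf(int node) {
        return parent[node];
    }

    /** Creates a short-lived object view of one node, on demand. */
    NodeHandle handle(int node) {
        return new NodeHandle(this, node);
    }
}

/** Flyweight that looks like an object to the caller but stores only an index. */
final class NodeHandle {
    private final NodeTable table;
    private final int index;

    NodeHandle(NodeTable table, int index) {
        this.table = table;
        this.index = index;
    }

    String getNodeName() {
        return table.nameOf(index);
    }

    NodeHandle getParentNode() {
        int p = table.parentOf(index);
        return p < 0 ? null : table.handle(p);
    }
}
```

The per-Object cost is then paid only for handles that are currently in
use, not for every node in the document. The price is that two handles for
the same node are different objects, so identity comparisons need care.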

I've seen in-memory DOMs that use as little as 16 bytes per node. When you
figure that they're using string pooling as well, it's unclear that these
really do take more memory than the text form of a typical document. The
downside of these is that performance tends to be highly asymmetric and
they generally have other limitations. They work well within the domain
they're designed to address, not so well outside that application space.
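
As a rough illustration of how 16 bytes per node can be reached, and of
where the asymmetry comes from, here is a Java sketch. The layout (four
32-bit ints per node: name, parent, first child, next sibling) is my own
assumption and not a description of any particular implementation;
PackedTree and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical packed tree: 4 ints (16 bytes) per node plus a shared string pool. */
final class PackedTree {
    private static final int FIELDS = 4;
    private static final int NAME = 0, PARENT = 1, FIRST_CHILD = 2, NEXT_SIBLING = 3;

    private int[] slots = new int[FIELDS * 8];
    private int count = 0;
    private final Map<String, Integer> pool = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    /** Returns the pool id for s, adding it to the pool the first time it is seen. */
    private int intern(String s) {
        Integer id = pool.get(s);
        if (id == null) {
            id = strings.size();
            strings.add(s);
            pool.put(s, id);
        }
        return id;
    }

    /** Appends a node as the last child of parent (-1 for the root). */
    int append(String name, int parent) {
        if (count * FIELDS == slots.length) {
            slots = Arrays.copyOf(slots, slots.length * 2);
        }
        int node = count++;
        int base = node * FIELDS;
        slots[base + NAME] = intern(name);
        slots[base + PARENT] = parent;
        slots[base + FIRST_CHILD] = -1;
        slots[base + NEXT_SIBLING] = -1;
        if (parent >= 0) {
            int pb = parent * FIELDS;
            if (slots[pb + FIRST_CHILD] < 0) {
                slots[pb + FIRST_CHILD] = node;
            } else {
                // No "last child" field, so walk the sibling chain to its end.
                int last = slots[pb + FIRST_CHILD];
                while (slots[last * FIELDS + NEXT_SIBLING] >= 0) {
                    last = slots[last * FIELDS + NEXT_SIBLING];
                }
                slots[last * FIELDS + NEXT_SIBLING] = node;
            }
        }
        return node;
    }

    String name(int node) {
        return strings.get(slots[node * FIELDS + NAME]);
    }

    /** Constant time: the link is stored. */
    int nextSibling(int node) {
        return slots[node * FIELDS + NEXT_SIBLING];
    }

    /** Linear in the number of siblings: no back link is stored. */
    int previousSibling(int node) {
        int parent = slots[node * FIELDS + PARENT];
        if (parent < 0) {
            return -1;
        }
        int prev = -1;
        int cur = slots[parent * FIELDS + FIRST_CHILD];
        while (cur != node) {
            prev = cur;
            cur = slots[cur * FIELDS + NEXT_SIBLING];
        }
        return prev;
    }

    /** Bytes used by node records only; the string pool is extra. */
    int bytesForNodeRecords() {
        return count * FIELDS * Integer.BYTES;
    }

    int pooledStringCount() {
        return strings.size();
    }
}
```

Walking forward through children is cheap here, while walking backward
rescans the sibling chain each time. A document with thousands of
identically named elements stores that name once.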

For that matter, there's nothing that keeps a DOM from reducing its data to
an algorithm, if it can do so... as long as all the operations have the
expected results when viewed through the DOM API, it's a DOM.


So the real answer here is that you need to decide what your application's
needs are -- size and speed, for what kinds of tasks and documents -- and
then use that to determine which DOM implementations are and aren't a good
fit for those requirements.

For some applications, in fact, the DOM is overkill. Sometimes a streaming
approach, e.g. SAX, works best. Sometimes a DOM subset, or a completely
custom data model, works best. Sometimes processing the XML as text works
best. There are a few more thoughts on that topic at
http://www.w3.org/DOM/faq.html#SAXandDOM
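
For a sense of what the streaming approach looks like, here is a small
Java sketch that counts elements with SAX and never builds a tree.
ElementCounter and countElements are names I made up for the example; the
parser classes are the standard JAXP and SAX ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Counts start tags as the parser streams past them; keeps no document tree. */
final class ElementCounter extends DefaultHandler {
    private int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        count++;
    }

    int count() {
        return count;
    }

    /** Parses the given XML text and returns the number of elements in it. */
    static int countElements(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            parser.parse(new ByteArrayInputStream(bytes), handler);
            return handler.count();
        } catch (Exception e) {
            // Parser configuration, SAX, and I/O failures all end up here.
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

The handler's only state is one int, so the memory it needs does not grow
with the size of the document. The trade-off is that anything requiring a
second look at earlier content has to be remembered by the application.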


But "how much memory does the DOM need" really does have to be answered with
"in which implementation, and for which document?"




"Could you direct me to the library?"
"Welcome to MIT. Which one?"
______________________________________
Joe Kesselman  / IBM Research

Received on Thursday, 28 September 2000 17:56:10 UTC