- From: <keshlam@us.ibm.com>
- Date: Thu, 28 Sep 2000 17:52:19 -0400
- To: www-dom@w3.org
>Can anyone give me an idea of how much bigger, in percentage terms, a
>DOM-XML-tree tends to be compared with the flat XML file it comes from?

I don't think that question, as you've asked it, can be answered.

The DOM itself is just an API. It does place some requirements on what data has to be delivered, but says nothing about how that data is represented or where it's stored. The memory-use question depends on the particular DOM implementation you're working with, the particular document you're working with, and how well the former is tuned for the needs of the latter.

Memory use is traded off against other issues such as performance, and each DOM implementation picks its own balance point. As an extreme example, a DOM may not keep the document in memory at all -- it may swap all the data out to a database file, and only load the portions you're actively using into memory; if it's sufficiently clever about organizing and caching that information, some applications may never notice the difference... while others, which don't visit the nodes in a pattern where the cache helps them, will probably be severely impacted.

The language you're working in also imposes some overhead. If you're working in Java, there's a certain minimum cost for an Object -- so you have to think about whether DOM nodes are stored as objects and, if not, how you make them look like objects for the purposes of the DOM API.

I've seen in-memory DOMs that use as little as 16 bytes per node. When you figure that they're using string pooling as well, it's unclear that these really do take more memory than the text form of a typical document. The downside of these is that performance tends to be highly asymmetric and they generally have other limitations. They work well within the domain they're designed to address, not so well outside that application space.

For that matter, there's nothing that keeps a DOM from reducing its data to an algorithm, if it can do so...
as long as all the operations have the expected results when viewed through the DOM API, it's a DOM.

So the real answer here is that you need to decide what your application's needs are -- size and speed, for what kinds of tasks and documents -- and then use that to determine which DOM implementations are and aren't a good fit for those requirements.

For some applications, in fact, the DOM is overkill. Sometimes a streaming approach, e.g. SAX, works best. Sometimes a DOM subset, or a completely custom data model, works best. Sometimes processing the XML as text works best. There are a few more thoughts on that topic at http://www.w3.org/DOM/faq.html#SAXandDOM

But "how much memory does the DOM need" really does have to be answered with "in which implementation, and for which document?"

"Could you direct me to the library?"
"Welcome to MIT. Which one?"
______________________________________
Joe Kesselman / IBM Research
Received on Thursday, 28 September 2000 17:56:10 UTC