Memory Overhead

I once read a note about the DOM Level 1 specification saying that a DOM 
hierarchy might take up to 4 times as much memory as the file it was 
loaded from, depending on the implementation. I completely agree with that.

I wrote an implementation of DOM Level 2 (without the Events part, which 
is actually the real breakthrough of DOM 2) and began stress-testing it 
on a 1.7 MB file containing some 230,000 entries. (I just copied and pasted 
the contents of the file, which almost choked XMLSpy at one point, so I 
moved to the mighty... Notepad.)

Anyway, these entries "eat" about 50 MB of memory, which I find pretty 
scary for a 1.7 MB file. I searched my code for memory leaks and 
over-allocations, but found none.
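
To put those figures in proportion, here is a back-of-the-envelope sketch in 
Python. It only uses the numbers quoted above; the assumption that "MB" 
means 2**20 bytes is mine, and the variable names are made up for 
illustration.

```python
# Rough arithmetic on the figures quoted in this message.
MIB = 1024 * 1024              # assumption: "MB" is taken as 2**20 bytes

file_bytes = 1.7 * MIB         # size of the XML file on disk
dom_bytes = 50 * MIB           # memory reported for the loaded DOM
entries = 230_000              # approximate number of entries in the file

ratio = dom_bytes / file_bytes                # DOM size relative to the file
bytes_per_entry_dom = dom_bytes / entries     # memory per entry in the DOM
bytes_per_entry_file = file_bytes / entries   # bytes per entry in the file

print(f"DOM / file ratio:       {ratio:.1f}x")
print(f"bytes per entry (DOM):  {bytes_per_entry_dom:.0f}")
print(f"bytes per entry (file): {bytes_per_entry_file:.1f}")
```

The ratio comes out at roughly 29 times the file size, well above the 
factor of 4 mentioned earlier.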

I tried to compute the size I expected the DOM to reach, and was very 
surprised to see a HUGE number of TEXT nodes (about 55,000) containing 
only TAB and "\r" characters.
I know that the DOM MUST reflect the structure of the document, but these 
nodes are a pain in the lower layer and eat up huge amounts of memory.
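
The effect is easy to reproduce on a small scale. The following is a 
minimal sketch using Python's xml.dom.minidom, not the implementation 
described in this message; the SAMPLE document and the helper name 
count_whitespace_text_nodes are invented for illustration.

```python
from xml.dom import minidom, Node

# Hypothetical pretty-printed document: two entries, indented with tabs.
SAMPLE = "<list>\r\n\t<entry>a</entry>\r\n\t<entry>b</entry>\r\n</list>"

def count_whitespace_text_nodes(node):
    """Return (whitespace_only, total) counts of text nodes below node."""
    blank = total = 0
    stack = [node]                      # iterative walk, no recursion limit
    while stack:
        current = stack.pop()
        for child in current.childNodes:
            if child.nodeType == Node.TEXT_NODE:
                total += 1
                if not child.data.strip():   # only tabs, CR, LF, spaces
                    blank += 1
            else:
                stack.append(child)
    return blank, total

doc = minidom.parseString(SAMPLE)
blank, total = count_whitespace_text_nodes(doc)
print(f"{blank} of {total} text nodes contain only whitespace")
```

In this tiny sample, three of the five text nodes exist only because of 
the indentation and line breaks between the elements.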

While it might sound stupid to IMPOSE "pretty-printing" formatting rules 
on the developers of the DOM, at least something could be done to reduce 
the number of these formatting-only text nodes in the DOM.
The normalize() method of Node concatenates adjacent text nodes, but 
that is not the issue here.
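
A short sketch of why normalization does not help, again using Python's 
xml.dom.minidom as a stand-in for any DOM implementation. The helper 
strip_whitespace_text_nodes is hypothetical and not part of the DOM API. 
It also assumes that whitespace-only text is never significant, which does 
not hold for mixed content or for elements marked xml:space="preserve".

```python
from xml.dom import minidom, Node

# Hypothetical pretty-printed document: two entries, indented with tabs.
SAMPLE = "<list>\r\n\t<entry>a</entry>\r\n\t<entry>b</entry>\r\n</list>"

def strip_whitespace_text_nodes(node):
    """Remove whitespace-only text nodes below node; return how many."""
    removed = 0
    for child in list(node.childNodes):      # copy: the list is modified
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            removed += 1
        else:
            removed += strip_whitespace_text_nodes(child)
    return removed

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

before = len(root.childNodes)           # elements plus whitespace text
root.normalize()                        # merges adjacent text nodes only
after_normalize = len(root.childNodes)  # whitespace nodes are still there

removed = strip_whitespace_text_nodes(doc)
after_strip = len(root.childNodes)      # only the <entry> elements remain

print(before, after_normalize, removed, after_strip)
```

The whitespace nodes are separated by elements, so they are never adjacent 
and normalize() has nothing to merge. Removing them needs an explicit 
pass like the one above, or a parser option that drops them at load time.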

I'm looking forward to an answer, observations, or whatever you have to 
say about this.

Thank you!

__________________________________________________________________
Razvan Costea-Barlutiu
Department of Radiology,
The University of Chicago
5841 South Maryland Avenue
Chicago, Illinois 60637
Phone: (773)834-5106
E-Mail: cbrazvan@baltan.bsd.uchicago.edu

Received on Monday, 10 September 2001 10:47:23 UTC