- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Tue, 10 Jun 2008 22:09:54 -0400
- To: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
- CC: David Longley <dlongley@digitalbazaar.com>
Maybe the rest of you already knew this, but I just came to the realization that SAX-based parsers for RDFa have no memory-usage advantage over DOM-based parsers.

The root of the issue lies with XML Literals and plain literals. Since these need to be tracked as you go down and back up the XHTML tree, you end up holding almost every character of the XHTML document in memory. Take this example:

    <body about="">
      <span property="[foo:bar]" />
      <!-- repeat the span above 1000 times -->
    </body>

Since a SAX parser can't jump around in the document the way a DOM-based parser can, it doesn't know whether the <body> element has a parent element that requires the XML Literal or the plain literal, so it must collect both, which takes a relatively large amount of memory. The XML Literal for the <body> element ends up being a direct copy of all 1001 <span> elements.

Even in the best-case implementation of a SAX-based parser, you end up storing almost the entire XHTML document in memory, which makes it no less memory-intensive than a DOM-based approach. So much for a small-memory-footprint parser.

-- manu

--
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: DB Launches Medical Record Sales Service with Shepherd Medical
http://blog.digitalbazaar.com/2008/02/24/health2trade/
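The memory argument in the post above can be illustrated with a minimal Python sketch using the standard `xml.sax` module. It is not an RDFa processor: it deliberately ignores the RDFa processing rules that decide which literal is actually used and simply models the conservative strategy described in the post, where every open element keeps both an XML Literal buffer and a plain literal buffer. The names `LiteralCollector` and `peak_buffered_chars` are hypothetical, and memory is approximated by counting buffered characters.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class LiteralCollector(xml.sax.ContentHandler):
    """Hypothetical handler modelling the conservative strategy from the post:
    every open element keeps both an XML-literal buffer and a plain-literal
    buffer, because the parser cannot know in advance which one is needed."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one (xml_parts, text_parts) pair per open element
        self.buffered = 0   # characters currently held across all buffers
        self.peak = 0       # largest value self.buffered has reached

    def _append(self, xml_piece, text_piece):
        # Every open ancestor gets its own copy of the new content.
        for xml_parts, text_parts in self.stack:
            xml_parts.append(xml_piece)
            text_parts.append(text_piece)
            self.buffered += len(xml_piece) + len(text_piece)
        self.peak = max(self.peak, self.buffered)

    def startElement(self, name, attrs):
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        # The start tag belongs to the ancestors' XML literals, not its own.
        self._append(f"<{name}{attr_text}>", "")
        self.stack.append(([], []))

    def endElement(self, name):
        # This element's literals are complete; drop its buffers.
        xml_parts, text_parts = self.stack.pop()
        self.buffered -= sum(map(len, xml_parts)) + sum(map(len, text_parts))
        self._append(f"</{name}>", "")

    def characters(self, content):
        # Text feeds both the XML literal (escaped) and the plain literal.
        self._append(escape(content), content)


def peak_buffered_chars(document):
    """Parse `document` and return the peak number of buffered characters."""
    handler = LiteralCollector()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.peak


span = '<span property="[foo:bar]" />'
doc = '<body about="">' + span * 1000 + "</body>"
print(len(doc), peak_buffered_chars(doc))
```

For the 1000-span document the peak buffer is larger than the document itself, partly because the sketch re-serializes each empty `<span />` with an explicit end tag, and doubling the number of spans doubles the peak. Content nested several levels deep is copied once per open ancestor, so deeper nesting makes the buffering worse, not better.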
Received on Wednesday, 11 June 2008 02:10:57 UTC