Re: RDFa worst case memory usage for SAX-based parsers the same as DOM-based parsers

Hi Manu,

This is interesting. But in your example, why do you need to hedge
against the possibility of seeing an XML literal? You will already have
seen the parent element by then: either it requested an XML literal or
it didn't.
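To illustrate the point, here is a minimal sketch using Python's standard xml.sax module. The handler's name and fields are hypothetical. It starts buffering only when a start tag carries @property, because that attribute is already known at that point, and it hands the buffer off when the element closes. It ignores @datatype, namespaces and the remaining RDFa processing rules.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class SelectiveLiteralHandler(xml.sax.ContentHandler):
    """Buffer content only while inside an element that carries @property."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.open = []      # [depth, xml_parts, text_parts, size] per open @property element
        self.literals = []  # (xml_literal, plain_literal) for each @property element
        self.buffered = 0   # characters currently held in memory
        self.peak = 0       # largest value of self.buffered seen so far

    def _feed(self, xml_piece, text_piece=""):
        # Every currently open @property element receives a copy of the content.
        for entry in self.open:
            entry[1].append(xml_piece)
            entry[2].append(text_piece)
            added = len(xml_piece) + len(text_piece)
            entry[3] += added
            self.buffered += added
        self.peak = max(self.peak, self.buffered)

    def startElement(self, name, attrs):
        self.depth += 1
        tag = "<" + name + "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items()) + ">"
        self._feed(tag)  # the tag belongs to any enclosing literal
        if "property" in attrs:
            # The start tag already tells us a literal is wanted: begin buffering here.
            self.open.append([self.depth, [], [], 0])

    def endElement(self, name):
        if self.open and self.open[-1][0] == self.depth:
            _, xml_parts, text_parts, size = self.open.pop()
            self.literals.append(("".join(xml_parts), "".join(text_parts)))
            self.buffered -= size
        self._feed("</" + name + ">")
        self.depth -= 1

    def characters(self, content):
        self._feed(escape(content), content)


spans = '<span property="[foo:bar]" />' * 1000
handler = SelectiveLiteralHandler()
xml.sax.parseString(('<body about="">' + spans + "</body>").encode("utf-8"), handler)
print(len(handler.literals), handler.peak)  # prints: 1000 0

nested = SelectiveLiteralHandler()
xml.sax.parseString(b'<div><p property="dc:title">Hi <b>there</b></p></div>', nested)
print(nested.literals)  # [('Hi <b>there</b>', 'Hi there')]
```

On the 1000-span example every literal is empty, so nothing is ever held. Memory is bounded by the content under currently open @property elements rather than by the whole document.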

I'm no doubt missing something. :)

Regards,

Mark

On 6/11/08, Manu Sporny <msporny@digitalbazaar.com> wrote:
>
> Maybe the rest of you already knew this, but I just came to the
> realization that SAX-based parsers for RDFa don't have any benefits vs.
> DOM-based parsers as far as memory usage is concerned.
>
> The root of the issue lies with XML Literals and Plain Literals. Since
> these need to be tracked as you go down and back up the XHTML tree, you
> end up holding almost every character of the XHTML document in memory.
>
> Take this example:
>
> <body about="">
>    <span property="[foo:bar]" />
>    <!-- repeat the span above 1000 times -->
> </body>
>
> Since a SAX parser can't jump around in the DOM, it doesn't know whether
> the <body> element has a parent element that requires the XML Literal or
> the plain literal, so it must collect both, which takes a relatively large
> amount of memory. The XML Literal for the <body> element ends up being a
> direct copy of all 1001 <span> elements.
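For a sense of scale, here is a rough sketch of that hedging strategy, again using Python's standard xml.sax module. The class and helper names are hypothetical. The handler keeps an XML-literal buffer and a plain-literal buffer for every open element, which is only one possible reading of "collect both"; it does not implement the actual RDFa processing rules.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class HedgingLiteralHandler(xml.sax.ContentHandler):
    """Buffer an XML literal and a plain literal for *every* open element,
    in case some rule later turns out to need it (the worst case above)."""

    def __init__(self):
        super().__init__()
        self.open = []     # [xml_parts, text_parts, size] for every open element
        self.buffered = 0  # characters currently held in memory
        self.peak = 0      # largest value of self.buffered seen so far

    def _feed(self, xml_piece, text_piece=""):
        # Each open element keeps its own copy, so nested content is stored once per ancestor.
        for entry in self.open:
            entry[0].append(xml_piece)
            entry[1].append(text_piece)
            added = len(xml_piece) + len(text_piece)
            entry[2] += added
            self.buffered += added
        self.peak = max(self.peak, self.buffered)

    def startElement(self, name, attrs):
        tag = "<" + name + "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items()) + ">"
        self._feed(tag)
        self.open.append([[], [], 0])

    def endElement(self, name):
        _, _, size = self.open.pop()  # this element's literals would be emitted here
        self.buffered -= size
        self._feed("</" + name + ">")

    def characters(self, content):
        self._feed(escape(content), content)


def peak_buffered(n):
    """Peak characters buffered for the example above with n repeated <span> elements."""
    doc = '<body about="">' + '<span property="[foo:bar]" />' * n + "</body>"
    handler = HedgingLiteralHandler()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.peak


for n in (10, 100, 1000):
    print(n, peak_buffered(n))
```

Because the <body> buffer receives a copy of every <span>, the peak grows linearly with the number of spans, and content nested k levels deep is stored k times.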
>
> Even in the best-case implementation of a SAX-based parser, you end up
> storing almost the entire XHTML document in memory, making it no less
> memory-intensive than a DOM-based approach.
>
> So much for a small memory footprint parser.
>
> -- manu
>
> --
> Manu Sporny
> President/CEO - Digital Bazaar, Inc.


-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)

Received on Wednesday, 11 June 2008 03:51:40 UTC