
Re: RDFa worst case memory usage for SAX-based parsers the same as DOM-based parsers

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Wed, 11 Jun 2008 09:15:42 -0400
Message-ID: <484FCFFE.3000809@digitalbazaar.com>
To: Mark Birbeck <mark.birbeck@webbackplane.com>
CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, David Longley <dlongley@digitalbazaar.com>

Mark Birbeck wrote:
> This is interesting. 

hah... interesting in that it's dead wrong... you hit the nail right on
the head, Mark. :)

> But why in your example do you need to hedge on
> the fact that you might see an XML literal, since you will have
> already seen the parent element? Either it contained a request for the
> XML literal or it didn't.

I wasn't seeing the forest for the trees... the fix for this would be to
set a flag [store literals]. This flag would be passed down via the
[evaluation context] and if it is set, the plain literal and XML literal
versions of the element contents should be stored and passed back up the
tree.

[store literals] would be set if:
- @property is present and @content is NOT present.

That should fix the issue. The worst case is still the same as the DOM
case, but that's only if somebody decides to put @property on the <body>
element.
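
A minimal sketch of the [store literals] idea, written in Python with the
standard xml.sax module rather than librdfa's actual C code. The class name
LiteralHandler, the parse() helper, and the buffered_chars counter are all
hypothetical names made up for illustration. Only the plain literal is
buffered here; an XML literal could be accumulated the same way. Namespace
and CURIE handling are left out entirely.

```python
import xml.sax


class LiteralHandler(xml.sax.ContentHandler):
    """Buffers element text only while a [store literals] flag is active."""

    def __init__(self):
        super().__init__()
        self.stack = []          # one (hypothetical) evaluation context per open element
        self.triples = []        # (property, plain literal) pairs
        self.buffered_chars = 0  # characters buffered, to make memory use visible

    def startElement(self, name, attrs):
        # The flag is inherited from the parent's evaluation context.
        parent_store = self.stack[-1]["store"] if self.stack else False
        has_property = "property" in attrs
        has_content = "content" in attrs
        # [store literals] is set if @property is present and @content is NOT.
        wants_literal = has_property and not has_content
        if has_property and has_content:
            # @content supplies the literal directly, nothing to buffer.
            self.triples.append((attrs.get("property"), attrs.get("content")))
        self.stack.append({
            "property": attrs.get("property") if wants_literal else None,
            "store": parent_store or wants_literal,
            "text": [],
        })

    def characters(self, content):
        ctx = self.stack[-1]
        if ctx["store"]:
            ctx["text"].append(content)
            self.buffered_chars += len(content)

    def endElement(self, name):
        ctx = self.stack.pop()
        text = "".join(ctx["text"])
        if ctx["property"] is not None:
            self.triples.append((ctx["property"], text))
        # Pass the literal back up the tree only if an ancestor asked for it.
        if self.stack and self.stack[-1]["store"]:
            self.stack[-1]["text"].append(text)


def parse(xhtml):
    """Parse a string and return the handler so its state can be inspected."""
    handler = LiteralHandler()
    xml.sax.parseString(xhtml.encode("utf-8"), handler)
    return handler
```

With this sketch, text outside any @property element is never buffered,
while a document with @property on <body> buffers every character, which is
the DOM-like worst case described above.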

I had been operating under the blind assumption that because the parser
is SAX based, there were only a few ways that you could shoot yourself
in the foot as far as memory usage was concerned. We ran a speed test
yesterday that saw memory usage get out of control very quickly and it
was because we were always storing the XML and plain literals, whether
we needed to or not.

A dumb, straightforward implementation for SAX-based parsers, one that
always stores both literals, results in a rather nasty memory usage issue.

Resource usage on cell phones and other portable devices was a concern
with librdfa and I had it in my head that we'd be using no more than
5KB-7KB of memory per nested element on average. That still holds true
for the most part, except in corner cases where people place @property
on elements that are rather large (use of "dc:description" on blog
postings comes to mind).

Not much of a real-world issue, unless you want to create an "attack"
page for cell phones that is a large document with @property on the
<body> element. Even in that case, you end up using as much memory as a
DOM-based implementation.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: DB Launches Medical Record Sales Service with Shepherd Medical
http://blog.digitalbazaar.com/2008/02/24/health2trade/
Received on Wednesday, 11 June 2008 13:16:49 GMT
