Re: optimizing container pages serialization to enable streaming

hello eric.

On 2013-11-11, 15:25, "Eric Prud'hommeaux" <eric@w3.org> wrote:
>Yup. C's code works both on S1 and S2. It just works better on S2. A
>non-streaming client works identically well with S1 and S2.

after thinking about this a little more, i am wondering how relevant the
optimization is to begin with. do we have any data suggesting that this
is actually a problem? for example, while the inherently ordered XML of
feeds would easily allow streaming parsing (using SAX), i am not aware
of any implementation that actually does that. instead, what usually
happens is that implementations use DOM: they first read the whole
resource, build the internal XML tree, and only then start working with
that complete tree.
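
(to make the contrast concrete, here is a minimal sketch in java, using
the standard javax.xml APIs, of both styles: a SAX handler that handles
entries as they stream by, versus the usual DOM call that materializes
the complete tree before anything else happens. the "entry" element name
is the atom one; the class name and handler body are made up for
illustration.)

  import java.io.File;
  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.parsers.SAXParserFactory;
  import org.xml.sax.Attributes;
  import org.xml.sax.helpers.DefaultHandler;

  public class FeedParseStyles {
      public static void main(String[] args) throws Exception {
          File feed = new File(args[0]);

          // streaming (SAX): each entry is handled the moment the
          // parser reaches it; memory stays flat regardless of feed size
          SAXParserFactory.newInstance().newSAXParser().parse(
              feed,
              new DefaultHandler() {
                  @Override
                  public void startElement(String uri, String local,
                          String qName, Attributes atts) {
                      if ("entry".equals(qName)) {
                          System.out.println("processing one entry");
                      }
                  }
              });

          // what implementations usually do instead (DOM): the whole
          // resource is read and turned into an in-memory tree before
          // any application code ever sees it
          DocumentBuilderFactory.newInstance()
              .newDocumentBuilder().parse(feed);
      }
  }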

in DOM/XML, the very fuzzy rule of thumb is that a DOM tree needs 10x as
much memory as the source file. i would assume there is a similar rough
guesstimate for RDF relating serializations to in-memory models? the
thing is that neither feeds nor LDP are made for sharing/exchanging
massive amounts of data. they are loosely coupled protocols for easy
resource access. given today's machines, it may be safe to assume that
100mb of runtime memory consumption is tolerable. in XML-land, that
would translate to a maximum resource size of 10mb. i haven't seen many
feeds exceeding that size: you can control size by paging, and also by
not randomly embedding everything in a feed (for example, podcast feeds
are really small, because the large audio/video files are linked, not
embedded).
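
(if somebody wanted an actual number instead of my guesstimate, a
quick-and-dirty sketch like the following would do: load a serialized
RDF resource into jena's default in-memory model and compare heap growth
against file size. the jena calls are the real 2.x api; the gc-based
measurement is strictly back-of-the-envelope.)

  import java.io.File;
  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;

  public class RdfMemoryRatio {
      public static void main(String[] args) {
          File f = new File(args[0]);
          Runtime rt = Runtime.getRuntime();
          System.gc();
          long before = rt.totalMemory() - rt.freeMemory();

          // parse the serialization into jena's in-memory model
          Model model = ModelFactory.createDefaultModel();
          model.read(f.toURI().toString());

          System.gc();
          long used = rt.totalMemory() - rt.freeMemory() - before;

          // crude ratio of in-memory model size to file size
          System.out.printf("%d triples, %d bytes on disk, "
              + "~%.1fx in memory%n",
              model.size(), f.length(), used / (double) f.length());
      }
  }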

just wondering: do we have any guesstimates of RDF memory requirements,
and do we really plan for scenarios where LDP resources would exceed the
resulting maximum resource sizes we might want to see?

thanks and cheers,

dret.
