
Re: optimizing container pages serialization to enable streaming

From: Wilde, Erik <Erik.Wilde@emc.com>
Date: Tue, 12 Nov 2013 14:41:13 -0500
To: "ashok.malhotra@oracle.com" <ashok.malhotra@oracle.com>, "public-ldp-wg@w3.org" <public-ldp-wg@w3.org>
Message-ID: <CEA7C375.16D51%erik.wilde@emc.com>

no thanks, ashok. there certainly are use cases that require streaming,
and those (unsurprisingly) come from the heavy-duty database background of
the whole group. i was referring to the fact that most XML-based *REST
services* (such as feeds) are implemented in a non-streaming way, and in
practice are based on the assumption that you will not use them to
exchange tens or hundreds of megabytes per interaction. so i was wondering
if LDP expects to see this kind of data volume per interaction. cheers,
dret.

On 2013-11-12, 5:54 , "Ashok Malhotra" <ashok.malhotra@oracle.com> wrote:

>The XML Query and XSLT folks have long used streaming as a fundamental
>use case.
>Do you want me to ask them for implementations that support streaming?
>All the best, Ashok
>On 11/11/2013 9:32 PM, Wilde, Erik wrote:
>> hello eric.
>>
>> On 2013-11-11, 15:25 , "Eric Prud'hommeaux" <eric@w3.org> wrote:
>>> Yup. C's code works both on S1 and S2. It just works better on S2. A
>>> non-streaming client works identically well with S1 and S2.
>> after thinking about this a little more, i am wondering how relevant the
>> optimization is to begin with. do we have any data that would tell us
>> that this might be a problem? for example, while the inherently ordered
>> XML of feeds would easily allow streaming parsing, i am not aware of any
>> implementation that actually does that (using SAX). instead, what usually
>> happens is that implementations use DOM, which first reads the whole
>> resource, builds the internal XML tree, and then the code starts working
>> with that complete tree.
>>
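[the SAX-vs-DOM contrast above can be sketched with python's stdlib parsers. a minimal sketch, not LDP-specific: a hypothetical two-entry atom feed, and a SAX handler that sees each entry as it arrives instead of first building a full in-memory tree.]

```python
# minimal sketch: collect entry titles from a (hypothetical) atom feed via
# SAX callbacks. memory stays roughly constant regardless of feed size,
# because no complete DOM tree is ever built.
import xml.sax
from io import BytesIO

FEED = b"""<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>"""

class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def startElement(self, name, attrs):
        # namespace processing is off by default, so names arrive unprefixed
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def endElement(self, name):
        if name == "title":
            self.in_title = False

    def characters(self, content):
        if self.in_title:
            self.titles[-1] += content

handler = TitleHandler()
xml.sax.parse(BytesIO(FEED), handler)
print(handler.titles)  # ['first', 'second']
```

[a DOM parser (e.g. xml.dom.minidom) reading the same feed would allocate the whole tree before any application code runs, which is the behavior erik describes.]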
>> in DOM/XML, the very fuzzy rule of thumb is that a DOM tree needs 10x as
>> much memory as the source file. i would assume for RDF there's a similar
>> rough guesstimate relating serializations and in-memory models? the thing
>> is that neither feeds nor LDP are made for sharing/exchanging massive
>> amounts of data. they are loosely coupled protocols to allow easy
>> resource access. given today's machines, it may be safe to assume that
>> 100mb of runtime memory consumption is tolerable; in XML-land, that
>> would translate to a resource size of 10mb. i haven't seen many feeds
>> exceeding that size: you can control size by paging, and you can also
>> control it by not embedding everything in a feed (for example, podcast
>> feeds are really small, because the large video files are linked rather
>> than embedded).
>>
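[the back-of-the-envelope arithmetic in the paragraph above, spelled out. both numbers are the rough guesstimates from the mail, not measurements.]

```python
# rough guesstimates from the thread, not measurements: a ~10x in-memory
# expansion factor and a 100mb runtime-memory budget cap the practical
# resource size at ~10mb per interaction.
expansion_factor = 10      # DOM tree vs. source file, very fuzzy rule of thumb
memory_budget_mb = 100     # tolerable runtime memory per interaction
max_resource_mb = memory_budget_mb / expansion_factor
print(max_resource_mb)  # 10.0
```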
>> just wondering: do we have any guesstimates of RDF memory requirements,
>> and do we really plan for scenarios where LDP resources exceed the
>> resulting maximum resource sizes we might want to see?
>>
>> thanks and cheers,
>>
>> dret.
Received on Tuesday, 12 November 2013 19:41:54 UTC
