Re: optimizing container pages serialization to enable streaming from Andy Seaborne on 2013-11-12 (public-ldp-wg@w3.org from November 2013)

From: Andy Seaborne <andy@apache.org>
Date: Tue, 12 Nov 2013 14:06:57 +0000
To: public-ldp-wg@w3.org
Message-ID: <52823601.1040901@apache.org>
On 12/11/13 01:06, Wilde, Erik wrote:
> hello eric.
>
> On 2013-11-11, 15:25 , "Eric Prud'hommeaux" <eric@w3.org> wrote:
>> * Wilde, Erik <Erik.Wilde@emc.com> [2013-11-11 17:18-0500]
>>> sure. whatever somebody does within the permissible bound of a media type
>>> is fair game. but that optimization is not all that interesting when
>>> nobody can depend on it and/or ask for it, right?
>> No one is depending on it. They are simply able to take advantage of
>> it if it's there.
>
> ok, that sounds safe.
>
>>> how does s2 signal that c can safely go into streaming mode? or let's put
>> When the client sees the required membership triples, it knows how to
>> interpret the graph to date and the rest of the incoming network
>> stream.
>
> so the client uses a LDP-specific parser? again, i would argue that might
> be something people easily do in expert circles such as the LDP group, but
> for a large-scale protocol, you definitely should count on the vast
> majority of clients using off-the-shelf components. if the protocol has
> difficulty working when people are doing that, you have a problem. telling
> people "just write your own turtle parser" might not solve that problem.
>
>> What I outlined was already true of LDP; as soon as one saw the
>> membership triples, one could dispatch the graph and switch to
>> streaming mode. There was always going to be an incentive to serialize
>> the membership triples first in case there was a streaming client.
>
> true in principle. but hard to deploy if that means telling people that
> using off-the-shelf components will cause serious performance problems.

The RDF parsers that I'm aware of have a stream-of-triples-out interface
as well as a graph parsing mode.  Allowing for the vagaries of Turtle
shortcuts like (), there is a definition of "document order" of triples.
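
As a rough illustration of the two interfaces, here is a minimal Python
sketch.  It is not any real toolkit's API: stream_triples and parse_graph
are hypothetical names, and the reader only handles a simplified
N-Triples-like syntax (one triple per line, IRIs in <...> or plain
literals in "..."), which sidesteps the Turtle shortcuts mentioned above.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]

# One simplified N-Triples line: three terms followed by " .".
# Assumption: no escapes, blank nodes, datatypes or language tags.
_TERM = r'(<[^>]*>|"[^"]*")'
_LINE = re.compile(rf'\s*{_TERM}\s+{_TERM}\s+{_TERM}\s*\.\s*$')


def stream_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Stream-of-triples-out: yield each triple in document order,
    as soon as its line has been read."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        yield match.group(1), match.group(2), match.group(3)


def parse_graph(lines: Iterable[str]) -> Set[Triple]:
    """Graph parsing mode: read everything, return an unordered set."""
    return set(stream_triples(lines))
```

A caller of stream_triples can act on the first triple before the rest of
the input has been read; a caller of parse_graph gets nothing until the
whole document has been consumed, and the set it gets back carries no
document order.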

Whether using this lower-level interface to a parser counts as a barrier
to use in the wild can be argued both ways.  It is probably not the
obvious, or commonly taught, way to do it, and it needs a deeper
understanding of the toolkit.  It does seem to be a reasonably common
feature, though.

It's as much the storage that loses order (hash tables!) as the
writing.  A writer only reorders when writing pretty-print style, where
it needs to take triples in an order that suits the format.  But the
storage (if not a plain file) may well yield the triples in a different
order from the one in which they were stored, so order is changed in
arbitrary ways.
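
To make the storage point concrete, here is a small Python sketch of a
store that indexes triples by subject.  SubjectIndexedStore is a
hypothetical class, not a real triple store; it iterates subjects in
sorted order only so that the result is repeatable, where a real
hash-based store would give some arbitrary order.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]


class SubjectIndexedStore:
    """Toy store: triples are grouped under their subject, so the
    order in which they arrived is not kept across subjects."""

    def __init__(self) -> None:
        self._by_subject: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, s: str, p: str, o: str) -> None:
        self._by_subject[s].append((p, o))

    def triples(self) -> Iterator[Triple]:
        # Sorted subjects stand in for whatever order the index yields.
        for s in sorted(self._by_subject):
            for p, o in self._by_subject[s]:
                yield (s, p, o)


store = SubjectIndexedStore()
# Arrival order: the membership triple comes first.
store.add("ex:container", "ex:member", "ex:a1")
store.add("ex:a1", "ex:title", '"first"')

stored_order = list(store.triples())
```

Here the membership triple arrived first but comes out of the store
second, before any writer has been involved.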

>
>>>
>>> again, that seems like implementation guidance. in spec speak, i guess
>>> all you could do is say:
>>>
>>> - servers MAY choose to serialize (some) responses this way: ...
>>>
>>> - clients MUST NOT rely on servers serializing in the way described
>>> above.
>> Yup. C's code works both on S1 and S2. It just works better on S2. A
>> non-streaming client works identically well with S1 and S2.
>
> ok. let's just make sure we're designing the protocol with non-optimized
> clients in mind, not with optimized ones that use custom LDP components
> for standard tasks.

+1

>
> cheers,
>
> dret.
>
>
Received on Tuesday, 12 November 2013 14:07:27 UTC