Re: optimizing container pages serialization to enable streaming

* Wilde, Erik <Erik.Wilde@emc.com> [2013-11-11 15:33-0500]
> On 2013-11-11, 11:56 , "Eric Prud'hommeaux" <eric@w3.org> wrote:
> >This isn't breaking layers between HTTP and payload. Browsers set the
> >title of an HTTP frame before parsing it because the head precedes the
> >body. Most XML protocols impose some ordering. For a quick survey, see
> ><http://www.w3.org/2000/03/29-XML-protocol-matrix>.
> 
> as you certainly know, XML has an ordered metamodel, which of course makes
> it easier to implement incremental parsing and even rendering, if you want
> that. and all XML-based protocols i know use XML's built-in ordering,
> because it makes a lot of sense to reuse that aspect of the metamodel.
> 
> atom *could have* said entry order in the XML does not matter and instead,
> each /feed/entry has an @order attribute and then, <entry order="1"> could
> be served last, and clients would need to order entries by that attribute.
> that of course would be a rather strange XML model (and i've never seen
> anybody doing it). with RDF not having an inherent order, that's pretty
> much the option you're left with for RDF models that use ordering, with
> all the not-so-great side effects. there's little there LDP can actually
> do short of (as i think, unduly) constraining RDF.
> 
> coming back to alexandre's point: you could imagine a world in which
> there's text/turtle, and there's *also* text/ordered-turtle, where some
> ordering constraints on the RDF model level filter down into the RDF
> serialization (all of them or just turtle?). but if LDP implementors then
> have to live with existing serializers in their tool chain, they would end
> up *only* supporting text/turtle, unless you would require that LDP must
> *only* use text/ordered-turtle.

The odds that someone who cared about optimizing would use a generic
serializer are vanishingly small. I'd say that's not a significant
price. The payload is still text/turtle; no text/turtle parser would
be able to detect any way that it differs from any rearrangement of
the same triples.


> you could use HTTP conneg for clients to prefer text/ordered-turtle but
> still accept text/turtle (because of LDP servers not implementing the new
> RDF serialization method). that works, but as any protocol, unless you
> make it very clear that both media types MUST be supported, you will most
> certainly end up with implementations that implement a subset (because
> that was all they were interested in), and then end up being
> non-interoperable against implementations implementing a disjoint subset.
> 
> as alexandre pointed out: it's doable and it has been done before (in many
> protocols out there, based on all kinds of metamodels). it just has
> important implications, and is not a choice that should be made without
> considering all of them, given the decades of protocol design experience
> that's out there.

Server S1 serializes some LDPC using a generic serializer. Client C
parses the data, recording each arc in an RDF graph, until it sees the
arc that names the membership predicate. At that point, it scans back
through the graph so far, looking for arcs with that predicate and
acting on each member now that it can find them. It then continues to
parse the network input, now in streaming mode and able to dispatch
each member as it arrives
from the network. Client C has broken no LDP, HTTP, RDF or Turtle
rules; it has only optimized for the part of the document that follows
the membership predicate.
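
A minimal sketch of Client C's strategy, to make the two phases
concrete. It assumes the parser hands over triples as (subject,
predicate, object) string tuples in document order, and that the
membership predicate is announced by an ldp:membershipPredicate arc;
the IRI constant and the names consume and on_member are illustrative
assumptions, not anything the spec defines.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Assumed IRI of the arc that names the membership predicate.
MEMBERSHIP_PREDICATE = "http://www.w3.org/ns/ldp#membershipPredicate"


def consume(triples: Iterable[Triple],
            on_member: Callable[[str], None]) -> List[Triple]:
    """Dispatch members as early as possible.

    Returns the triples that had to be buffered before streaming began.
    """
    buffered: List[Triple] = []
    member_pred: Optional[str] = None
    for s, p, o in triples:
        if member_pred is None:
            if p == MEMBERSHIP_PREDICATE:
                member_pred = o
                # Scan back through everything recorded so far.
                for _, bp, bo in buffered:
                    if bp == member_pred:
                        on_member(bo)
            else:
                # Membership predicate not yet known: record the arc.
                buffered.append((s, p, o))
        elif p == member_pred:
            # Streaming mode: dispatch each member as it arrives.
            on_member(o)
        # Other arcs seen in streaming mode are ignored in this sketch.
    return buffered
```

The returned buffer is what a client pays for a document that names
the membership predicate late; it is empty when that arc comes first.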

Server S2 uses a custom serializer which starts out by emitting the
membership predicate. Client C can consume S2's data much more
efficiently because it can start out in streaming mode. S2 hasn't
broken any rules, but it is nonetheless a much more efficient server
for large collections.
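
One way S2's serializer could be sketched, assuming every term is an
IRI so that each triple can be written as a single N-Triples line
(which any Turtle parser accepts). The function name membership_first
and the IRI constant are illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

# Assumed IRI of the arc that names the membership predicate.
MEMBERSHIP_PREDICATE = "http://www.w3.org/ns/ldp#membershipPredicate"


def membership_first(triples: Iterable[Triple]) -> Iterator[str]:
    """Yield one line per triple, membership-predicate arcs first."""
    # sorted() is stable and False sorts before True, so arcs with the
    # membership predicate move to the front and the rest keep their order.
    ordered = sorted(triples, key=lambda t: t[1] != MEMBERSHIP_PREDICATE)
    for s, p, o in ordered:
        yield "<%s> <%s> <%s> ." % (s, p, o)
```

The output is the same set of triples as any other ordering, so a
generic parser cannot tell the difference.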

I proposed that we say there is exactly one ldp:membershipRules arc
from the container to the node that holds the membership predicate et
al. That doesn't break any LDP, HTTP, RDF or Turtle rules. Perhaps
this will meet with less resistance if we simply don't mention that
serializing that arc at the top of the document will enable more
efficient streaming parsers. We can let people figure it out for
themselves.


> cheers,
> 
> dret.
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.

Received on Monday, 11 November 2013 22:02:15 UTC