Re: optimizing container pages serialization to enable streaming

* John Arwe <johnarwe@us.ibm.com> [2013-11-15 09:44-0500]
> Nothing like hitting Send and resuming a suspended thread to elicit the 
> missing comment.  New version here.  No changes to 1-3
> 
> > Wow, I time-slice onto something(s actually) else for a couple days 
> > and we have real discussions going on! 
> 
> > 1: Dumb question, for Eric P I think.  If one is willing to assume 
> > an optimized serializer in order to move the proposed 
> > ldp:membershipRule "up front", what does the extra level of 
> > indirection and the anon  bnode "trick" really give you beyond 
> > moving the 3-4 predicates that the anon  bnode would otherwise 
> > contain up front?   The mental chasm for me is accepting the notion 
> > of an optimized serializer more than whether it's moving a clump of 
> > form X vs form Y to a privileged spot. 

The incentive for an optimized serializer to benefit streaming parsers
exists regardless of the membershipRules proposal. The membershipRules
proposal merely enables terse defaults in conjunction with a streaming
parser.

The proposal leans on an artifact of a few RDF serialization formats
(including Turtle) that lets one create an "anonymous" blank node:
one that, because it has no label, can never be referenced later.

If ldp:tripleRules has a cardinality of one, and we're not using
entailment, after I see
  <R1> ldp:tripleRules [ ldp:membershipPredicate my:mineMineMine ].
I know that I won't see another ldp:tripleRules. In the case that this
uses an anonymous blank node ([...] in Turtle and TriG, nested
rdf:Description in RDF/XML), I also know that there won't be any more
triples with that blank node as a subject ('cause it has no label).
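
A toy sketch of that reasoning, in Python. It is not a Turtle
parser: the event tuples ("anon_open", "anon_close", "labeled",
"rule_triple", "other") and the function name are made up for
illustration, and it assumes cardinality one and no entailment, as
above.

```python
def rule_known_complete(events):
    """Return how many events a streaming reader must consume before it
    can be sure it has every triple of the (single) rule node."""
    for n, event in enumerate(events, start=1):
        if event[0] == "anon_close":
            # ']' closed a label-less node; nothing can refer to it again.
            return n
    # Labeled node (or no rule at all): more triples about it could
    # still arrive, so the reader has to wait for the end of the input.
    return len(events)


anon = [
    ("anon_open",),
    ("rule_triple", "ldp:membershipPredicate", "my:mineMineMine"),
    ("anon_close",),
    ("other",),
    ("other",),
]
labeled = [
    ("labeled", "_:r"),
    ("rule_triple", "ldp:membershipPredicate", "my:mineMineMine"),
    ("other",),
    ("other",),
]
```

With the anonymous form the reader is done with the rule after the
third event; with the labeled form it cannot be sure until it has read
all four.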


> > 2: To ErikW's comment, maybe it's fine for prototyping but I'm 
> > worried about lifecycle dev costs in product development, and once 
> > we create that Thing we own its care & feeding forever.  I would 
> > "strongly discourage" my own devs from taking a course like that 
> > especially in the general case. 

I've tried in several messages to find out in what way ErikW imagines
a non-streaming parser breaking when getting data optimized for a
streaming parser. I think the concern comes from not examining the
mechanics. I'll spell them out in detail, using the same services as
in <http://www.w3.org/mid/20131111220142.GC29738@w3.org>.

If a client is consuming large container pages across a long, thin
pipe, there is an obvious incentive to tune the parser to dispatch
members as it parses them. As soon as it's seen the membership
triples, it can go back over its accumulated graph, interpret and
dispatch the already received members, and dispatch the remaining
members as they come in. Let's call that a streaming client CS. By
contrast, the non-streaming client C0 synchronously parses the whole
payload and then interprets and dispatches the members.

Server S1 writes the membership triples randomly in the payload.  C0
synchronously parses and dispatches that data with no variation in
efficiency based on the ordering. CS finds the membership triples on
average about halfway through the message, so it improves
responsiveness by starting to dispatch after seeing about half the
payload.

S2 manually writes the membership triples at the top. The performance
for C0 remains the same as for S1, but CS's lag has effectively been
reduced to the time it takes the first packet to arrive.
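
The C0/CS behavior against S1 and S2 can be modeled roughly as below.
This is a simplified Python sketch, not LDP code: triples are plain
tuples, the whole set of membership triples is collapsed into a single
ldp:membershipPredicate triple, and the function names are
hypothetical. Each result pairs a member with the number of triples
that had been read when it was dispatched.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
RULE = "ldp:membershipPredicate"  # stand-in for "the membership triples"


def streaming_dispatch(triples: Iterable[Triple]) -> List[Tuple[int, str]]:
    """Model CS: dispatch members as soon as the predicate is known."""
    pending: List[Triple] = []  # triples read before the predicate was known
    predicate: Optional[str] = None
    out: List[Tuple[int, str]] = []
    for n, (s, p, o) in enumerate(triples, start=1):
        if predicate is None:
            if p == RULE:
                predicate = o
                # Go back over the accumulated graph and dispatch.
                out.extend((n, o2) for (_, p2, o2) in pending if p2 == predicate)
                pending.clear()
            else:
                pending.append((s, p, o))
        elif p == predicate:
            out.append((n, o))
    return out


def batch_dispatch(triples: Iterable[Triple]) -> List[Tuple[int, str]]:
    """Model C0: parse the whole payload, then dispatch."""
    graph = list(triples)
    predicate = next(o for (_, p, o) in graph if p == RULE)
    return [(len(graph), o) for (_, p, o) in graph if p == predicate]


members = [("<C>", "my:member", f"<m{i}>") for i in range(4)]
rule = ("<C>", RULE, "my:member")
s1 = members[:2] + [rule] + members[2:]  # rule lands mid-payload
s2 = [rule] + members                    # rule written first
```

For s1, streaming_dispatch cannot dispatch anything until triple 3;
for s2, the first member goes out at triple 2. batch_dispatch reports
5 for every member in both cases, which is the point: ordering changes
nothing for C0.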

For some reason, this gives ErikW concern. Note that we haven't
gotten to the ldp:membershipRules proposal yet. This is just the
natural behavior of servers and clients responding to pressures
for relatively trivial optimization.

All the ldp:membershipRules proposal adds to this is a terse way for
the service to use default membership triples in Turtle, TriG and
RDF/XML by the syntactic mechanism above.


> > 3: Presenting the option for server implementations seems perfectly 
> > appropriate in a companion document; one of the existing ones like 
> > BP&G, or another.  As long as it meets the "clients can't depend on 
> > it" criteria others laid out, it's a fair investment decision to 
> > give server implementers.  I have zero sympathy for it in the 
> > mainline spec, where as ErikW pointed out it would add nothing 
> > normative; I think it would be a distraction, if anything. 

I completely agree that "clients can't depend on it" needs to be
spelled out in strong, potentially abusive language. I don't have
much pref for where it gets written, or even if the possibility of
optimization gets mentioned at all. Like your observation that a
client could technically be compliant by setting last=next, people
will figure it out.

My goal here is to clear the FUD and evaluate the ldp:membershipRules
proposal based on facts. Here are some optimization facts about LDP
without the proposal:

  1 It doesn't change the behavior of non-streaming parsers at all.
  2 It doesn't offer contracts beyond that of a Turtle payload.
  3 The client doesn't need to know that the membership triples
    will be at the top in order to take advantage of them if they are.
  4 It does rely on an artifact of the less primitive RDF serializations.

And here are some facts about the ldp:membershipRules proposal:

  A It doesn't work with the simple language N-Triples because it
    has no syntax for anonymous bnodes.
  B It can enable early streaming with a terse serialization burden
    for default membership triples.


> 4: Eric P, your ldp:membershipRule proposal I think relies on the
> "exactly one" phrase.  When Henry raised the prospect of this blank
> node pattern in earlier discussions, he allowed for >1 - would that not
> break your streaming proposal's required advantage?  I see, reading 
> another branch, that the bottom of Henry's 
> http://www.w3.org/2012/ldp/wiki/MembershipInferencing
> renders this explicit (again).

There's no advantage to allowing >1 ldp:membershipRules, and there are
lots of disadvantages, e.g.

  Harder to validate with SPARQL or a graph API -- the validator must
  examine each membershipRules node and make sure that none of them
  have conflicting container triples, e.g. one setting the
  membershipPredicate to X and another to Y.

  More specification to define what to do for conflicts.

  More confusing for readers of the page or of the LDP specs.
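
To give a feel for the extra validation step, here is a rough Python
sketch over plain tuples. It uses no real RDF library, and the
function name and the ldp:membershipRules spelling are assumptions for
illustration only.

```python
from collections import defaultdict


def conflicting_rules(triples, container):
    """Return {predicate: values} for every predicate that the
    container's rule nodes set to more than one value."""
    rule_nodes = {
        o for (s, p, o) in triples
        if s == container and p == "ldp:membershipRules"
    }
    values = defaultdict(set)
    for (s, p, o) in triples:
        if s in rule_nodes:
            values[p].add(o)
    return {p: vs for p, vs in values.items() if len(vs) > 1}


two_rules = [
    ("<R1>", "ldp:membershipRules", "_:a"),
    ("<R1>", "ldp:membershipRules", "_:b"),
    ("_:a", "ldp:membershipPredicate", "my:X"),
    ("_:b", "ldp:membershipPredicate", "my:Y"),
]
one_rule = two_rules[:1] + two_rules[2:3]
```

Here conflicting_rules(two_rules, "<R1>") flags
ldp:membershipPredicate with both my:X and my:Y, while one_rule yields
an empty result. With cardinality one, none of this checking is
needed.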


> Best Regards, John
> 
> Voice US 845-435-9470  BluePages
> Tivoli OSLC Lead - Show me the Scenario
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.

Received on Friday, 15 November 2013 23:04:54 UTC