Re: [LDP Paging] Comparison to other techniques of pagination from Benjamin Armintor on 2014-09-17 (public-ldp-comments@w3.org from September 2014)

From: Benjamin Armintor <armintor@gmail.com>
Date: Wed, 17 Sep 2014 13:33:55 -0400
To: John Arwe <johnarwe@us.ibm.com>
Cc: Austin William Wright <aaa@bzfx.net>, kjetil@kjernsmo.net, public-ldp-comments@w3.org
Message-ID: <CADQQ8TNO1AsvSDtkX3RK5sG1B6wmtFBgyJ76k+716Tf01D-B7Q@mail.gmail.com>
Thanks for the detailed reply, John; it's much appreciated.

I think a couple of the points of the proposal vis-a-vis alternatives might
overstate its advantages:
- etag and conditional GET are, as far as I know, available to range
requests, so the same mechanism for change detection would be available to
a hypothetical range-based approach
- The 2NN status might wreak less havoc with caches than alternatives, but
I'm not confident it will be a lot less
- I understand that RDF stores needn't provide predictable order for
triples, but don't personally follow how such a resource's triples could
then be paged in any useful way, which seems to bring us back to content
negotiation.
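
As a rough illustration of the first point, here is a minimal Python sketch
of the request headers a range-based pager might send. It assumes a
hypothetical "triples" range unit (no such unit is registered) and a made-up
ETag value; the function name is also invented for the example. If-Range
itself is standard HTTP: it asks the server to honor the range only if the
validator still matches, and to return the full representation otherwise.

```python
def conditional_range_headers(etag, first, last, unit="triples"):
    """Build headers for a range request honored only if `etag` still matches.

    `unit` is a hypothetical extension range unit; "bytes" is the only
    range unit in common use today.
    """
    if first < 0 or last < first:
        raise ValueError("range must satisfy 0 <= first <= last")
    return {
        # Extension range units use the same "unit=first-last" shape as bytes.
        "Range": f"{unit}={first}-{last}",
        # If the resource changed, the server ignores Range and sends it all.
        "If-Range": etag,
    }


# Ask for the first 100 triples of the version identified by the ETag.
headers = conditional_range_headers('"abc123"', 0, 99)
```

Whether a server could serve such a range meaningfully still depends on the
ordering question in the third point.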

But, I am also engaging in a relatively shallow effort to think about this
stuff. Letting a server unilaterally impose paging is not a thing an
approach built around Range could do without being pretty hostile to
pre-HTTP 1.1 clients.
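
For comparison, the server-initiated flow John describes in the quoted
message (a 303, or a 2NN for clients that opt in, followed by a response
whose Link headers carry type=ldp:Page and a canonical link back to the
requested URI) needs only a small check on the client. A minimal Python
sketch, assuming the rel="type" and rel="canonical" spellings of those Link
headers and a deliberately simplified header parser; the function names and
example.org URIs are hypothetical:

```python
LDP_PAGE = "http://www.w3.org/ns/ldp#Page"


def parse_link_header(value):
    """Parse one Link header value into a list of (target, rel) pairs.

    Simplified: assumes no commas or semicolons inside URIs or quoted strings.
    """
    links = []
    for part in value.split(","):
        pieces = [p.strip() for p in part.split(";")]
        target = pieces[0]
        if not (target.startswith("<") and target.endswith(">")):
            continue
        for param in pieces[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "rel":
                # A rel value may list several space-separated relation types.
                for rel in val.strip().strip('"').split():
                    links.append((target[1:-1], rel))
    return links


def is_page_of(requested_uri, link_header_values):
    """Return True if the Link headers mark a response as a page of `requested_uri`."""
    links = [l for v in link_header_values for l in parse_link_header(v)]
    return (LDP_PAGE, "type") in links and (requested_uri, "canonical") in links


example_links = [
    '<http://www.w3.org/ns/ldp#Page>; rel="type", '
    '<http://example.org/big>; rel="canonical"',
    '<http://example.org/big?page=2>; rel="next"',
]
got_a_page = is_page_of("http://example.org/big", example_links)
```

A client that sees this check succeed knows it holds only part of the
resource's state and can decide whether to follow the next links.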

- Ben

On Wed, Sep 17, 2014 at 12:25 PM, John Arwe <johnarwe@us.ibm.com> wrote:

> Austin, the working group asked me to reply to your comment.  I'm the
> default RFC monger in this working group ;-)
>
>
> Based on the volume of discussion we've had in the past, which includes
> within the working group in addition to liaising with other groups such as
> the W3C TAG and the IETF HTTP working group, such a comparison is highly
> unlikely to be small enough to non-disruptively fit in the introduction/etc
> of the document.
>
> If you have some alternatives to the reasoning below not covered here,
> please share as it's conceivably new information that would alter consensus
> opinions.
>
>
>
> wrt next/prev, LDP Paging does make use of them, and in general draws
> substantial inspiration from [5005], specifically from section 3 Paged
> Feeds as the definitions (that non-normatively refer to 5005) should make
> clear.  Note that the 5005 link relation definitions are no longer the
> latest; the current normative definitions in the link relation registry [1]
> are compatible with LDP Paging's usage, and on re-reading 5005 I see no
> conflicts there either.
>
> wrt  *-archive link relations, they constrain the archive documents that
> their target URIs identify such that their state SHOULD NOT change over
> time, which is not a constraint that the working group believes is
> appropriate for LDP Paging in-sequence pages (6.2.9) ... keep in mind too
> the 2119 definition of Should Not, versus the developer attitude of "should
> == may".  More problematically, RFC 5005 section 4's constraints include
> (again, Should) specific content (fh:archive) "elements" in the resources'
> "head sections" ... in effect, binding archive documents to Atom
> Syndication Format and hence to XML; this in turn means that for resources
> that are RDF graphs (a central concern for LDP), there is no standard
> representation format (no standardized ASF serialization exists for RDF).
>
> In both cases, reaching more deeply into 5005 and drawing a 1:1
> correspondence from (for example) feed entries to RDF triples would cause
> additional impedance mismatches.  If LDP Patch comes to fruition, then that
> might provide a good match (I'm speaking speculatively here and purely for
> myself - there have been zero working group discussions along these lines
> that I am aware of).  The idea of reconstructing a logical feed using a
> time-sequenced set of incremental patch entries seems like a natural
> application of 5005.  Agreement on an LDP Patch format has proven to be a
> stubbornly elusive goal over the lifetime of the working group, although it
> has recently made progress.
>
>
>
>
> wrt adding new Range units, various working group members have looked at
> it several times over the life of the working group; personally, I did so
> as far back as Submission-drafting time.  The primary reasons that worked
> against re-use of Range were:
>
> 1: Servers are not free to initiate paging unilaterally using Range
> requests.  The ability for the server to initiate paging as a way to manage
> server load (and as a side effect, potential attacks) is a major concern of
> the working group members.
>
> 2: RDF based resources (a focus of LDP generally) are not seen to be
> amenable to range requests that require index-based access to triples,
> absent implementation or domain-specific assumptions about underlying
> ordering.  SQL-based back ends might be amenable to "counting triples", but
> other database technologies not so.  Then there is the issue of common RDF
> implementation components like Apache Jena, that faithfully implement the
> RDF graph definition of an unordered set ... therefore providing no
> interface-level guarantees of repeatable order in model traversal
> operations or serialization operations, even if the underlying graph were
> unchanged between requests.  Requiring all implementations to impose an
> index-based ordering on triples is seen as a significant implementation
> burden.
>
> 3: The inability for clients to have any guarantees about their view of a
> paged resource's state after a traversal in which the paged resource
> changes.  LDP Paging provides a stronger guarantee in 6.2.7 for paged
> resources in the latter case than Range or 5005 would guarantee for an
> archived feed once the equivalence to RDF is established (preceding point).
> The (my) initial proposal started off with the "no guarantees, start over"
> position of 5005, and working group members advocated for the stronger
> guarantee.
>
> 4: Non-cacheability of responses.  Existing caches would be forced to
> treat extension units as uncacheable, if and until their implementations
> were updated to support the new LDP-defined units.
>
> FWIW, if a future spec were to standardize how clients request particular
> orderings from the server, e.g. sorting of a result set, then in those
> cases index-based triple access and new units (on Range and/or on LDP
> Paging's preference) might well be specified there as well.
>
>
>
> wrt Content-Location and status code, this was an option that members of
> the working group did discuss with the W3C TAG [2], [3] and the IETF HTTP
> working group (their chair is cc'd on [3], as one example); short answer,
> there was no broad consensus on whether or not doing what you suggest is
> within HTTP, nor (if it is) that HTTP supplies an unambiguous and
> semantically correct interpretation.
>
> 1: [4] says that in the case you describe the C-L URI identifies a
> particular representation of the effective request URI.  The LDP working
> group established consensus that a single in-sequence page, in the general
> case, is not *the same resource* (in the sense of "state") as the paged
> resource.  We did not have consensus that the definition cited allows the server to
> respond to GET paged-resource-URI with 200 and C-L that identifies an
> in-sequence page (which, definitionally, has only a subset of the paged
> resource's state); my sense is that the working group mostly found that
> interpretation unnatural.  A client receiving a 200 response was believed
> to have every right to stop there (at that first GET), believing it has the
> *entire* state of the paged resource; this would not be true however when a
> paged resource is identified by the effective request URI and an
> in-sequence page resource is identified by the Content-Location response
> header (in the general case of the paged resource having > 1 page).
>
> 2: There is a competing mindset that says the server says what is, so 200
> + C-L of a "subset" resource is perfectly fine: clients have to know
> something about the resource they're asking for.
>
> LDP chose to specify an approach that leaves no risk of an existing client
> incorrectly believing that it has a complete representation of the state of
> the resource identified by the effective request URI when it does not,
> given existing implementations.  If consensus evolves in the wider
> community over time, then LDP Paging might be able to incorporate whatever
> optimizations become enabled, but the currently specified base should
> continue to work unchanged, even if it has to start with 303 to be safe wrt
> existing clients.  The at-risk text between 6.2.5 and 6.2.6 contains
> additional links as well.
>
>
>
> wrt RFC 5989, LDP's scope was chartered to include HTTP and RDF.  I don't
> know that anyone in the working group was deeply aware of 5989 before your
> comment.  There was no appetite for adding a requirement on RLS or SIP for
> implementations.
>
>
>
> wrt If clients have to be "paging aware", would that ...     There are
> several cases to consider, given the optional features involved.
>
> 1: If any GET request results in a 2NN response with response headers Link
> type=ldp:Page and canonical=effective request URI, then the client can choose to
> retrieve the page sequence or not.  According to the 2NN draft [4], this
> would never happen with a compliant server unless the client sends an
> indication in the request that it supports 2NN responses, in keeping with
> "leaves no risk of an existing client incorrectly believing ..."
>
> 2: If any GET request results in a 303 response, the semantics of 303
> already say that a resource other than the one identified by the effective
> request URI is involved (thus: 303, not 306 or 307).  If the client chooses
> to retrieve the 303 Location response header's resource, and that response
> has response headers Link type=ldp:Page and canonical=first request's
> effective request URI, then the client can choose to retrieve the page sequence or
> not.
>
> Any client can do that, on any resource.  Within the working group, a
> common supposition has been that an http client library would do this
> transparently.  If you see any "external/pre-programmed notion of what the
> resource it gets back is going to be", please point it out.  It's
> conceivable that those involved are too close to it to see some subtlety,
> but having looked again we see no such requirement.  Indeed, we see *less*
> need for outside knowledge in this approach than in some alternatives
> suggested, for example 200 + Content-Location, which is why we obtained
> consensus on it.
>
>
>
> wrt scope of applicability
>
> Indeed, we separated Paging out in part to allow its application
> independently of LDP proper.  Along the way, the language was changed so
> that it applies to more than just RDF based resources.
> Are there any particular aspects of LDP that you believe your server would
> not comply with, or is the definitional normative requirement on being an
> LDP server coupled with the size of the LDP spec simply leading you to
> assume that you're not compliant?  The bare-minimum difference between a
> compliant LDP server and a conforming HTTP server is pretty small, IIRC -
> skimming, it's 4.2.1.3 etags, 4.2.1.5 default base URI, done (assuming you
> don't intend to expose LDPRs or LDPCs, but we're talking about bare-min).
> LDP, for example, does not require you to host RDF at all or to deal with
> containers at all.  If your question stems in part from a "follow your nose
> - oh, a different big scary spec I have to grep through in order to use
> Paging at all, how 'nice'" reaction, that is something we could clarify in
> principle.
>
> As to other groups, as mentioned above we've engaged directly with the TAG
> and IETF HTTP on certain aspects, as well as co-membership with the RDF
> working group, and we've received comments on past LDP LC drafts (which did
> include Paging at first) announced the usual way over the span of a year
> from a variety of sources including Tim Berners-Lee.  If there are specific
> communities you have in mind to solicit that we might have omitted, this is
> a perfect time to get them reading and we'd appreciate your help in
> motivating them to comment within the review period.
>
>
> conneg
>
> I think that got covered above in the context of other comments; the TAG
> (and IETF's HTTP working group) have already seen and given comments on
> 2NN.  It was one thread off the TAG discussion that led to additional uses
> (outside of LDP) for 2NN, as documented in the IETF draft.
>
>
>
> [1] http://www.iana.org/assignments/link-relations/link-relations.xml
> [2] http://lists.w3.org/Archives/Public/www-tag/2013Dec/0041.html
> [3] http://lists.w3.org/Archives/Public/www-tag/2014Jan/0013.html
> [4] http://tools.ietf.org/html/rfc7231#section-3.1.4.2
> [5005] http://tools.ietf.org/html/rfc5005
>
> Best Regards, John
>
> Voice US 845-435-9470  BluePages
> <http://w3.ibm.com/jct03019wt/bluepages/simpleSearch.wss?searchBy=Internet+address&location=All+locations&searchFor=johnarwe>
> Cloud and Smarter Infrastructure OSLC Lead
>
Received on Wednesday, 17 September 2014 17:34:24 UTC