Re: [LDP Paging] Comparison to other techniques of pagination from Ashok Malhotra on 2014-09-17 (public-ldp-comments@w3.org from September 2014)

From: Ashok Malhotra <ashok.malhotra@oracle.com>
Date: Wed, 17 Sep 2014 12:40:44 -0400
To: John Arwe <johnarwe@us.ibm.com>, Austin William Wright <aaa@bzfx.net>
CC: kjetil@kjernsmo.net, public-ldp-comments@w3.org
Message-ID: <5419B98C.7010709@oracle.com>
We had an internal Oracle discussion about paging as part of which we evaluated
paging designs from various APIs on the Web.  We did not come up with a design
that is significantly different from the design in the LDP Paging Spec.

The hard issue wrt paging is what to do when one user is paging through a collection
and others are updating it.  The WG, perhaps wisely, decided not to go there except
to mention that this can be a problem and that users can detect whether a page or collection
has changed using ETags and reinitiate paging if they so desire.
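
To make the ETag idea concrete, here is a minimal sketch of a client that restarts paging when it sees the collection change mid-traversal.  It is not taken from the spec.  The names (collect_all, fetch, FlakyServer) and the assumption that each fetch also reports the paged collection's current ETag are hypothetical, and the server is simulated in memory so the example runs without a network.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page record: (ETag of the paged collection as seen with this page,
# items on this page, URL of the next page or None).
Page = Tuple[str, List[str], Optional[str]]


def collect_all(fetch: Callable[[str], Page], first_url: str,
                max_restarts: int = 3) -> List[str]:
    """Walk the page sequence; start over if the collection's ETag changes."""
    for _ in range(max_restarts + 1):
        items: List[str] = []
        url: Optional[str] = first_url
        start_etag: Optional[str] = None
        changed = False
        while url is not None:
            etag, page_items, next_url = fetch(url)
            if start_etag is None:
                start_etag = etag          # remember the state we started from
            elif etag != start_etag:
                changed = True             # someone updated the collection
                break
            items.extend(page_items)
            url = next_url
        if not changed:
            return items
    raise RuntimeError("collection kept changing; giving up")


class FlakyServer:
    """In-memory stand-in for a server whose collection is updated once."""

    PAGES = {"p1": (["a", "b"], "p2"), "p2": (["c"], None)}

    def __init__(self) -> None:
        self.calls = 0

    def fetch(self, url: str) -> Page:
        self.calls += 1
        # The collection changes right after the first page has been served.
        etag = '"v1"' if self.calls == 1 else '"v2"'
        items, next_url = self.PAGES[url]
        return etag, items, next_url


server = FlakyServer()
result = collect_all(server.fetch, "p1")
```

Here the first traversal is abandoned on the second page because the ETag moved from "v1" to "v2", and the second traversal completes.  Whether to restart, continue, or give up is a client policy choice; the sketch only shows the detection step.
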
All the best, Ashok

On 9/17/2014 12:25 PM, John Arwe wrote:
> Austin, the working group asked me to reply to your comment.  I'm the default RFC monger in this working group ;-)
>
>
> Based on the volume of discussion we've had in the past, both within the working group and in liaising with other groups such as the W3C TAG and the IETF HTTP working group, such a comparison is highly unlikely to be small enough to non-disruptively fit in the introduction/etc of the document.
>
> If you have some alternatives to the reasoning below not covered here, please share as it's conceivably new information that would alter consensus opinions.
>
>
>
> wrt next/prev, LDP Paging does make use of them, and in general draws substantial inspiration from [5005], specifically from section 3 Paged Feeds as the definitions (that non-normatively refer to 5005) should make clear.  Note that the 5005 link relation definitions are no longer the latest; the current normative definitions in the link relation registry [1] are compatible with LDP Paging's usage, although re-reading 5005 I see no conflicts.
>
> wrt  *-archive link relations, they constrain the archive documents that their target URIs identify such that their state SHOULD NOT change over time, which is not a constraint that the working group believes is appropriate for LDP Paging in-sequence pages (6.2.9) ... keep in mind too the 2119 definition of Should Not, versus the developer attitude of "should == may".  More problematically, RFC 5005 section 4's constraints include (again, Should) specific content (fh:archive) "elements" in the resources' "head sections" ... in effect, binding archive documents to Atom Syndication Format and hence to XML; this in turn means that for resources that are RDF graphs (a central concern for LDP), there is no standard representation format (no standardized ASF serialization exists for RDF).
>
> In both cases, reaching more deeply into 5005 and drawing a 1:1 correspondence from (for example) feed entries to RDF triples would cause additional impedance mismatches.  If LDP Patch comes to fruition, then that might provide a good match (I'm speaking speculatively here and purely for myself - there have been zero working group discussions along these lines that I am aware of).  The idea of reconstructing a logical feed using a time-sequenced set of incremental patch entries seems like a natural application of 5005.  Agreement on an LDP Patch format has proven to be a stubbornly elusive goal over the lifetime of the working group, although it has recently made progress.
>
>
>
>
> wrt adding new Range units, various working group members have looked at it several times over the life of the working group; personally, I did so as far back as Submission-drafting time.  The primary reasons that worked against re-use of Range were:
>
> 1: Servers are not free to initiate paging unilaterally using Range requests.  The ability for the server to initiate paging as a way to manage server load (and as a side effect, potential attacks) is a major concern of the working group members.
>
> 2: RDF based resources (a focus of LDP generally) are not seen to be amenable to range requests that require index-based access to triples, absent implementation or domain-specific assumptions about underlying ordering.  SQL-based back ends might be amenable to "counting triples", but other database technologies not so.  Then there is the issue of common RDF implementation components like Apache Jena, that faithfully implement the RDF graph definition of an unordered set ... therefore providing no interface-level guarantees of repeatable order in model traversal operations or serialization operations, even if the underlying graph were unchanged between requests.  Requiring all implementations to impose an index-based ordering on triples is seen as a significant implementation burden.
>
> 3: The inability for clients to have any guarantees about their view of a paged resource's state after a traversal in which the paged resource changes.  LDP Paging provides a stronger guarantee in 6.2.7 for paged resources in the latter case than Range or 5005 would guarantee for an archived feed once the equivalence to RDF is established (preceding point).  The (my) initial proposal started off with the "no guarantees, start over" position of 5005, and working group members advocated for the stronger guarantee.
>
> 4: Non-cacheability of responses.  Existing caches would be forced to treat extension units as uncacheable, if and until their implementations were updated to support the new LDP-defined units.
>
> FWIW, if a future spec were to standardize how clients request particular orderings from the server, e.g. sorting of a result set, then in those cases index-based triple access and new units (on Range and/or on LDP Paging's preference) might well be specified there as well.
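
To illustrate point 2 above, here is a minimal sketch of why index-based ranges over an unordered graph are not repeatable.  It uses plain Python tuples rather than any real RDF library.  The ex: resource names and the triple_range helper are hypothetical, and the two orderings simply stand in for two equally valid traversals of the same unchanged graph.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def triple_range(serialization: List[Triple], first: int, last: int) -> List[Triple]:
    """Return triples first..last (inclusive) of one particular traversal order."""
    return serialization[first:last + 1]


# An RDF graph is a set of triples: it has no inherent order.
graph: FrozenSet[Triple] = frozenset({
    ("ex:c", "ldp:contains", "ex:r1"),
    ("ex:c", "ldp:contains", "ex:r2"),
    ("ex:c", "ldp:contains", "ex:r3"),
})

# Two traversal orders of the same graph; neither is "the" order.
order_a = sorted(graph)
order_b = sorted(graph, reverse=True)

# A client asks for triples 0-1 on one request and triple 2 on the next,
# and the server happens to traverse the graph differently each time.
first_request = triple_range(order_a, 0, 1)
second_request = triple_range(order_b, 2, 2)
seen = set(first_request) | set(second_request)
missed = graph - seen
```

The client ends up seeing ex:r1 twice and never seeing ex:r3, even though the graph did not change between requests.  Avoiding this would require the server to impose and remember an ordering, which is the implementation burden described above.
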
>
>
>
> wrt Content-Location and status code, this was an option that members of the working group did discuss with the W3C TAG [2], [3] and the IETF HTTP working group (their chair is cc'd on [3], as one example); short answer, there was no broad consensus on whether or not doing what you suggest is within HTTP, nor (if it is) that HTTP supplies an unambiguous and semantically correct interpretation.
>
> 1: [4] says that in the case you describe the C-L URI identifies a particular representation of the effective request URI.  The LDP working group established consensus that a single in-sequence page, in the general case, is not *the same resource* (in the sense of "state") as the paged resource.  We did not have consensus that the definition cited allows the server to respond to GET paged-resource-URI with 200 and C-L that identifies an in-sequence page (which, definitionally, has only a subset of the paged resource's state); my sense is that the working group mostly found that interpretation unnatural.  A client receiving a 200 response was believed to have every right to stop there (at that first GET), believing it has the *entire* state of the paged resource; this would not be true however when a paged resource is identified by the effective request URI and an in-sequence page resource is identified by the Content-Location response header (in the general case of the paged resource having > 1 page).
>
> 2: There is a competing mindset that says the server says what is, so 200 + C-L of a "subset" resource is perfectly fine: clients have to know something about the resource they're asking for.
>
> LDP chose to specify an approach that leaves no risk of an existing client incorrectly believing that it has a complete representation of the state of the resource identified by the effective request URI when it does not, given existing implementations.  If consensus evolves in the wider community over time, then LDP Paging might be able to incorporate whatever optimizations become enabled, but the currently specified base should continue to work unchanged, even if it has to start with 303 to be safe wrt existing clients.  The at-risk text between 6.2.5 and 6.2.6 contains additional links as well.
>
>
>
> wrt RFC 5989, LDP's scope was chartered to include HTTP and RDF.  I don't know that anyone in the working group was deeply aware of 5989 before your comment.  There was no appetite for adding a requirement on RLS or SIP for implementations.
>
>
>
> wrt If clients have to be "paging aware", would that ...     There are several cases to consider, given the optional features involved.
>
> 1: If any GET request results in a 2NN response with response headers Link type=ldp:Page and canonical=effective request URI, then the client can choose to retrieve the page sequence or not.  According to the 2NN draft [4], this would never happen with a compliant server unless the client sends an indication in the request that it supports 2NN responses, in keeping with "leaves no risk of an existing client incorrectly believing ..."
>
> 2: If any GET request results in a 303 response, the semantics of 303 already say that a resource other than the one identified by the effective request URI is involved (thus: 303, not 306 or 307).  If the client chooses to retrieve the 303 Location response header's resource, and that response has response headers Link type=ldp:Page and canonical=first request's effective request URI, then the client can choose to retrieve the page sequence or not.
>
> Any client can do that, on any resource.  Within the working group, a common supposition has been that an http client library would do this transparently.  If you see any "external/pre-programmed notion of what the resource it gets back is going to be", please point it out.  It's conceivable that those involved are too close to it to see some subtlety, but having looked again we see no such requirement.  Indeed, we see *less* need for outside knowledge in this approach than in some alternatives suggested, for example 200 + Content-Location, which is why we obtained consensus on it.
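
The 303 case above can be sketched as follows.  This is a rough illustration under stated assumptions, not the working group's reference logic: the response tuple shape, the read_resource and parse_links names, and the example.org URIs are all hypothetical, responses are simulated in memory, and the Link parsing is deliberately simplified (one rel value per link, no other parameters).

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

LDP_PAGE = "http://www.w3.org/ns/ldp#Page"

# Hypothetical response shape: (status, Location header or None,
# Link header values, body).
Response = Tuple[int, Optional[str], List[str], str]

# Simplified matcher for '<target>; rel="value"' pairs.
_LINK = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def parse_links(link_headers: List[str]) -> Dict[str, List[str]]:
    """Map each rel value to the target URIs found in the Link header values."""
    rels: Dict[str, List[str]] = {}
    for value in link_headers:
        for target, rel in _LINK.findall(value):
            rels.setdefault(rel.strip(), []).append(target)
    return rels


def read_resource(get: Callable[[str], Response], uri: str) -> List[str]:
    """GET uri; if a 303 leads to in-sequence pages of uri, walk them via rel=next."""
    status, location, _links, body = get(uri)
    if status != 303 or location is None:
        return [body]                      # nothing paging-specific to do
    bodies: List[str] = []
    page: Optional[str] = location
    while page is not None:
        _status, _loc, links, body = get(page)
        bodies.append(body)
        rels = parse_links(links)
        is_page_of_uri = (LDP_PAGE in rels.get("type", [])
                          and uri in rels.get("canonical", []))
        if not is_page_of_uri:
            break                          # not a page of the requested resource
        page = (rels.get("next") or [None])[0]
    return bodies


# Simulated server: one paged resource with two pages, one ordinary resource.
RESPONSES: Dict[str, Response] = {
    "http://example.org/c": (303, "http://example.org/c?p=1", [], ""),
    "http://example.org/c?p=1": (200, None, [
        '<http://www.w3.org/ns/ldp#Page>; rel="type"',
        '<http://example.org/c>; rel="canonical"',
        '<http://example.org/c?p=2>; rel="next"',
    ], "page 1"),
    "http://example.org/c?p=2": (200, None, [
        '<http://www.w3.org/ns/ldp#Page>; rel="type", '
        '<http://example.org/c>; rel="canonical"',
    ], "page 2"),
    "http://example.org/plain": (200, None, [], "whole resource"),
}

paged = read_resource(RESPONSES.__getitem__, "http://example.org/c")
plain = read_resource(RESPONSES.__getitem__, "http://example.org/plain")
```

Nothing in the sketch depends on prior knowledge of which resources are paged: the client learns everything from the status code and the Link headers, which is the sort of logic an HTTP client library could apply transparently.
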
>
>
>
> wrt scope of applicability
>
> Indeed, we separated Paging out in part to allow its application independently of LDP proper.  Along the way, the language was changed so that it applies to more than just RDF based resources.
> Are there any particular aspects of LDP that you believe your server would not comply with, or is the definitional normative requirement on being an LDP server coupled with the size of the LDP spec simply leading you to assume that you're not compliant?  The bare-minimum difference between a compliant LDP server and a conforming HTTP server is pretty small, IIRC - skimming it's 4.2.1.3 etags, 4.2.1.5 default base URI, done (assuming you don't intend to expose LDPRs or LDPCs, but we're talking about bare-min).  LDP, for example, does not require you to host RDF at all or to deal with containers at all.  If your question stems in part from a "follow your nose - oh, a different big scary spec I have to grep through in order to use Paging at all, how 'nice'" reaction, that is something we could clarify in principle.
>
> As to other groups, as mentioned above we've engaged directly with the TAG and IETF HTTP on certain aspects, as well as co-membership with the RDF working group, and we've received comments on past LDP LC drafts (which did include Paging at first) announced the usual way over the span of a year from a variety of sources including Tim Berners-Lee.  If there are specific communities you have in mind to solicit that we might have omitted, this is a perfect time to get them reading and we'd appreciate your help in motivating them to comment within the review period.
>
>
> conneg
>
> I think that got covered above in the context of other comments; the TAG (and IETF's HTTP working group) have already seen and given comments on 2NN.  It was one thread off the TAG discussion that led to additional uses (outside of LDP) for 2NN, as documented in the IETF draft.
>
>
>
> [1] http://www.iana.org/assignments/link-relations/link-relations.xml
> [2] http://lists.w3.org/Archives/Public/www-tag/2013Dec/0041.html
> [3] http://lists.w3.org/Archives/Public/www-tag/2014Jan/0013.html
> [4] http://tools.ietf.org/html/rfc7231#section-3.1.4.2
> [5005] http://tools.ietf.org/html/rfc5005
>
> Best Regards, John
>
> Voice US 845-435-9470 BluePages <http://w3.ibm.com/jct03019wt/bluepages/simpleSearch.wss?searchBy=Internet+address&location=All+locations&searchFor=johnarwe>
> Cloud and Smarter Infrastructure OSLC Lead
>
Received on Wednesday, 17 September 2014 16:41:51 UTC