
Re: [LDP Paging] Comparison to other techniques of pagination

From: John Arwe <johnarwe@us.ibm.com>
Date: Wed, 17 Sep 2014 12:25:17 -0400
To: Austin William Wright <aaa@bzfx.net>
Cc: kjetil@kjernsmo.net, public-ldp-comments@w3.org
Message-ID: <OF23B7B6EA.00BB09B2-ON85257D56.0047D4D2-85257D56.005A3892@us.ibm.com>
Austin, the working group asked me to reply to your comment.  I'm the 
default RFC monger in this working group ;-)


Based on the volume of discussion we've had in the past, both within the 
working group and in liaising with other groups such as the W3C TAG and 
the IETF HTTP working group, such a comparison is highly unlikely to be 
small enough to fit non-disruptively in the introduction (or elsewhere) 
of the document. 

If you have alternatives to the reasoning below that are not covered here, 
please share them, as that is conceivably new information that could alter 
the consensus opinions.



wrt next/prev, LDP Paging does make use of them, and in general draws 
substantial inspiration from [5005], specifically from section 3 (Paged 
Feeds), as the definitions (which non-normatively refer to 5005) should 
make clear.  Note that the 5005 link relation definitions are no longer 
the latest; the current normative definitions in the link relation 
registry [1] are compatible with LDP Paging's usage, and re-reading 5005 
I see no conflicts there either.
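As a concrete illustration (my own sketch, nothing from either spec): a client following the registry's next/prev relations only needs to read Link response headers. The `parse_link_header` helper and the example URIs below are hypothetical, and the parser handles only the simple `<uri>; rel="..."` form, not the full Web Linking grammar.

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header into a {relation: target} dict.

    Minimal sketch: handles the common `<uri>; rel="name"` form only.
    """
    links = {}
    for part in value.split(","):
        match = re.search(r'<([^>]*)>\s*;\s*rel="?([^";]+)"?', part)
        if match:
            target, rel = match.groups()
            links[rel] = target
    return links

# A server advertising a page sequence with next/prev, as LDP Paging does:
header = ('<http://example.org/c?p=3>; rel="next", '
          '<http://example.org/c?p=1>; rel="prev"')
links = parse_link_header(header)
```

A paging client then simply keeps GETting `links["next"]` until it is absent.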

wrt *-archive link relations, they constrain the archive documents that 
their target URIs identify such that their state SHOULD NOT change over 
time, which is not a constraint the working group believes is appropriate 
for LDP Paging's in-sequence pages (6.2.9) ... keep in mind too the RFC 
2119 definition of SHOULD NOT, versus the developer attitude of 
"should == may".  More problematically, RFC 5005 section 4's constraints 
include (again, SHOULD) specific content (fh:archive) "elements" in the 
resources' "head sections" ... in effect binding archive documents to the 
Atom Syndication Format and hence to XML.  This in turn means that for 
resources that are RDF graphs (a central concern for LDP), there is no 
standard representation format, since no standardized ASF serialization 
exists for RDF.

In both cases, reaching more deeply into 5005 and drawing a 1:1 
correspondence from (for example) feed entries to RDF triples would cause 
additional impedance mismatches.  If LDP Patch comes to fruition, then 
that might provide a good match (I'm speaking speculatively here and 
purely for myself - there have been zero working group discussions along 
these lines that I am aware of).  The idea of reconstructing a logical 
feed using a time-sequenced set of incremental patch entries seems like a 
natural application of 5005.  Agreement on an LDP Patch format has proven 
to be a stubbornly elusive goal over the lifetime of the working group, 
although it has recently made progress.
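Purely to illustrate the speculation above (again, just me, not the working group): if each entry in a time-sequenced feed carried an incremental patch, a client could replay the entries to reconstruct the logical resource. Everything below, the `reconstruct` function and the add/remove patch shape, is a hypothetical sketch, not any proposed LDP Patch format.

```python
def reconstruct(patches):
    """Replay a time-ordered sequence of add/remove patch entries to
    rebuild a graph's current state (speculative sketch only)."""
    graph = set()
    for op, triple in patches:
        if op == "add":
            graph.add(triple)
        elif op == "remove":
            graph.discard(triple)
    return graph

# Three feed entries, oldest first; the net state is a single triple:
patches = [
    ("add", ("ex:a", "ex:p", "1")),
    ("add", ("ex:b", "ex:p", "2")),
    ("remove", ("ex:a", "ex:p", "1")),
]
```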




wrt adding new Range units, various working group members have looked at 
it several times over the life of the working group; personally, I did so 
as far back as Submission-drafting time.  The primary reasons that worked 
against re-use of Range were:

1: Servers are not free to initiate paging unilaterally using Range 
requests.  The ability for the server to initiate paging as a way to 
manage server load (and, as a side effect, to blunt potential attacks) is 
a major concern of the working group members.

2: RDF-based resources (a focus of LDP generally) are not seen to be 
amenable to range requests that require index-based access to triples, 
absent implementation- or domain-specific assumptions about an underlying 
ordering.  SQL-based back ends might be amenable to "counting triples", 
but other database technologies are less so.  Then there is the issue of 
common RDF implementation components like Apache Jena, which faithfully 
implement the RDF definition of a graph as an unordered set ... therefore 
providing no interface-level guarantees of repeatable order in model 
traversal or serialization operations, even if the underlying graph were 
unchanged between requests.  Requiring all implementations to impose an 
index-based ordering on triples is seen as a significant implementation 
burden.

3: The inability of clients to have any guarantees about their view of a 
paged resource's state after a traversal during which the paged resource 
changes.  LDP Paging provides a stronger guarantee in 6.2.7 for paged 
resources in that case than Range or 5005 would provide for an archived 
feed once the equivalence to RDF is established (see the preceding 
point).  The (my) initial proposal started off with the "no guarantees, 
start over" position of 5005, and working group members advocated for the 
stronger guarantee.

4: Non-cacheability of responses.  Existing caches would be forced to 
treat extension range units as uncacheable, unless and until their 
implementations were updated to support the new LDP-defined units. 
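To make point 2 concrete with a toy model (my own, not from any spec): treating a graph as an unordered set of triples shows why "the Nth triple" is simply undefined until some ordering is imposed from outside.

```python
# An RDF graph is by definition an unordered set of triples; a component
# like Jena that faithfully models this gives no stable notion of
# "triples 10-19".  Modelling the graph as a Python set shows the gap:
graph = {
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", '"Alice"'),
}

# Membership and set semantics are well-defined...
assert ("ex:a", "ex:knows", "ex:b") in graph

# ...but index-based access first requires imposing an ordering the model
# itself never promises; until then there is no "Nth triple" to range over:
ordered = sorted(graph)        # one arbitrary, implementation-chosen order
assert set(ordered) == graph   # same graph, whatever order was chosen
```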

FWIW, if a future spec were to standardize how clients request particular 
orderings from the server, e.g. sorting of a result set, then in those 
cases index-based triple access and new units (on Range and/or on LDP 
Paging's preference) might well be specified there as well.
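Continuing the toy model, here is roughly what such a future spec would have to pin down: a client-visible total ordering, after which an index-based page of triples becomes well-defined. The `page_of_triples` helper and its `order_key` parameter are hypothetical names of mine.

```python
def page_of_triples(graph, order_key, start, count):
    """Return triples [start, start+count) under an explicit total order.

    Sketch: only once a client-requested ordering (order_key) is agreed
    on does an index-based range over triples become meaningful.
    """
    ordered = sorted(graph, key=order_key)
    return ordered[start:start + count]

graph = {("ex:b", "ex:p", "2"), ("ex:a", "ex:p", "1"), ("ex:c", "ex:p", "3")}
# "Sort by subject" stands in for whatever ordering the spec would define:
first_two = page_of_triples(graph, order_key=lambda t: t[0], start=0, count=2)
```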



wrt Content-Location and status code, this was an option that members of 
the working group did discuss with the W3C TAG [2], [3] and the IETF HTTP 
working group (their chair is cc'd on [3], as one example); short answer, 
there was no broad consensus on whether or not doing what you suggest is 
within HTTP, nor (if it is) that HTTP supplies an unambiguous and 
semantically correct interpretation.

1: [4] says that in the case you describe the C-L URI identifies a 
particular representation of the effective request URI.  The LDP working 
group established consensus that a single in-sequence page, in the 
general case, is not *the same resource* (in the sense of "state") as the 
paged resource.  We did not have consensus that the definition cited 
allows the server to respond to GET paged-resource-URI with 200 and a C-L 
that identifies an in-sequence page (which, by definition, has only a 
subset of the paged resource's state); my sense is that the working group 
mostly found that interpretation unnatural.  A client receiving a 200 
response was believed to have every right to stop there (at that first 
GET), believing it has the *entire* state of the paged resource; this 
would not be true, however, when the paged resource is identified by the 
effective request URI and an in-sequence page resource is identified by 
the Content-Location response header (in the general case of the paged 
resource having more than one page).

2: There is a competing mindset that says the server says what is, so 200 
+ C-L of a "subset" resource is perfectly fine: clients have to know 
something about the resource they're asking for.

LDP chose to specify an approach that leaves no risk of an existing client 
incorrectly believing that it has a complete representation of the state 
of the resource identified by the effective request URI when it does not, 
given existing implementations.  If consensus evolves in the wider 
community over time, then LDP Paging might be able to incorporate whatever 
optimizations become enabled, but the currently specified base should 
continue to work unchanged, even if it has to start with 303 to be safe 
wrt existing clients.  The at-risk text between 6.2.5 and 6.2.6 contains 
additional links as well.
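The client-visible contract we ended up with can be caricatured in a few lines (a sketch of the reasoning above, not normative LDP Paging behavior; `interpret_get` is a made-up name):

```python
def interpret_get(status, headers):
    """Classify a GET response under the 'no surprises' rule above.

    A plain 200 means the body is the complete state of the request URI;
    a 303 explicitly signals that another resource (e.g. a first page) is
    involved, so a paging-unaware client can never mistake one page for
    the whole resource.
    """
    if status == 200:
        return "complete-representation"
    if status == 303:
        return ("see-other", headers.get("Location"))
    raise ValueError("unhandled status in this sketch: %d" % status)

assert interpret_get(200, {}) == "complete-representation"
assert interpret_get(303, {"Location": "http://example.org/c?firstPage"}) == \
    ("see-other", "http://example.org/c?firstPage")
```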



wrt RFC 5989, LDP's scope was chartered to include HTTP and RDF.  I don't 
know that anyone in the working group was deeply aware of 5989 before your 
comment.  There was no appetite for adding an RLS or SIP requirement on 
implementations.



wrt "If clients have to be 'paging aware', would that ..."  There are 
several cases to consider, given the optional features involved.

1: If any GET request results in a 2NN response with Link response headers 
type=ldp:Page and canonical=effective request URI, then the client can 
choose to retrieve the page sequence or not.  According to the 2NN draft, 
this would never happen with a compliant server unless the client sends an 
indication in the request that it supports 2NN responses, in keeping with 
"leaves no risk of an existing client incorrectly believing ..."

2: If any GET request results in a 303 response, the semantics of 303 
already say that a resource other than the one identified by the 
effective request URI is involved (thus: 303, not 306 or 307).  If the 
client chooses to retrieve the resource in the 303 Location response 
header, and that response has Link response headers type=ldp:Page and 
canonical=the first request's effective request URI, then the client can 
choose to retrieve the page sequence or not. 

Any client can do that, on any resource.  Within the working group, a 
common supposition has been that an HTTP client library would do this 
transparently.  If you see any "external/pre-programmed notion of what the 
resource it gets back is going to be", please point it out.  It's 
conceivable that those involved are too close to it to see some subtlety, 
but having looked again we see no such requirement.  Indeed, we see *less* 
need for outside knowledge in this approach than in some suggested 
alternatives, for example 200 + Content-Location, which is why we obtained 
consensus on it.
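For what it's worth, the transparent traversal mentioned above might look something like this sketch, with an in-memory dict standing in for the server and plain strings standing in for triples (`fetch_all` and the URIs are all hypothetical):

```python
def fetch_all(server, uri):
    """Follow the 303 to the first page, then walk rel='next' links,
    accumulating the paged resource's full state.  This is the kind of
    traversal an HTTP client library could do transparently."""
    status, headers, body = server[uri]
    if status == 303:                      # paged resource: go to first page
        status, headers, body = server[headers["Location"]]
    triples = list(body)
    while "next" in headers:               # walk the in-sequence pages
        status, headers, body = server[headers["next"]]
        triples.extend(body)
    return triples

# Toy "server": URI -> (status, headers, body).
server = {
    "/c":     (303, {"Location": "/c?p=1"}, []),
    "/c?p=1": (200, {"next": "/c?p=2"}, ["t1", "t2"]),
    "/c?p=2": (200, {}, ["t3"]),
}
```

Note the client needs no prior knowledge of the resource: the 303 and the Link relations supply everything at run time.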



wrt scope of applicability

Indeed, we separated Paging out in part to allow its application 
independently of LDP proper.  Along the way, the language was changed so 
that it applies to more than just RDF based resources.
Are there any particular aspects of LDP that you believe your server would 
not comply with, or is the definitional normative requirement on being an 
LDP server, coupled with the size of the LDP spec, simply leading you to 
assume that you're not compliant?  The bare-minimum difference between a 
compliant LDP server and a conforming HTTP server is pretty small, IIRC - 
skimming, it's 4.2.1.3 (ETags) and 4.2.1.5 (default base URI), done 
(assuming you don't intend to expose LDPRs or LDPCs, but we're talking 
about the bare minimum).  LDP, for example, does not require you to host 
RDF at all or to deal with containers at all.  If your question stems in 
part from a "follow your nose - oh, a different big scary spec I have to 
grep through in order to use Paging at all, how 'nice'" reaction, that is 
something we could clarify in principle.

As to other groups, as mentioned above we've engaged directly with the TAG 
and IETF HTTP on certain aspects, as well as through co-membership with 
the RDF working group, and we've received comments on past LDP Last Call 
drafts (which did include Paging at first), announced the usual way over 
the span of a year, from a variety of sources including Tim Berners-Lee.  
If there are specific communities you have in mind to solicit that we 
might have omitted, this is a perfect time to get them reading, and we'd 
appreciate your help in motivating them to comment within the review 
period.


wrt conneg

I think that got covered above in the context of other comments; the TAG 
(and the IETF HTTP working group) have already seen and commented on 2NN.  
It was one thread off the TAG discussion that led to additional uses 
(outside of LDP) for 2NN, as documented in the IETF draft.



[1] http://www.iana.org/assignments/link-relations/link-relations.xml
[2] http://lists.w3.org/Archives/Public/www-tag/2013Dec/0041.html
[3] http://lists.w3.org/Archives/Public/www-tag/2014Jan/0013.html
[4] http://tools.ietf.org/html/rfc7231#section-3.1.4.2
[5005] http://tools.ietf.org/html/rfc5005


Best Regards, John

Voice US 845-435-9470  BluePages
Cloud and Smarter Infrastructure OSLC Lead
Received on Wednesday, 17 September 2014 16:26:43 UTC
