Re: [LDP Paging] Comparison to other techniques of pagination from Benjamin Armintor on 2014-09-17 (public-ldp-comments@w3.org from September 2014)

From: Benjamin Armintor <armintor@gmail.com>
Date: Wed, 17 Sep 2014 13:45:02 -0400
To: John Arwe <johnarwe@us.ibm.com>
Cc: Austin William Wright <aaa@bzfx.net>, kjetil@kjernsmo.net, public-ldp-comments@w3.org
Message-ID: <CADQQ8TOVr+sdbqP9Zuh=TF1gA3TgSo9k4ghS-D=6sM4VMJXfug@mail.gmail.com>
"But, I am also engaging in a relatively shallow effort to think about this
stuff."

For example, the working group probably knew that John was referring to
stability guarantees in section 6.2.7, and not my mis-reading as the
change-detection in 6.2.8. I'm sorry for overstating my list of
overstatements.

- Ben

On Wed, Sep 17, 2014 at 1:33 PM, Benjamin Armintor <armintor@gmail.com>
wrote:

> Thanks for the detailed reply, John; it's much appreciated.
>
> I think a couple of the points of the proposal vis-a-vis alternatives
> might overstate its advantages:
> - etag and conditional GET are, as far as I know, available to range
> requests, so the same mechanism for change detection would be available to
> a hypothetical range-based approach
> - The 2NN status might wreak less havoc with caches than alternatives, but
> I'm not confident it will be a lot less
> - I understand that RDF stores needn't provide predictable order for
> triples, but don't personally follow how such a resource's triples could
> then be paged in any useful way, which seems to bring us back to content
> negotiation.
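
The first bullet can be sketched roughly as below. This is a minimal sketch, not anything defined by LDP or HTTP: the range unit `triples` and the helper `answer_range_request` are hypothetical, and the request is assumed to carry a well-formed `first-last` range. Only the `Range` and `If-Range` headers, the ETag validator, and the 200/206 status codes come from HTTP itself.

```python
from typing import Dict, List, Tuple


def answer_range_request(
    current_etag: str,
    ordered_items: List[str],
    request_headers: Dict[str, str],
) -> Tuple[int, List[str]]:
    """Return (status, body) for a GET carrying Range and optional If-Range.

    Hypothetical: the 'triples' unit and this function are illustrative only.
    """
    range_header = request_headers.get("Range", "")
    if_range = request_headers.get("If-Range")

    # No usable Range header: fall back to the full representation.
    if not range_header.startswith("triples="):
        return 200, ordered_items

    # If-Range carries the validator the client saw earlier. If the resource
    # has changed since then, the server ignores Range and sends everything.
    if if_range is not None and if_range != current_etag:
        return 200, ordered_items

    first_text, _, last_text = range_header[len("triples="):].partition("-")
    first, last = int(first_text), int(last_text)
    # Range bounds are inclusive on both ends.
    return 206, ordered_items[first:last + 1]
```

A client that sends the ETag from its first partial response in `If-Range` either gets the next slice (206) or learns that the resource changed by receiving the full representation (200).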
>
> But, I am also engaging in a relatively shallow effort to think about this
> stuff. The ability of a server to unilaterally impose paging is not a
> thing an approach built around Range requests could do without being
> pretty hostile to pre-HTTP 1.1 clients.
>
> - Ben
>
> On Wed, Sep 17, 2014 at 12:25 PM, John Arwe <johnarwe@us.ibm.com> wrote:
>
>> Austin, the working group asked me to reply to your comment.  I'm the
>> default RFC monger in this working group ;-)
>>
>>
>> Based on the volume of discussion we've had in the past, which includes
>> within the working group in addition to liaising with other groups such as
>> the W3C TAG and the IETF HTTP working group, such a comparison is highly
>> unlikely to be small enough to non-disruptively fit in the introduction/etc
>> of the document.
>>
>> If you have some alternatives to the reasoning below not covered here,
>> please share as it's conceivably new information that would alter consensus
>> opinions.
>>
>>
>>
>> wrt next/prev, LDP Paging does make use of them, and in general draws
>> substantial inspiration from [5005], specifically from section 3 Paged
>> Feeds, as the definitions (that non-normatively refer to 5005) should make
>> clear.  Note that the 5005 link relation definitions are no longer the
>> latest; the current normative definitions in the link relation registry [1]
>> are compatible with LDP Paging's usage, although re-reading 5005 I see no
>> conflicts.
>>
>> wrt  *-archive link relations, they constrain the archive documents that
>> their target URIs identify such that their state SHOULD NOT change over
>> time, which is not a constraint that the working group believes is
>> appropriate for LDP Paging in-sequence pages (6.2.9) ... keep in mind too
>> the 2119 definition of Should Not, versus the developer attitude of "should
>> == may".  More problematically, RFC 5005 section 4's constraints include
>> (again, Should) specific content (fh:archive) "elements" in the resources'
>> "head sections" ... in effect, binding archive documents to Atom
>> Syndication Format and hence to XML; this in turn means that for resources
>> that are RDF graphs (a central concern for LDP), there is no standard
>> representation format (no standardized ASF serialization exists for RDF).
>>
>> In both cases, reaching more deeply into 5005 and drawing a 1:1
>> correspondence from (for example) feed entries to RDF triples would cause
>> additional impedance mismatches.  If LDP Patch comes to fruition, then that
>> might provide a good match (I'm speaking speculatively here and purely for
>> myself - there have been zero working group discussions along these lines
>> that I am aware of).  The idea of reconstructing a logical feed using a
>> time-sequenced set of incremental patch entries seems like a natural
>> application of 5005.  Agreement on an LDP Patch format has proven to be a
>> stubbornly elusive goal over the lifetime of the working group, although it
>> has recently made progress.
>>
>>
>>
>>
>> wrt adding new Range units, various working group members have looked at
>> it several times over the life of the working group; personally, I did so
>> as far back as Submission-drafting time.  The primary reasons that worked
>> against re-use of Range were:
>>
>> 1: Servers are not free to initiate paging unilaterally using Range
>> requests.  The ability for the server to initiate paging as a way to manage
>> server load (and as a side effect, potential attacks) is a major concern of
>> the working group members.
>>
>> 2: RDF based resources (a focus of LDP generally) are not seen to be
>> amenable to range requests that require index-based access to triples,
>> absent implementation or domain-specific assumptions about underlying
>> ordering.  SQL-based back ends might be amenable to "counting triples", but
>> other database technologies not so.  Then there is the issue of common RDF
>> implementation components like Apache Jena, that faithfully implement the
>> RDF graph definition of an unordered set ... therefore providing no
>> interface-level guarantees of repeatable order in model traversal
>> operations or serialization operations, even if the underlying graph were
>> unchanged between requests.  Requiring all implementations to impose an
>> index-based ordering on triples is seen as a significant implementation
>> burden.
>>
>> 3: The inability for clients to have any guarantees about their view of a
>> paged resource's state after a traversal in which the paged resource
>> changes.  LDP Paging provides a stronger guarantee in 6.2.7 for paged
>> resources in the latter case than Range or 5005 would guarantee for an
>> archived feed once the equivalence to RDF is established (preceding point).
>>  The (my) initial proposal started off with the "no guarantees, start over"
>> position of 5005, and working group members advocated for the stronger
>> guarantee.
>>
>> 4: Non-cacheability of responses.  Existing caches would be forced to
>> treat extension units as uncacheable, if and until their implementations
>> were updated to support the new LDP-defined units.
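
Point 2 in the list above can be illustrated with a minimal sketch. It assumes triples are plain tuples of strings and that lexical sort order is an acceptable ordering; `page_by_index` is a hypothetical helper, not part of Apache Jena or of any LDP implementation.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def page_by_index(graph: Set[Triple], offset: int, limit: int) -> List[Triple]:
    """Return `limit` triples starting at position `offset`.

    A set has no positions, so an index range only means something after an
    order is imposed. Here the whole graph is sorted on every call, which is
    the kind of extra work an index-based Range unit would require.
    """
    ordered = sorted(graph)
    return ordered[offset:offset + limit]
```

Without the `sorted` call, two requests over an unchanged graph could see the triples in different orders, so the slices would not be guaranteed to tile the graph.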
>>
>> FWIW, if a future spec were to standardize how clients request particular
>> orderings from the server, e.g. sorting of a result set, then in those
>> cases index-based triple access and new units (on Range and/or on LDP
>> Paging's preference) might well be specified there as well.
>>
>>
>>
>> wrt Content-Location and status code, this was an option that members of
>> the working group did discuss with the W3C TAG [2], [3] and the IETF HTTP
>> working group (their chair is cc'd on [3], as one example); short answer,
>> there was no broad consensus on whether or not doing what you suggest is
>> within HTTP, nor (if it is) that HTTP supplies an unambiguous and
>> semantically correct interpretation.
>>
>> 1: [4] says that in the case you describe the C-L URI identifies a
>> particular representation of the effective request URI.  The LDP working
>> group established consensus that a single in-sequence page, in the general case,
>> is not *the same resource* (in the sense of "state") as the paged resource.
>>  We did not have consensus that the definition cited allows the server to
>> respond to GET paged-resource-URI with 200 and C-L that identifies an
>> in-sequence page (which, definitionally, has only a subset of the paged
>> resource's state); my sense is that the working group mostly found that
>> interpretation unnatural.  A client receiving a 200 response was believed
>> to have every right to stop there (at that first GET), believing it has the
>> *entire* state of the paged resource; this would not be true however when a
>> paged resource is identified by the effective request URI and an
>> in-sequence page resource is identified by the Content-Location response
>> header (in the general case of the paged resource having > 1 page).
>>
>> 2: There is a competing mindset that says the server says what is, so 200
>> + C-L of a "subset" resource is perfectly fine: clients have to know
>> something about the resource they're asking for.
>>
>> LDP chose to specify an approach that leaves no risk of an existing
>> client incorrectly believing that it has a complete representation of the
>> state of the resource identified by the effective request URI when it does
>> not, given existing implementations.  If consensus evolves in the wider
>> community over time, then LDP Paging might be able to incorporate whatever
>> optimizations become enabled, but the currently specified base should
>> continue to work unchanged, even if it has to start with 303 to be safe wrt
>> existing clients.  The at-risk text between 6.2.5 and 6.2.6 contains
>> additional links as well.
>>
>>
>>
>> wrt RFC 5989, LDP's scope was chartered to include HTTP and RDF.  I don't
>> know that anyone in the working group was deeply aware of 5989 before your
>> comment.  There was no appetite for adding a requirement on RLS or SIP for
>> implementations.
>>
>>
>>
>> wrt If clients have to be "paging aware", would that ...     There are
>> several cases to consider, given the optional features involved.
>>
>> 1: If any GET request results in a 2NN response with response headers
>> Link type=ldp:Page and canonical=effective request URI, then it can choose
>> to retrieve the page sequence or not.  According to the 2NN draft [4], this
>> would never happen with a compliant server unless the client sends an
>> indication in the request that it supports 2NN responses, in keeping with
>> "leaves no risk of an existing client incorrectly believing ..."
>>
>> 2: If any GET request results in a 303 response, the semantics of 303
>> already say that a resource other than the one identified by the effective
>> request URI is involved (thus: 303, not 306 or 307).  If the client chooses
>> to retrieve the 303 Location response header's resource, and that response
>> has response headers Link type=ldp:Page and canonical=first request's
>> effective request URI, then it can choose to retrieve the page sequence or
>> not.
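
The check described in both cases can be sketched as follows. This is a minimal sketch under stated assumptions: `parse_links` is a deliberately simplified, hypothetical Link header parser (it does not cover every form the header grammar allows), `paged_resource_of` is a hypothetical helper, `ldp:Page` is expanded to the LDP namespace URI, and the URIs in the usage note are placeholders.

```python
import re
from typing import Dict, List, Optional

LDP_PAGE = "http://www.w3.org/ns/ldp#Page"


def parse_links(link_values: List[str]) -> Dict[str, List[str]]:
    """Map each rel value to the target URIs found in Link header values."""
    links: Dict[str, List[str]] = {}
    for value in link_values:
        # Each link is "<target>" followed by its parameters.
        for target, params in re.findall(r"<([^>]*)>([^<]*)", value):
            rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
            if rel:
                # A rel parameter may hold several space-separated relations.
                for name in rel.group(1).split():
                    links.setdefault(name, []).append(target)
    return links


def paged_resource_of(status: int, link_values: List[str]) -> Optional[str]:
    """Return the paged resource's URI if the response is an in-sequence page."""
    if not 200 <= status < 300:
        # For a 303, retrieve the Location target first, then check that response.
        return None
    links = parse_links(link_values)
    if LDP_PAGE not in links.get("type", []):
        return None
    canonical = links.get("canonical", [])
    return canonical[0] if canonical else None
```

Given the header values `<http://www.w3.org/ns/ldp#Page>; rel="type"` and `<http://example.org/r>; rel="canonical"` on a 2xx response, `paged_resource_of` returns `http://example.org/r`, and the client can then decide whether to follow `rel="next"` links through the page sequence.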
>>
>> Any client can do that, on any resource.  Within the working group, a
>> common supposition has been that an http client library would do this
>> transparently.  If you see any "external/pre-programmed notion of what the
>> resource it gets back is going to be", please point it out.  It's
>> conceivable that those involved are too close to it to see some subtlety,
>> but having looked again we see no such requirement.  Indeed, we see *less*
>> need for outside knowledge in this approach than in some alternatives
>> suggested, for example 200 + Content-Location, which is why we obtained
>> consensus on it.
>>
>>
>>
>> wrt scope of applicability
>>
>> Indeed, we separated Paging out in part to allow its application
>> independently of LDP proper.  Along the way, the language was changed so
>> that it applies to more than just RDF based resources.
>> Are there any particular aspects of LDP that you believe your server
>> would not comply with, or is the definitional normative requirement on
>> being an LDP server coupled with the size of the LDP spec simply leading
>> you to assume that you're not compliant?  The bare-minimum difference
>> between a compliant LDP server and a conforming HTTP server is pretty
>> small, IIRC - skimming, it's 4.2.1.3 etags, 4.2.1.5 default base URI, done
>> (assuming you don't intend to expose LDPRs or LDPCs, but we're talking
>> about bare-min) .  LDP, for example, does not require you to host RDF at
>> all or to deal with containers at all.  If your question stems in part from
>> a "follow your nose - oh, a different big scary spec I have to grep through
>> in order to use Paging at all, how 'nice'" reaction, that is something we
>> could clarify in principle.
>>
>> As to other groups, as mentioned above we've engaged directly with the
>> TAG and IETF HTTP on certain aspects, as well as co-membership with the RDF
>> working group, and we've received comments on past LDP LC drafts (which did
>> include Paging at first) announced the usual way over the span of a year
>> from a variety of sources including Tim Berners-Lee.  If there are specific
>> communities you have in mind to solicit that we might have omitted, this is
>> a perfect time to get them reading and we'd appreciate your help in
>> motivating them to comment within the review period.
>>
>>
>> conneg
>>
>> I think that got covered above in the context of other comments; the TAG
>> (and IETF's HTTP working group) have already seen and given comments on
>> 2NN.  It was one thread off the TAG discussion that led to additional uses
>> (outside of LDP) for 2NN, as documented in the IETF draft.
>>
>>
>>
>> [1] http://www.iana.org/assignments/link-relations/link-relations.xml
>> [2] http://lists.w3.org/Archives/Public/www-tag/2013Dec/0041.html
>> [3] http://lists.w3.org/Archives/Public/www-tag/2014Jan/0013.html
>> [4] http://tools.ietf.org/html/rfc7231#section-3.1.4.2
>> [5005] http://tools.ietf.org/html/rfc5005
>>
>> Best Regards, John
>>
>> Voice US 845-435-9470  BluePages
>> <http://w3.ibm.com/jct03019wt/bluepages/simpleSearch.wss?searchBy=Internet+address&location=All+locations&searchFor=johnarwe>
>> Cloud and Smarter Infrastructure OSLC Lead
>>
>
>
Received on Wednesday, 17 September 2014 17:45:31 UTC