Re: [LDP Paging] Comparison to other techniques of pagination from John Arwe on 2014-10-10 (public-ldp-comments@w3.org from October 2014)

From: John Arwe <johnarwe@us.ibm.com>
Date: Fri, 10 Oct 2014 11:08:19 -0400
To: Austin William Wright <aaa@bzfx.net>
Cc: Kjetil Kjernsmo <kjetil@kjernsmo.net>, public-ldp-comments@w3.org
Message-ID: <OFC9DABEB8.9C8ED2D4-ON85257D6D.004B3348-85257D6D.00532969@us.ibm.com>

Digging out after vacation...

> integrate LDP. Or maybe more specifically, what functionality this 
> gives implementers, over RFC5005 by itself.

I think the big hitter is completeness of the logical feed [1].  RFC 5005 
provides no way to assess completeness, hence the SHOULD NOT in paragraph 
3.  LDP adds the etag= Link parameter to allow clients to know whether or 
not the paged resource (logical feed, in 5005 terms) changed during 
traversal.  The rest could be viewed (with a skeptical eye) as 
optimizations like 2NN; those matter most if the #pages per paged resource 
is small (2, especially).
See clauses 6.2.7, 6.2.8.
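
A rough sketch of that completeness check from the client side, in Python.  It assumes each page response carries a Link header with rel="canonical" and an etag= parameter, as in the LDP Paging examples.  The helper names and the header strings are made up for illustration, and the parsing is deliberately simplistic (it does not cope with commas or semicolons inside quoted parameter values).

```python
import re

# One link-value: <target> followed by its ";name=value" parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def canonical_etag(link_header):
    """Return the etag= parameter of the rel="canonical" link, or None."""
    for match in _LINK_RE.finditer(link_header):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(match.group(2)):
            params[name.lower()] = quoted if quoted else bare
        if params.get("rel") == "canonical":
            return params.get("etag")
    return None


def traversal_is_complete(link_headers):
    """True if every page reported the same etag for the paged resource."""
    etags = {canonical_etag(header) for header in link_headers}
    return len(etags) == 1 and None not in etags
```

A client would collect the Link header of every page it fetched and call traversal_is_complete() at the end; a False result means the paged resource changed (or a page lacked the parameter) and the traversal should not be treated as a complete view.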

LDP also allows a client given the URL of a page to definitively know that 
it is a page (subset of some larger resource), via 6.2.16.  5005 provides 
no such mechanism, since its logical feed is not required to be 
addressable.  6573 could be used to similar effect, but the meanings of 
"item" and "collection" there are pretty loose; to use an LDP analogy, in 
principle those link relations could be used for either membership or 
containership triples (or both).

> My understanding is that we can already use RFC5005 pagination for 
> exposing documents in a series; but we might want to utilize LDP 
> Paging if we still want to refer to a resource -- all the pages 
> together -- as a /single/ resource or graph, to which we could refer
> to or perform operations on (like PATCH, as mentioned). Does this 
> sound correct?

Yes; LDP provides you the standard way for a client to use the paged 
resource and the pages together in sensible ways.  5005 only deals with 
pages, except in the degenerate case (from a paging point of view) where 
the entire feed document is a single page.

> Is LDP Paging able to unilaterally initiate paging? The document 
> says that was avoided because it could break existing clients that 
> know a resource to already be a cohesive, unpaged representation:

I think you are referring to 4.2 here, which is an example. 
6.2.6 (SHOULD NOT) is the corresponding normative clause, so technically 
LDP Paging can be unilaterally initiated by the server although the WG 
believes the best practice is to let the client control this.  The 303 
response would never break a *correctly coded* HTTP client, but as 4.2 
notes many existing HTTP clients treat 303 (see *other*) as if it were a 
301/302 (permanent/temporary redirect for *the same* resource), so LDP 
Paging errs on the side of caution.

If you're talking about a migration case like the one in 4.2 (existing 
clients already use the URI a certain way), treating 303 like 301/302 can 
lead to subtly erroneous behavior.
If you're talking about a new URI and a system yet to be tested, 
unilaterally initiating 303 paging is more likely to be acceptable 
(because you're going to test things more closely).

> Or would a new status code be able to initiate paging without any 
> Prefer header?

The issues with 2NN are somewhat different; since HTTP requires unknown 
2xx status codes to be treated as 200, initiating 2NN-paging in the 
presence of existing clients (key point: even correctly coded ones) is 
going to lead to erroneous behavior as soon as >1 page exists.  To use a 
popular analogy: the first 10 search results are not ALL the search 
results.
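
To make that failure mode concrete, here is a toy Python sketch.  The status value 299 is only a stand-in for the not-yet-assigned 2NN code, and the client function and the 100-item list are invented for illustration; nothing here comes from the spec.

```python
PAGED_2NN = 299  # hypothetical stand-in for the unassigned 2NN status code


def legacy_client_names(status, body):
    """A pre-paging client: any 2xx is handled as if it were 200."""
    if 200 <= status < 300:
        return body  # taken to be the complete representation
    raise RuntimeError(f"unexpected status {status}")


all_names = [f"name-{i}" for i in range(100)]
first_page = all_names[:50]

# The server unilaterally answers with the first page under the new code.
seen = legacy_client_names(PAGED_2NN, first_page)
missing = len(all_names) - len(seen)
```

The client is behaving correctly by HTTP's rules, yet it silently ends up 50 names short, which is why the client has to opt in.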

> Is LDP Paging able to unilaterally initiate paging? The document 
> says that was avoided because it could break existing clients that 
> know a resource to already be a cohesive, unpaged representation:
> 
> <blockquote> The new protocol does not solve the problem of 
> migrating existing clients from the old "all" to the new "first 
> subset" </blockquote>

In the context of 4.2, which is an example, the "new protocol" phrase was 
poorly chosen.  s/protocol/new "first subset" URI approach/*
I see how you could have read "new protocol" to mean LDP Paging, but that 
was not my intent.  I will make that change in the editor's draft.

> Also, in my understanding, an ETag varies over representations. Is 
> it intentional that the ETag header doesn't change over each page, 
> in the listed examples? Could it change, especially if that were 
> easier to implement in my service? (Though ETag is per-resource, and
> each page gets its own URI, so this is likely a non-issue in general.)

If you mean: examples 6/8 and 10/12 have the same Etag: values, look 
carefully ... they don't.  I perturbed the high order bits so each page's 
etag value was distinct.  So etag.6 !== etag.8 and so on was intentional, 
for expository purposes.  The fact that etag.6 == etag.10 and so on is an 
artifact of my copying things, although barring changes to the paged 
resource I would also expect that to hold in practice.  The 
etag="customer-relations-v1" value was also intentionally held constant 
throughout, to illustrate the normal case where a client traverses all 
pages without the paged resource changing ... illustrating that the client 
now has a "complete" (5005 sense) view of the paged resource.  I also used 
a consistent representation format throughout (Turtle).

If you mean: if I (you) had an RDF/XML (or other) representation of the 
same resources used in those examples, could their etag values differ from 
the Turtle's etag values shown?  Sure.  LDP Paging does not alter what 
HTTP allows as etag values.

> So, how is an LDP Paging response different than Content-Type 
> Negotiation, such that 2xx (Contents of Related) becomes necessary?

[server side knowledge]
The resource identified by the URI R is a set of names suitable for a 
given purpose P.
In this example, there happen to be 100 names in the list.

[client] GET R

Vanilla HTTP conneg allows the server to do pretty much whatever it wants 
and still be compliant.  However, if the server published documentation 
saying that R identifies a list of names, most clients will expect a 200 
response to contain all 100 names, potentially along with other 
information (the current count, modified time, etc).  In other 
words, there's also a social contract that matters.

[server] 200 + representation of a list of 50 names

Technically compliant with HTTP according to some interpretations; clearly 
not compliant with the social contract though, and likely accompanied by 
social consequences (help desk calls, etc). 

Even if there is a Link rel='next' response header to the next 50, only 
5005-aware clients are (maybe) going to do anything with that information. 
If you know all your application's clients will handle it correctly, that 
might be good enough *for you, now*.  Once you don't have control over the 
clients, you are back to social consequences.
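
The difference between the two kinds of client can be sketched in a few lines of Python.  The in-memory PAGES table stands in for the server, and its URIs, field names, and both client functions are assumptions made up for this illustration.

```python
# Toy server state: R is served as two pages of 50 names each.
PAGES = {
    "R": {"names": [f"name-{i}" for i in range(50)], "next": "R?page=2"},
    "R?page=2": {"names": [f"name-{i}" for i in range(50, 100)], "next": None},
}


def fetch(uri):
    """Stand-in for GET: returns the page body and its rel='next' target."""
    return PAGES[uri]


def naive_get(uri):
    """Client that ignores rel='next' and trusts the 200 body as complete."""
    return fetch(uri)["names"]


def paging_aware_get(uri):
    """Client that follows rel='next' links until there are none left."""
    names = []
    while uri is not None:
        page = fetch(uri)
        names.extend(page["names"])
        uri = page["next"]
    return names
```

Here naive_get("R") comes back with 50 names and paging_aware_get("R") with all 100; only the second client honors the social contract, and the server cannot tell which kind it is talking to from a plain GET.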

[server] 303 to R' == representation of a list of 50 names

Correctly coded HTTP clients will seek user input about whether the other 
resource (R') is an adequate substitute for R.  The rest matches the 
200+50 case above fundamentally.

If the client's GET R request contains the assertions that signal it is a 
compliant LDP Paging client, then you know it handles the 303 case 
correctly (seeks user input), at a minimum.


> Is the difference that we already have an information resource at
> <http://example.org/customer-relations> and we want to serve a 
> _different_ information resource? This seems to be TBL's original 
> example for pagination, though I don't see any reason this is much 
> different than Content Negotiation.

It is, and the specific example above might help.  To make it even more 
obvious, just tweak the level of social consequences:  make P one of the 
following...
- the set of government agencies you're required to send a compliance 
report to periodically.  If they fail to get it, you pay fines or can't 
open for business (brokerage settlements do this).
- the set of Kickstarter supporters for your project, who have pre-ordered 
your new product.  If they fail to get it, you get in all kinds of legal 
and social hot water.

> Particularly I was wondering if this could be applied to more 
> generic, not-necessarily-LDP uses like a photo gallery, product 
> catalog, or list of blog posts (though such usages would be prime 
> candidates for LDP). ...

We ultimately made LDP Paging a separate spec from LDP in part to 
specifically facilitate that kind of wider re-use, so: hearty "yes".

> ... It appears this might be covered by
> `max-member-count`, but I'm not sure because "member" is not well 
> defined. Can a "member" be... anything? (I would expect a term like 
> "item" to be used for this purpose, as in "list item".) This is 
> probably the only thing that really needs clarification.

As currently described, member count depends on LDP, although I agree 
this is overly subtle.  Where it says "This parameter is only meaningful 
for paged containers." we should probably s/containers/LDP containers/ 
and/or hyperlink the "containers" definition.  I'll do that.
We did not widely re-examine use cases when we split Paging out of LDP, 
but given people's stated intent to enable wider re-use the WG might be 
amenable to adding such a unit ... more so if you commit to supply an 
implementation report for it.  Otherwise you are free to add your own 
extension unit now and propose it formally in the future; your extension 
simply won't have a standardized definition (and therefore 
interoperability would be limited).  I can't find any place that we 
explicitly say that, although I'm usually careful about those things, so 
I'll make sure it's rendered explicit that extension unit values are 
permitted.
Do you want the WG to consider adding a new unit value with the semantic 
you're after (and if so can you commit to implementing it), or treat it as 
an extension?


[1] http://tools.ietf.org/html/rfc5005#section-3

Best Regards, John

Voice US 845-435-9470  BluePages
Cloud and Smarter Infrastructure OSLC Lead
Received on Friday, 10 October 2014 15:15:14 UTC