Re: Headers vs Response Code for 2NN Contents Of Related from Roy T. Fielding on 2014-09-29 (ietf-http-wg@w3.org from July to September 2014)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Mon, 29 Sep 2014 12:03:07 +0200
To: Sandro Hawke <sandro@w3.org>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <0E091D53-DB29-48FF-9554-62D276457E45@gbiv.com>

On Sep 27, 2014, at 1:46 PM, Sandro Hawke wrote:

> We seem to be stalled in discussion of the 2NN Contents Of Related proposal [1].  I'd like to briefly summarize the concerns I've heard and provide summary responses. Then I'd like to sketch an alternative proposal which might be more widely acceptable. It has the particular advantage of not needing IETF consensus to go forward, since it just requires new headers, not a new response code.
> 
> First, I think these are the main points in the discussion so far: 
> 
> Q. Why do you want a new response code?
> 
> We have a community of Linked Data users, passing around RDF over HTTP connections, and this is something they want. I'm specifically representing the W3C Linked Data Platform (LDP) WG which wants it for paging, to avoid a roundtrip. Others have said it would be useful for them, including the W3C TAG which says it helps with their httpRange-14 issue.

Generally speaking, when an application-specific group of developers say
that they need a change made to HTTP in order to support their needs,
it is almost always because they are making incorrect assumptions about
HTTP.  Avoiding such changes is what keeps HTTP (relatively) simple.

> Q. Why is one additional roundtrip so important?
> 
> If you're getting 1000 pages, then obviously adding one roundtrip is a trivial additional cost, but we expect the common behavior to be just looking at the first page. When you look at search results, do you go through every page? No, the response was, "There are a lot of results, here are some", and that's often enough, especially if they're ordered in some useful way. In this case of just looking at a small first page, saving the roundtrip could nearly double the speed.

Yes, which is why you get paged results on search queries.  This does not
require any changes to HTTP.

> Q. These interactions won't always make full use of caching.
> 
> True, not with the current cache architecture, they won't. We have some ideas for how to make caching work much better for Linked Data applications, but that can be handled separately.

But not as well as the existing solutions.

> Q. Why not use Range, with a new type of units?
> 
> That's not the way WebApps are usually written. There are a lot of details that would have to be figured out which are already figured out for next/prev paging, and that's what the developers seem to want. Plus, since the bar for new range units is IETF consensus, we're not optimistic that we'd be able to move forward even if we figured it all out.

As opposed to the suggested changes, which also require IETF consensus?

I would not use Range because that works in the opposite direction:
it is the client expressing a desired limitation on the server's results.
What you have described is a server deciding to limit the client's results.

> Q. Why not use 200 OK + Content-Location?
> 
> 200+CL is defined as returning a representation of the requested resource, but in next/prev paging, each of those pages is conceptualized as its own resource.

That isn't a relevant concern.  The requested resource is paged.  So what?
You are confusing what you are requesting with the data stored on the back-end.
If you get a paged representation, then that is what you requested.
The origin server is responsible for determining what the URI means.
It is necessarily a different resource in the sense that it presents a
different view of the same data.  Why is that a problem for LD when it
isn't a problem for any other part of the Web?

The abstract interface is the essential difference between HTTP's
architecture of providing a representation of the data (instead of a
client-imposed assumption about the nature of the resource) and most
other systems' reliance on shared data assumptions (like FTP's focus
on files and RPC's focus on procedures).

> Logically, something can't be the same resource as its own first page.

Logically, the client doesn't know what the resource is just because it
has a URI.  It doesn't know whether it will get a paged result or not.
The server is deciding that for the client, usually based on the amount
of data or the time required to produce results.  Regardless, the nature
of that representation is explained IN THE REPRESENTATION, either within
the data format or within the associated metadata.

> In practical terms, if a server used 200+CL to return just the first page, how would the client know it had only been sent the first few triples (instead of all of them, as usual)?

Because that's what the data says.

What tells the client that it has received only a page of the potential
result set is the metadata sent within (or alongside) that data.
That's what prev/next links are in HTML.  Formal relationships can be
defined for any data set.  These can be placed in the header fields,
in the body content (if the data format defines a place for them), or
even within a parameter on the media type.
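
Editorial illustration, not part of the original message: a minimal
Python sketch of how a client might detect a paged result from
metadata in the header fields, assuming the server sends a Link header
with the registered "next"/"prev" relations. The function names and
example URIs are hypothetical, and the parser is deliberately
simplified (it does not handle commas or semicolons inside quoted
parameter values).

```python
import re

def parse_link_header(value):
    """Return {relation: target URI} for a Link header field value."""
    links = {}
    # Each link-value looks like: <uri>; param=value; param="value"
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)', value):
        target, params = match.group(1), match.group(2)
        rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        if rel:
            # rel may carry several space-separated relation types
            for name in rel.group(1).split():
                links[name.lower()] = target
    return links

def is_paged(headers):
    """True if the response metadata links to a next or previous page."""
    links = parse_link_header(headers.get("Link", ""))
    return "next" in links or "prev" in links

# Hypothetical response headers for the first page of a larger result
example_headers = {
    "Content-Type": "text/turtle",
    "Link": '<http://example.org/items?page=2>; rel="next"',
}
```

With these assumptions, is_paged(example_headers) is true, while a
response carrying no such links is treated as complete. The same
check could equally read the links from the body if the data format
defines a place for them.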

The only real distinction between a paged 200 response and any other
resource is the presumption that there exists, somewhere, some other
resource that can represent the complete data set in one go.  However,
the only reason to provide a link to that other resource is to support
direct authoring (e.g., PUT).  In all other cases, access to the
"complete" resource tends to be limited for scalability reasons.

> The Content-Location header is already used in con-neg so it can't also carry this information.

No, it wouldn't, though it does provide a link to the source of *this*
representation.  Paging is just another form of content negotiation.
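
Editorial illustration, not part of the original message: a minimal
Python sketch of the header fields a server might send with a 200
response when it decides to return only the first page. The
"?page=N" URI layout, the function name, and the media type are
assumptions made for the example, not something defined by HTTP or
by the LDP drafts.

```python
def first_page_headers(base_uri, page_size, total_items):
    """Header fields for a 200 response carrying page 1 of base_uri.

    The "?page=N" URI layout is a hypothetical choice for this sketch.
    """
    headers = {
        "Content-Type": "text/turtle",
        # Identifies the source of *this* representation (the page)
        "Content-Location": f"{base_uri}?page=1",
    }
    if total_items > page_size:
        # Tell the client that more results exist and where to find them
        headers["Link"] = f'<{base_uri}?page=2>; rel="next"'
    return headers
```

When the whole result fits in one page, no Link field is emitted, so
the client learns whether the result is paged from the metadata
alone rather than from the status code.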

> Q. Why not use 200+Link?
> 
> It's true that in LDP paging there are Link headers that would tell the client it got the first page instead of the whole resource, but then we've still misused 200 OK and are sending a representation of a different resource. We'd have to carefully avoid caching, and it's unclear what other things might break by having links change the semantics.

Seriously confused.  You are making an assumption about the resource.
That assumption is false.  So, don't make that assumption.  Instead,
look at the data returned and it will tell you if it is a paged result.

Perhaps you might want to define some formal link relations, or add
metadata to some data formats, but your application does not require
any changes to HTTP.

....Roy
Received on Monday, 29 September 2014 10:03:31 UTC