Inconsistencies in Discovery methods from Eran Hammer-Lahav on 2009-02-07 (www-talk@w3.org from January to February 2009)

From: Eran Hammer-Lahav <eran@hueniverse.com>
Date: Fri, 6 Feb 2009 17:03:01 -0700
To: "www-talk@w3.org" <www-talk@w3.org>
CC: Mark Nottingham <mnot@mnot.net>, Jonathan Rees <jar@creativecommons.org>, "Roy T. Fielding" <fielding@gbiv.com>
Message-ID: <C5B20FB5.123C6%eran@hueniverse.com>

In HTTP-based Resource Descriptor Discovery [1], I am trying to define a
uniform way to attach metadata (descriptors) to resources. The idea is to
define three methods for obtaining the location (URI) of the descriptor
document via the resource (URI or representation). All three methods use the
'describedby' relation type.

1. <LINK> elements in HTML, XHTML, and Atom documents.
2. Link: headers in HTTP responses.
3. /site-meta documents [2], using a Link-Template (transforming the
resource URI to the descriptor URI using a URI template).
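
As a rough illustration of method 3, the sketch below turns a resource URI
into a descriptor URI by plain string substitution. The template string, the
`{uri}` variable name, and the function name are assumptions made up for this
example; they are not taken from the site-meta draft, which defines the
actual syntax.

```python
from urllib.parse import quote

# Hypothetical template, for illustration only.
TEMPLATE = "http://example.com/describe?uri={uri}"


def descriptor_uri(resource_uri: str, template: str = TEMPLATE) -> str:
    """Substitute the percent-encoded resource URI into the template.

    No request is made to the resource itself, so no HTTP status code
    is ever seen by this method.
    """
    return template.replace("{uri}", quote(resource_uri, safe=""))


print(descriptor_uri("http://example.com/resource/1"))
```

Because the mapping is a static transformation, the same descriptor URI comes
out whether the resource would answer with a 200, a 307, or a 404.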

A descriptor contains information about a resource, but it is hard to define
this association in practical terms (that can translate directly to code).
Instead, the proposal defines the descriptor as 'information about a
resource identified by a URI'.

In the current draft I tried to use the HTTP status codes (obtained with the
first two methods, <LINK> and Link:), by instructing the client to follow
redirects and only use links from a small subset of status codes (200, 303,
401). This approach proved broken for two reasons:

1. It is up to the application to decide how redirects should be followed.
If a URI (when dereferenced and requested using an HTTP GET) returns a 307,
any links associated with that response may contain valid metadata that is
not the same as the metadata describing the URI the user-agent is being
redirected to (which in this example returns a 200).

2. It makes information obtained from <LINK> and Link: inconsistent with
that obtained from /site-meta. /site-meta has no way of following redirects
(it is a static transformation template) and will always produce a URI
identifying the location of the descriptor associated with the 307 response,
not the follow-up 200.

To address that, I started taking a different approach with my upcoming
revision (-02) that basically tries to ignore HTTP status codes. It moves
the focus away from the 'resource' to the URI. But Roy's recent comment
showed that this approach (ignoring HTTP status codes) is incomplete as well.

On 2/6/09 11:03 AM, "Roy T. Fielding" <fielding@gbiv.com> wrote:

> There are many resources involved in HTTP,
> only one of which is identified by the requested URI.  Each of those
> resources may have representations, and the meaning of the payload in a
> response message is defined by the status code.  A 404 response is going
> to contain a representation of a resource on the server that describes
> that error. A 200 response is going to contain a representation of the
> resource that was identified as the request target.

What this means is that a Link header in the HTTP response to a GET request
might not be about the resource identified by the URI used to make that
request.

For example, if:

GET /resource/1 HTTP/1.1
Host: example.com

returns:

HTTP/1.1 404 Not Found
Link: <http://example.com/about>; rel="describedby"

The Link is about the "resource on the server that describes that error",
and not about the resource identified by the URI
(http://example.com/resource/1).
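
A minimal sketch of why this matters to a client: extracting the
'describedby' target from the header above works the same way for any
status code, so the header alone cannot tell the client which resource the
link is about. The regular expression is a simplification that only handles
the `<target>; rel="..."` form shown in the example, and the function name is
hypothetical.

```python
import re

# Simplified pattern for: <target>; rel="value"
# Real Link headers allow more parameters and forms than this handles.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]*)"?')


def describedby_targets(link_header: str) -> list[str]:
    """Return the targets of all links whose rel includes 'describedby'."""
    return [
        target
        for target, rel in LINK_RE.findall(link_header)
        if "describedby" in rel.split()
    ]


# Header taken from the 404 response in the example above.
header = '<http://example.com/about>; rel="describedby"'
print(describedby_targets(header))
```

The parser returns http://example.com/about without ever looking at the
status line, which is exactly the information needed to decide whether the
link describes http://example.com/resource/1 or the error representation.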

Because /site-meta does not provide access to the HTTP status code, if it
returned http://example.com/about as the descriptor location of
http://example.com/resource/1, it would be incorrect (due to lack of
information about the 404 condition involved). In such a case, it is really
the Link: header that is limited, because the representation of the resource
isn't available (and therefore there is no place to put its links).

---

I am trying to find a way to keep the three methods in sync without further
limiting the usefulness of this protocol. So far the only approach I have is
to limit Link elements and headers (for use in this protocol) to HTTP
responses whose status code means the response can only be interpreted as
being about the request URI.

From a (very) quick review of the status codes, this means only the
following codes do not bind the response representation to the request URI:

* 1xx
* 202 - about the request's status, is this the same as the resource?
* 205 - does not seem to represent anything.
* 303 - not sure.
* 4xx, except maybe 406 - not sure, seems to be about the resource.
* 5xx

This seems to suggest most 2xx, most 3xx, and maybe 406, as the only valid
status codes to be allowed when looking for a 'describedby' link.
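
The list above can be written down as a small classification function. This
is only a sketch of the quick review in this message, not a proposed
normative rule: the function name is made up, and the codes marked 'not sure'
are returned as None rather than being decided either way.

```python
from typing import Optional


def binds_to_request_uri(status: int) -> Optional[bool]:
    """Classify a status code per the quick review above.

    True  -> response representation is about the request URI
    False -> it is not (listed above as not binding)
    None  -> marked 'not sure' above (303 and 406)
    """
    if not 100 <= status <= 599:
        raise ValueError(f"not an HTTP status code: {status}")
    if status in (303, 406):
        return None
    # 202 is listed as not binding, although the list questions it.
    if status < 200 or status in (202, 205) or status >= 400:
        return False
    return True  # remaining 2xx and 3xx codes


for code in (200, 205, 303, 307, 404, 406):
    print(code, binds_to_request_uri(code))
```

A client following this sketch would only look for a 'describedby' link when
the function returns True, and would need a policy decision for None.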

If this approach is acceptable, should the spec explicitly define which
status codes are valid? Or should it make do with a definition such as 'HTTP
responses with a status code that is a representation of the request URI'?
The second option is generally preferred, but at this point even the spec
author (me) cannot fully determine how to implement it (as indicated by the
'not sure' entries above).

Comments?

EHL

[1] http://tools.ietf.org/html/draft-hammer-discovery-01
[2] http://www.ietf.org/internet-drafts/draft-nottingham-site-meta-00.txt
Received on Saturday, 7 February 2009 00:03:50 UTC