RE: Request for feedback: HTTP-based Resource Descriptor Discovery from Eran Hammer-Lahav on 2009-02-01 (www-talk@w3.org from January to February 2009)

From: Eran Hammer-Lahav <eran@hueniverse.com>
Date: Sun, 1 Feb 2009 01:53:59 -0700
To: Jonathan Rees <jar@creativecommons.org>
CC: "www-tag@w3.org" <www-tag@w3.org>, Phil Archer <phil@philarcher.org>, Mark Nottingham <mnot@mnot.net>, "www-talk@w3.org" <www-talk@w3.org>
Message-ID: <90C41DD21FB7C64BB94121FBBC2E7234127C9399D9@P3PW5EX1MB01.EX1.SECURESERVER.NET>
What we want is a resource descriptor, not a URI descriptor. It is clear that URI descriptor discovery must not allow any secondary requests. Whatever you find after a single GET/HEAD of the dereferenced URI is what you are going to use.

The answer seems to be that the descriptor location is obtained from whatever the client considers a valid representation of the resource. From recent discussions, there seems to be consensus that a Link header relates two resources (not representations). Link headers (due to the nature of HTTP) are attached to a representation, but their subject is the resource itself. <LINK> elements have similar semantics.

Therefore, the discovery spec, instead of providing a single workflow (i.e. follow redirects, look for 200 or 303, etc.), needs to leave the decision of which Link headers to use to the client. This gets even more complex if a 301 response includes Links and the 200 response (reached by following the 301) does not, but offers an HTML representation with <LINK> elements.
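
To illustrate the HTML side of this, here is a rough Python sketch of how a client might collect <LINK> elements from a representation so they can be weighed alongside any Link headers. It is only an illustration, not part of the spec: the LinkCollector class, the html_links function, the "describedby" relation value and the example.org URI are all assumptions.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class LinkCollector(HTMLParser):
    """Collects (rel, href) pairs from <link> elements (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <LINK REL=...> matches.
        if tag == "link":
            attributes = dict(attrs)
            if attributes.get("href"):
                self.links.append((attributes.get("rel") or "", attributes["href"]))


def html_links(html: str) -> List[Tuple[str, str]]:
    """Return every (rel, href) pair found in <link> elements of the document."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Assumed example document; the relation name is illustrative only.
page = '<html><head><LINK REL="describedby" HREF="http://example.org/rd"></head></html>'
print(html_links(page))
```

The sketch only gathers candidates. Which of them the client actually uses is the open question above.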

Consider your example below: which Link header should be used, the one attached to the 301 response or the one attached to the 200 obtained by following the 301 redirect? The answer is the Link header attached to the representation of the resource the client is interested in. It is perfectly valid for different representations to include different Links (as long as the Links are not representation-specific, just more applicable).

For example, descriptor discovery of web pages intended for consumption using a browser will usually ignore Link headers on the 301 and fetch those on the 200. Why? Because that is the resource browsers are actually interested in. They always follow redirects blindly, and the intermediate URIs are ignored and hidden from the end user.

In other words, if you have a URI U which redirects you to URI V, the decision which RD to use (RD U or RD V) is completely tied to which representation is more relevant to your inquiry.
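
To make the U/V choice concrete, here is a minimal Python sketch of a client-side policy. It is an illustration under stated assumptions, not anything the spec defines: the Hop record, the choose_descriptor function, its prefer parameter, the fallback to the other hops when the preferred one carries no link, and the example.org URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hop:
    """One response in a redirect chain (hypothetical structure)."""
    uri: str
    status: int
    descriptor_links: List[str] = field(default_factory=list)


def choose_descriptor(chain: List[Hop], prefer: str) -> Optional[str]:
    """Pick a descriptor URI according to which hop the client cares about.

    prefer="final": browser-like, look at the last response (V) first.
    prefer="first": look at the originally requested URI (U) first.
    Assumption: if the preferred hop has no link, fall back to the others.
    """
    if prefer not in ("first", "final"):
        raise ValueError("prefer must be 'first' or 'final'")
    ordered = chain if prefer == "first" else list(reversed(chain))
    for hop in ordered:
        if hop.descriptor_links:
            return hop.descriptor_links[0]
    return None


# U answers 301 with its own Link; V answers 200 with a different one.
chain = [
    Hop("http://example.org/U", 301, ["http://example.org/U-rd"]),
    Hop("http://example.org/V", 200, ["http://example.org/V-rd"]),
]
print(choose_descriptor(chain, "final"))
print(choose_descriptor(chain, "first"))
```

The same chain yields a different descriptor depending on the policy, which is exactly the ambiguity the spec would have to pin down.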

This was somewhat hidden in the spec with regard to the <LINK> element because it ignores how the client got from the resource URI to the HTML document. But it suffers from the same ambiguity.

The problem, of course, is finding a way to define it in an interoperable way.

EHL



> -----Original Message-----
> From: Jonathan Rees [mailto:jar@creativecommons.org]
> Sent: Saturday, January 31, 2009 8:55 PM
> To: Eran Hammer-Lahav
> Cc: www-tag@w3.org; Phil Archer; Mark Nottingham; www-talk@w3.org
> Subject: Re: Request for feedback: HTTP-based Resource Descriptor
> Discovery
>
> Let's work out this redirection case, since nothing else matters if we
> can't agree on this. I'll get back to your other questions later.
>
> The problem with your treatment of redirects is that the protocol can
> give the wrong answer.
>
> The situation is that we do a GET/HEAD of a URI U, and receive a
> 301/302/307 specifying Location: V. Your protocol is supposed to get a
> description resource for the resource "identified" (RFC 3986) by U,
> yet you will throw away a DR in the response to GET/HEAD U (one that
> is explicitly said to be a DR of U) and look for one in the response
> to GET/HEAD V instead. What makes you think that V names the same
> resource as U? If it doesn't, V's DR has no bearing on the resource
> named by U. Even if you assume they do name the same resource (which
> you can't in the 307 case), why would you have any reason to prefer
> the V DR to the U DR? The ability to serve a resource's
> representations does not necessarily make you better qualified than
> anyone else to describe it.
>
> You may want to say: Well, the U and V resources have the same
> representations (GET behavior), so doesn't that mean they're the same
> resource?  I don't think it follows. In particular there are other
> methods to consider, such as POST. As far as I know all GETs can be
> the same and the resources can still be different.
>
> The only theory I know of for deciding which resource is supposed to
> be named by a URI is that articulated in the W3C web architecture
> recommendation [1]. This says that it is up to a party known as the
> URI's "owner" to bind the URI to some resource. So if you want to
> learn about a named resource, it is up to the URI owner to determine
> what resource it is you want to learn about. Why should you talk to
> anyone else, if the owner is willing to speak (via Link:)?
>
> I think it is practical and reasonable that *if* U's owner provides no
> DR, then we can risk taking a 301 redirect (and maybe 302) to mean
> that V names the same resource, so that V's DR, if any, describes that
> resource. But an explicit Link: on a redirect has to mean that the URI
> owner, who is an "authority", is trying to say something important to
> you about the resource, such as the ways in which it differs from the
> redirect target.
>
> Even if U and V are assumed to name the same resource, or resources
> that cannot be distinguished, it is very easy to come up with cases
> where either DR is vastly preferable to the other; differences in
> credibility due to deception, reliability, competence, and timeliness
> can go either way. If you ask a librarian, they will say that the
> original publisher (V) is rarely to be trusted to provide good
> metadata, and one should consult a competent metadata service to
> obtain such (U). (This is a real use case.)
>
> There is a practical reason to prefer the U DR: it can be obtained in
> one roundtrip, while getting the V DR takes two.
>
> I also wonder how the redirect case is any different from that of a
> proxy server that adds a Link: header. If you could detect that the
> proxy server added it, and not the origin (you can't), would you throw
> away the proxy server specified DR, even when the origin provided none?
>
> -Jonathan
>
> [1] http://www.w3.org/TR/webarch/#uri-assignment
Received on Sunday, 1 February 2009 08:54:49 UTC