- From: Jonathan Rees <jar@creativecommons.org>
- Date: Thu, 29 Jan 2009 09:56:17 -0500
- To: Eran Hammer-Lahav <eran@hueniverse.com>
- Cc: "www-tag@w3.org WG" <www-tag@w3.org>
On Jan 13, 2009, at 1:18 PM, Eran Hammer-Lahav wrote:

> http://tools.ietf.org/html/draft-hammer-discovery-00
>
> I have recently published a draft for obtaining resource descriptors
> via HTTP using Link headers, Link elements (HTML, Atom), and
> Site-Meta [1]. The draft goal is to provide a unified view on how to
> use the three methods for obtaining information about resources
> (discovery). The draft invents very little (an extension to
> Site-Meta allowing it to describe individual resources and not just
> the overall site).
>
> Any feedback would be greatly appreciated and can be sent directly
> to me or discussed on the www-talk@w3.org mailing list.
>
> Thanks,
>
> EHL
>
> [1] http://tools.ietf.org/html/draft-nottingham-site-meta-00

I'm sending my review (below) to you directly, cc: www-tag, and you
can feel free to forward it to www-talk or other lists.

-Jonathan

-----------

Here are some comments on [1], as requested:

- Please do not say 'resource discovery' as this protocol is not about
  discovering resources. You have many alternatives that do not say
  something that's confusing: 'descriptor resource discovery',
  'description discovery', 'resource description discovery', etc.

- I really wish we could say something stronger about the format of
  the DR. May I suggest that the DR be required to possess at least
  one 'representation' that is either RDF/XML or convertible to
  RDF/XML using GRDDL?

- I anticipate some confusion as to whether the link relates the
  resource to the DR (as in the POWDER 'describedby' definition you
  quote), the URI to the DR, or the URI to the DR's URI (as in the
  second sentence of section 6). In RDF, <resource> describedby <dr>
  is most natural to write, but RDF semantics rules out the
  possibility that this might say anything specific to a particular
  URI naming the resource[*].
  This protocol is an opportunity for the URI owner to say things not
  only about the resource but about the URI/resource binding itself,
  such as its authority, provenance, and stability, and those will
  vary with the URI, not the resource, since each URI might have a
  different "owner". This issue may be esoteric enough that addressing
  it might be more confusing than not, but I want you to consider
  yourself forewarned.

- The POWDER documentation gives a different URI for the describedby
  relation than the one that you'd get by using the proposed
  IANA-based relation registry. It would be unfortunate if there
  continued to be two URIs for the same thing, and you should work
  with POWDER to settle on one or the other. I would not make use of
  the link relation registry a requirement.

- Editorial comment: On first reading I found the first set of bullets
  in section 7 to be very mysterious. They make no sense at all until
  you've read the following text. I suggest that (a) you list the
  three methods before launching into the factors that go into
  deciding between them; and (b) the four bullets be more specific -
  e.g. instead of saying it depends on document type (media type), say
  that it depends on whether the resource has a representation
  supporting the <link> element, and rather than saying it depends on
  URI scheme, say that it depends on whether the scheme is http(s) or
  something else.

- Bullet "HTTP Link header": "Limited to resources with an accessible
  representation using the HTTP protocol [RFC2616], or..." -- while
  you're not saying anything wrong here, I don't see what purpose the
  part before the "or" serves, and I find it distracting. I think you
  should simply say: "Limited to resources for which an HTTP GET or
  HEAD request returns a non-5xx HTTP response [RFC2616]." The exact
  limitation you want to put on HTTP (2xx, 2xx+3xx, 2xx+3xx+4xx, or
  any) is debatable.
  I think 3xx responses have to be OK (see below), 4xx responses
  should be, and 5xx responses could be although I don't think I would
  trust them. If all HTTP responses can carry believable Link:
  headers, matters are greatly simplified because you can just say
  that you can always try the HTTP method - it is not limited in any
  way.

- In TAG discussion the question arose as to why all three methods had
  to produce the same descriptor resource location. Another design
  choice suggested was that you might get different information via
  the three different channels; I said that I thought the intent was
  that a consumer should be allowed to stop at any one of them without
  missing out on any information, and that since the information was
  all supposed to be "authorized" by the same source anyhow (the URI
  owner), there would be no reason to put different information in
  different places. However, someone (Henry?) wondered why the
  requirement is so strict. One way to relax it that seems harmless is
  to just say that the URIs must all *name* the same DR. Even more
  relaxed (but more complicated) would be to say that the resources
  named by the DR URIs only have to carry the same information. (I
  guess you'd also have to say that all three resources have the same
  portfolio of representations, or some applications would miss out on
  the representation they need if they used the wrong method.) How
  would we run into trouble by just saying you have to get to the same
  DR?

- Anywhere you mention 301 and 302 you should also add 307.

- The algorithm in 8.2 is one I strongly object to, as it does not
  permit Link: on 30x responses, which IMO is a central Semantic Web
  use case. Consider, for example, a "value added" URI for a document
  where a 301 response provides a Link: to useful metadata, and
  redirects to the actual document. Here's how it ought to work:

  1. Obtain a response to a GET or HEAD request (perhaps from a cache
     or proxy, etc.)
  2. If the response is a 5xx, fail.
     (Having a Link: with many 40x statuses might be not only
     perfectly meaningful, but extremely useful - consider a DR for a
     403, 405, or 410 that contains adequate information to enable a
     user or application to proceed.)
  3. If the response has an applicable Link: (per your steps 2-4), the
     DR is the one named.
  4. If the response has no applicable Link: but is a 301, 302, or
     307, go back to step 1 using the URI given in the Location:
     header.
     [Warning: You will find people who are strongly opposed to this,
     and others strongly in favor, so rationale will be needed.]

  In other words, follow the redirect (301, 302, 307) and stop when
  you get to the first Link: header.

  In a chain of 30x redirects ending in a 200 or 303, each URI may
  have a different owner, and each owner may have different things to
  say about the resource and/or the nature of one or more of the
  URI/resource bindings. Since the authority, if any, of the DR is
  connected with control over the URI, not the resource, we should
  prefer a DR associated with a 30x response to a DR associated with a
  200 response for the URI it redirects to.

  The case could be made that the DR for the first URI is the *only*
  one that should be recognized, as the fact of a redirect may only be
  authorizing the target to provide representations of the resource,
  not to provide descriptions of it. But this strict position would be
  very inconvenient and might lead to (different) surprises.

  You and I are both overlooking additional aspects of the HTTP
  protocol, such as 100 and 305, but I don't see this as a big risk.

- Your proposal to specify URI-to-DR-URI rewrites as
  template="prefix{uri}suffix" is a good start, but I think that the
  additional ability to specify match conditions on the input URI will
  end up being important. In one project I work on we're already using
  the rule

    http://host/path ==> http://host/about/path

  to map the original URI to the DR URI.
  Similar rules would be

    http://host/path ==> http://about.host/path
    http://host/path.suffix ==> http://host/path.about

  If all you can do is add a prefix and/or suffix to the URI, rules
  like these aren't possible. The most elaborate fix would be to
  permit rules like the ones in Apache RedirectMatch directives, but I
  can see why you wouldn't go there. The minimal mechanism required
  for the above would be to say what prefix and/or suffix should be
  matched and stripped off. This makes the rule language nicely
  symmetric:

    <link-template pattern="prefix1{infix}suffix1"
                   template="prefix2{infix}suffix2" ...>

  If you do this you may not have to get into more complex rule
  languages of the kind you suggest in your [[...]] remark; although
  if rich enough the extensions you suggest might work perfectly well
  to implement the rules I give above.

- We need to be careful about quoting. If a DR is meant to be found
  via a CGI script invoked via a query URI (the link-template prefix
  has a ? in it), and the original URI already contains significant
  CGI characters like &, then an application could get into big
  trouble. This needs to be either handled directly somehow (I can't
  imagine how), or left as a combination of a big scary disclaimer and
  a security warning.

- I think you need to warn that this protocol should only be applied
  to URIs not containing a fragment id. If you allow fragment ids
  you're going to get into serious problems with both quoting and
  semantics.

Overall I think this is great progress and I like the way it's
structured. Personally I'm eager to see this deployed, as I think it
will serve the semantic web, and in particular the Science Commons
data project's agenda, very well.

-Jonathan

[1] http://tools.ietf.org/html/draft-hammer-discovery-00

[*] Footnote (not relevant unless you care about how RDF might
interact with this discovery protocol): Suppose U1 and U2 both name
(denote, identify, refer to, are interpreted to be, etc.) some
resource R, and suppose that

    <U1> describedby <DR1>.
    <U2> describedby <DR2>.

Then necessarily

    <U1> describedby <DR2>.
    <U2> describedby <DR1>.

To limit DR1 to the URI U1, you would have to use a different
relation, say describesBindingOf:

    <DR1> describesBindingOf "U1"^^xsd:anyURI .
    <DR2> describesBindingOf "U2"^^xsd:anyURI .
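
For concreteness, the four lookup steps suggested above as a
replacement for the 8.2 algorithm could be sketched roughly as follows,
in Python. This is only an illustration under stated assumptions, not
text from the draft: the names find_descriptor, fetch, and
applicable_link are hypothetical, fetch stands in for whatever performs
the GET or HEAD request, applicable_link stands in for the draft's own
steps 2-4 for picking a usable Link:, and the max_hops limit is an
addition to guard against redirect loops.

```python
from typing import Callable, Dict, Optional, Tuple

# A response is modelled as (status code, headers). Header names are
# assumed to be spelled exactly "Location" and "Link" in this toy model.
Response = Tuple[int, Dict[str, str]]

REDIRECT_STATUSES = {301, 302, 307}


def find_descriptor(
    uri: str,
    fetch: Callable[[str], Response],
    applicable_link: Callable[[Dict[str, str]], Optional[str]],
    max_hops: int = 10,  # assumption: not part of the proposed steps
) -> Optional[str]:
    """Return the DR URI for `uri`, or None if the lookup fails."""
    for _ in range(max_hops):
        # Step 1: obtain a response to a GET or HEAD request.
        status, headers = fetch(uri)

        # Step 2: a 5xx response means failure; 4xx responses are allowed.
        if 500 <= status <= 599:
            return None

        # Step 3: an applicable Link: names the DR, whatever the status.
        dr_uri = applicable_link(headers)
        if dr_uri is not None:
            return dr_uri

        # Step 4: no applicable Link:, so follow a 301, 302, or 307.
        if status in REDIRECT_STATUSES and "Location" in headers:
            uri = headers["Location"]
            continue

        # Neither a Link: nor a redirect to follow.
        return None
    return None


# Toy data: a redirect chain in which the middle URI carries its own Link:.
EXAMPLE_RESPONSES: Dict[str, Response] = {
    "http://example.org/a": (301, {"Location": "http://example.org/b"}),
    "http://example.org/b": (
        302,
        {
            "Location": "http://example.org/c",
            "Link": "http://example.org/b.meta",
        },
    ),
    "http://example.org/c": (200, {"Link": "http://example.org/c.meta"}),
}


def toy_applicable_link(headers: Dict[str, str]) -> Optional[str]:
    # Simplification: treat the raw Link value as the DR URI.
    return headers.get("Link")


if __name__ == "__main__":
    print(
        find_descriptor(
            "http://example.org/a",
            EXAMPLE_RESPONSES.__getitem__,
            toy_applicable_link,
        )
    )
```

With the toy data, the lookup starting at http://example.org/a stops at
the 302 for http://example.org/b and returns http://example.org/b.meta
without ever consulting the 200 response, which is the "stop at the
first Link: header" behaviour described above.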
Received on Thursday, 29 January 2009 14:57:03 UTC