- From: Jonathan Rees <jar@creativecommons.org>
- Date: Thu, 29 Jan 2009 09:56:17 -0500
- To: Eran Hammer-Lahav <eran@hueniverse.com>
- Cc: "www-tag@w3.org WG" <www-tag@w3.org>
On Jan 13, 2009, at 1:18 PM, Eran Hammer-Lahav wrote:
> http://tools.ietf.org/html/draft-hammer-discovery-00
>
> I have recently published a draft for obtaining resource descriptors
> via HTTP using Link headers, Link elements (HTML, Atom), and Site-
> Meta [1]. The draft goal is to provide a unified view on how to use
> the three methods for obtaining information about resources
> (discovery). The draft invents very little (an extension to Site-
> Meta allowing it to describe individual resources and not just the
> overall site).
>
> Any feedback would be greatly appreciated and can be sent directly
> to me or discussed on the www-talk@w3.org mailing list.
>
> Thanks,
>
> EHL
>
> [1] http://tools.ietf.org/html/draft-nottingham-site-meta-00
I'm sending my review (below) to you directly, cc: www-tag, and you
can feel free to forward it to www-talk or other lists.
-Jonathan
-----------
Here are some comments on [1], as requested:
- Please do not say 'resource discovery' as this protocol is not about
discovering resources. You have many alternatives that do not say
something that's confusing: 'descriptor resource discovery',
'description discovery', 'resource description discovery', etc.
- I really wish we could say something stronger about the format of
the DR. May I suggest that the DR be required to possess at least
one 'representation' that is either RDF/XML or convertible to
RDF/XML using GRDDL?
- I anticipate some confusion as to whether the link relates the
resource to the DR (as in the POWDER 'describedby' definition you
quote), the URI to the DR, or the URI to the DR's URI (as in the
second sentence of section 6). In RDF, <resource> describedby <dr>
is most natural to write, but RDF semantics rules out the
possibility that this might say anything specific to a particular
URI naming the resource[*]. This protocol is an opportunity for the
URI owner to say things not only about the resource but about the
URI/resource binding itself, such as its authority, provenance, and
stability, and that will vary with URI, not resource, as each URI
might have a different "owner".
This issue may be esoteric enough that addressing it might be more
confusing than not, but I want you to consider yourself forewarned.
- The POWDER documentation gives a different URI for the describedby
relation than the one that you'd get by using the proposed
IANA-based relation registry. It would be unfortunate if there
continued to be two URIs for the same thing, and you should work
with POWDER to settle on one or the other. I would not make use
of the link relation registry a requirement.
- Editorial comment: On first reading I found the first set of bullets
in section 7 to be very mysterious. They make no sense at all until
you've read the following text. I suggest that (a) you list the
three methods before launching into the factors that go into
deciding between them; and (b) that the four bullets be more
specific - e.g. instead of saying it depends on document type (media
type), say that it depends on whether the resource has a
representation supporting the <link> element, and rather than saying
it depends on URI scheme, say that it depends on whether the scheme
is http(s) or something else.
- Bullet "HTTP Link header": "Limited to resources with an accessible
representation using the HTTP protocol [RFC2616], or..." -- while
you're not saying anything wrong here, I don't see what purpose the
part before the "or" serves, and I find it distracting. I think you
should simply say:
"Limited to resources for
which an HTTP GET or HEAD request returns a non-5xx
HTTP response [RFC2616]."
The exact limitation you want to put on HTTP (2xx, 2xx+3xx,
2xx+3xx+4xx, or any) is debatable. I think 3xx responses have to be
OK (see below), 4xx responses should be, and 5xx responses could be,
although I don't think I would trust them.
If all HTTP responses can carry believable Link: headers, matters
are greatly simplified because you can just say that you can always
try the HTTP method - it is not limited in any way.
- In TAG discussion the question arose as to why all three methods had
to produce the same descriptor resource location. Another design
choice suggested was that you might get different information via
the three different channels; I said that I thought the intent was
that a consumer should be allowed to stop at any one of them without
missing out on any information, and that since the information was
all supposed to be "authorized" by the same source anyhow (the URI
owner), there would be no reason to put different information in
different places.
However, someone (Henry?) wondered why the requirement is so strict.
One way to relax it that seems harmless is to just say that the URIs
must all *name* the same DR. Even more relaxed (but more
complicated) would be to say that the resources named by the DR URIs
only have to carry the same information. (I guess you'd also have
to say that all three resources have the same portfolio of
representations, or some applications would miss out on the
representation they need if they used the wrong method.) How would
we run into trouble by just saying you have to get to the same DR?
- Anywhere you mention 301 and 302 you should also add 307.
- The algorithm in 8.2 is one I strongly object to, as it does not
permit Link: on 30x responses, which IMO is a central Semantic Web use case.
Consider, for example, a "value added" URI for a document where a
301 response provides a Link: to useful metadata, and redirects to
the actual document.
Here's how it ought to work:
1. Obtain a response to a GET or HEAD request (perhaps from a cache
or proxy, etc.)
2. If the response is a 5xx, fail. (Having a Link: with many 4xx
statuses might be not only perfectly meaningful, but extremely
useful - consider a DR for a 403, 405, or 410 that contains
adequate information to enable a user or application to proceed.)
3. If the response has an applicable Link: (per your steps 2-4), the
DR is the one named.
4. If the response has no applicable Link: but is a 301, 302, or
307, go back to step 1 using the URI given in the Location:
header. [Warning: You will find people who are strongly opposed
to this, and others strongly in favor, so rationale will be
needed.]
In other words, follow the redirect (301, 302, 307) and stop when
you get to the first Link: header. In a chain of 30x redirects
ending in a 200 or 303, each URI may have a different owner, and
each owner may have different things to say about the resource
and/or the nature of one or more of the URI/resource bindings.
Since the authority, if any, of the DR is connected with control
over the URI, not the resource, we should prefer a DR associated
with a 30x response to a DR associated with a 200 response for the
URI it redirects to.
The case could be made that the DR for the first URI is the *only*
one that should be recognized, as the fact of a redirect may only be
authorizing the target to provide representations of the resource,
not to provide descriptions of it. But this strict position would
be very inconvenient and might lead to (different) surprises.
You and I are both overlooking additional aspects of the HTTP
protocol, such as 100 and 305, but I don't see this as a big risk.
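A minimal Python sketch of the four steps above, for illustration only. It assumes that an "applicable Link:" is a Link header with rel="describedby" (the draft's own steps 2-4 are not reproduced), it uses a hypothetical in-memory `fetch` callable instead of a real HTTP client, and the Link parsing is deliberately naive. The `max_hops` loop guard is also an assumption, not part of the proposal.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional
from urllib.parse import urljoin


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


# Naive matcher for one Link value such as <http://x/y>; rel="describedby".
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?describedby\b')
REDIRECTS = {301, 302, 307}


def find_describedby(link_header: Optional[str]) -> Optional[str]:
    """Return the first describedby target in a Link header, or None."""
    if not link_header:
        return None
    for value in link_header.split(","):  # naive: URIs containing commas break this
        m = LINK_RE.search(value)
        if m:
            return m.group(1)
    return None


def discover_dr(uri: str, fetch: Callable[[str], Response], max_hops: int = 10) -> Optional[str]:
    """Follow 301/302/307 redirects and stop at the first applicable Link: header."""
    for _ in range(max_hops):  # assumed guard against redirect loops
        resp = fetch(uri)  # step 1: GET or HEAD, possibly answered by a cache or proxy
        if 500 <= resp.status <= 599:  # step 2: do not trust 5xx responses
            return None
        dr = find_describedby(resp.headers.get("Link"))
        if dr is not None:  # step 3: a Link: on any non-5xx response wins, 30x included
            return urljoin(uri, dr)
        location = resp.headers.get("Location")
        if resp.status in REDIRECTS and location:  # step 4: no Link:, so follow the redirect
            uri = urljoin(uri, location)
            continue
        return None  # no Link: and nothing to follow
    return None


# The "value added" case: the 301 carries its own Link:, which is preferred
# over the Link: on the 200 response it redirects to.
site = {
    "http://example.org/value-added": Response(301, {
        "Location": "http://example.org/doc",
        "Link": '<http://example.org/value-added.about>; rel="describedby"',
    }),
    "http://example.org/doc": Response(200, {
        "Link": '<http://example.org/doc.about>; rel="describedby"',
    }),
}
print(discover_dr("http://example.org/value-added", site.__getitem__))
```

Run as written, this prints http://example.org/value-added.about: the DR attached to the redirecting URI, not the one attached to the document it redirects to. The example.org URIs are placeholders.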
- Your proposal to specify URI-to-DR-URI rewrites as
template="prefix{uri}suffix" is a good start, but I think that the
additional ability to specify match conditions on the input URI will
end up being important. In one project I work on we're already
using the rule
http://host/path ==> http://host/about/path
to map the original URI to the DR URI. Similar rules would be
http://host/path ==> http://about.host/path
http://host/path.suffix ==> http://host/path.about
If all you can do is add a prefix and/or suffix to the URI, rules
like these aren't possible.
The most elaborate fix would be to permit rules like the ones in
Apache RedirectMatch directives, but I can see why you wouldn't go
there. The minimal mechanism required for the above would be to say
what prefix and/or suffix should be matched and stripped off. This
makes the rule language nicely symmetric:
<link-template
pattern="prefix1{infix}suffix1"
template="prefix2{infix}suffix2"
...>
If you do this you may not have to get into more complex rule
languages of the kind you suggest in your [[...]] remark; although
if rich enough the extensions you suggest might work perfectly well
to implement the rules I give above.
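The symmetric pattern/template rule can be sketched in a few lines of Python. This is only an illustration of the proposed syntax under assumptions of the sketch: exactly one {infix} placeholder on each side, literal (not regex) prefix and suffix matching, example.org standing in for "host", and ".html" standing in for a concrete ".suffix". The template-only form in the draft corresponds to pattern="{infix}".

```python
from typing import Callable, Optional


def make_rewriter(pattern: str, template: str) -> Callable[[str], Optional[str]]:
    """Build a URI -> DR-URI rule from pattern="p1{infix}s1", template="p2{infix}s2"."""
    p1, s1 = pattern.split("{infix}")  # raises ValueError unless exactly one {infix}
    p2, s2 = template.split("{infix}")

    def rewrite(uri: str) -> Optional[str]:
        # The rule applies only if the URI starts with p1 and ends with s1,
        # and the matched prefix and suffix do not overlap.
        if len(uri) < len(p1) + len(s1) or not (uri.startswith(p1) and uri.endswith(s1)):
            return None
        infix = uri[len(p1):len(uri) - len(s1)]  # strip the matched prefix and suffix
        return p2 + infix + s2

    return rewrite


# The three rules from the review, with "host" fixed to example.org.
about_dir = make_rewriter("http://example.org/{infix}", "http://example.org/about/{infix}")
about_host = make_rewriter("http://{infix}", "http://about.{infix}")
about_suffix = make_rewriter("http://example.org/{infix}.html", "http://example.org/{infix}.about")

print(about_dir("http://example.org/data/x"))          # http://example.org/about/data/x
print(about_host("http://example.org/data/x"))         # http://about.example.org/data/x
print(about_suffix("http://example.org/data/x.html"))  # http://example.org/data/x.about
```

A URI that does not match the pattern yields None, which leaves room for a site-meta file to list several rules and try them in order.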
- We need to be careful about quoting. If a DR is meant to be found
via a CGI script invoked via a query URI (the link-template prefix
has a ? in it), and the original URI already contains significant
CGI characters like &, then an application could get into big
trouble. This needs to be either handled directly somehow (I can't
imagine how), or left as a combination of a big scary disclaimer and
a security warning.
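To make the quoting hazard concrete, here is a small standard-library Python illustration. The CGI endpoint is hypothetical, and percent-encoding the substituted URI is shown only as one candidate convention: the draft does not specify it, and it only helps if the script on the other end decodes accordingly.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical link-template whose prefix contains a "?" (a CGI endpoint).
TEMPLATE_PREFIX = "http://example.org/describe?uri="
original = "http://example.org/search?q=cats&page=2"

# Plain string substitution: the "&" in the original URI splits the query.
naive = TEMPLATE_PREFIX + original
print(parse_qs(urlsplit(naive).query))
# {'uri': ['http://example.org/search?q=cats'], 'page': ['2']}

# One possible convention: percent-encode the whole URI before substituting it.
encoded = TEMPLATE_PREFIX + quote(original, safe="")
print(parse_qs(urlsplit(encoded).query))
# {'uri': ['http://example.org/search?q=cats&page=2']}
```

In the naive case the script silently receives a truncated URI plus a stray page parameter, which is exactly the kind of misattribution a security warning would need to cover.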
- I think you need to warn that this protocol should only be applied
to URIs not containing a fragment id. If you allow fragment ids
you're going to get into serious problems with both quoting and
semantics.
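A short Python sketch of such a guard, plus one example of why fragments cause trouble. The function name and the CGI-style prefix are hypothetical.

```python
from urllib.parse import urlsplit


def require_no_fragment(uri: str) -> str:
    """Refuse to run discovery on a URI that carries a fragment id."""
    # In a well-formed URI "#" can only be the fragment delimiter (RFC 3986),
    # so any "#" means a fragment is present, even an empty one.
    if "#" in uri:
        raise ValueError(f"discovery not defined for fragment URIs: {uri!r}")
    return uri


# Why: naive substitution of "http://example.org/doc#sec2" into a query template
# turns "#sec2" into the fragment of the DR URI, so it is never sent to the server.
dr_uri = "http://example.org/describe?uri=" + "http://example.org/doc#sec2"
parts = urlsplit(dr_uri)
print(parts.query)     # uri=http://example.org/doc
print(parts.fragment)  # sec2
```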
Overall I think this is great progress and I like the way it's
structured. Personally I'm eager to see this deployed, as I think it
will serve the semantic web, and in particular the Science Commons
data project's agenda, very well.
-Jonathan
[1] http://tools.ietf.org/html/draft-hammer-discovery-00
[*] Footnote (not relevant unless you care about how RDF might
interact with this discovery protocol): Suppose U1 and U2 both name
(denote, identify, refer to, are interpreted to be, etc.) some
resource R, and suppose that
<U1> describedby <DR1>.
<U2> describedby <DR2>.
Then necessarily
<U1> describedby <DR2>.
<U2> describedby <DR1>.
To limit DR1 to the URI U1, you would have to use a different
relation, say describesBindingOf:
<DR1> describesBindingOf "U1"^^xsd:anyURI .
<DR2> describesBindingOf "U2"^^xsd:anyURI .
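For readers who want the footnote's argument spelled out, here is a toy Python model of RDF simple interpretation. It is not an RDF library; the names I and holds are hypothetical, and the model ignores everything except that URIs denote resources and that a triple asserts a pair in a property's extension.

```python
# Toy interpretation I: each URI name maps to the resource it denotes.
I = {"U1": "R", "U2": "R", "DR1": "D1", "DR2": "D2"}


def holds(extension: set, s: str, o: str) -> bool:
    """<s> p <o> is true iff (I(s), I(o)) is in the extension of p."""
    return (I[s], I[o]) in extension


# Asserting  <U1> describedby <DR1> .  and  <U2> describedby <DR2> .
describedby = {(I["U1"], I["DR1"]), (I["U2"], I["DR2"])}

# Both "cross" triples now hold, because U1 and U2 denote the same R.
print(holds(describedby, "U1", "DR2"), holds(describedby, "U2", "DR1"))  # True True

# With describesBindingOf the object is a literal, and a literal denotes its
# own value, so DR1 stays tied to the string "U1" and not to "U2".
describes_binding_of = {(I["DR1"], "U1"), (I["DR2"], "U2")}
print((I["DR1"], "U2") in describes_binding_of)  # False
```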
Received on Thursday, 29 January 2009 14:57:03 UTC