Re: Request for feedback: HTTP-based Resource Descriptor Discovery from Jonathan Rees on 2009-01-29 (www-tag@w3.org from January 2009)

From: Jonathan Rees <jar@creativecommons.org>
Date: Thu, 29 Jan 2009 09:56:17 -0500
To: Eran Hammer-Lahav <eran@hueniverse.com>
Cc: "www-tag@w3.org WG" <www-tag@w3.org>
Message-Id: <3832AC1F-7855-41CD-BEDF-311928C39E46@creativecommons.org>

On Jan 13, 2009, at 1:18 PM, Eran Hammer-Lahav wrote:

> http://tools.ietf.org/html/draft-hammer-discovery-00
>
> I have recently published a draft for obtaining resource descriptors  
> via HTTP using Link headers, Link elements (HTML, Atom), and Site- 
> Meta [1]. The draft goal is to provide a unified view on how to use  
> the three methods for obtaining information about resources  
> (discovery). The draft invents very little (an extension to Site- 
> Meta allowing it to describe individual resources and not just the  
> overall site).
>
> Any feedback would be greatly appreciated and can be sent directly  
> to me or discussed on the www-talk@w3.org mailing list.
>
> Thanks,
>
> EHL
>
> [1] http://tools.ietf.org/html/draft-nottingham-site-meta-00

I'm sending my review (below) to you directly, cc: www-tag, and you  
can feel free to forward it to www-talk or other lists.
-Jonathan

-----------

Here are some comments on [1], as requested:

- Please do not say 'resource discovery' as this protocol is not about
   discovering resources.  You have many alternatives that do not say
   something that's confusing: 'descriptor resource discovery',
   'description discovery', 'resource description discovery', etc.

- I really wish we could say something stronger about the format of
   the DR.  May I suggest that the DR be required to possess at least
   one 'representation' that is either RDF/XML or convertible to
   RDF/XML using GRDDL?

- I anticipate some confusion as to whether the link relates the
   resource to the DR (as in the POWDER 'describedby' definition you
   quote), the URI to the DR, or the URI to the DR's URI (as in the
   second sentence of section 6).  In RDF, <resource> describedby <dr>
   is most natural to write, but RDF semantics rules out the
   possibility that this might say anything specific to a particular
   URI naming the resource[*].  This protocol is an opportunity for the
   URI owner to say things not only about the resource but about the
   URI/resource binding itself, such as its authority, provenance, and
   stability, and that will vary with URI, not resource, as each URI
   might have a different "owner".

   This issue may be esoteric enough that addressing it might be more
   confusing than not, but I want you to consider yourself forewarned.

- The POWDER documentation gives a different URI for the describedby
   relation than the one that you'd get by using the proposed
   IANA-based relation registry.  It would be unfortunate if there
   continued to be two URIs for the same thing, and you should work
   with POWDER to settle on one or the other.  I would not make use of
   the link relation registry a requirement.

- Editorial comment: On first reading I found the first set of bullets
   in section 7 to be very mysterious.  They make no sense at all until
   you've read the following text.  I suggest that (a) you list the
   three methods before launching into the factors that go into
   deciding between them; and (b) that the four bullets be more
   specific - e.g. instead of saying it depends on document type (media
   type), say that it depends on whether the resource has a
   representation supporting the <link> element, and rather than saying
   it depends on URI scheme, say that it depends on whether the scheme
   is http(s) or something else.

- Bullet "HTTP Link header": "Limited to resources with an accessible
   representation using the HTTP protocol [RFC2616], or..." -- while
   you're not saying anything wrong here, I don't see what purpose the
   part before the "or" serves, and I find it distracting.  I think you
   should simply say:
       "Limited to resources for
       which an HTTP GET or HEAD request returns a non-5xx
       HTTP response [RFC2616]."
   The exact limitation you want to put on HTTP (2xx, 2xx+3xx,
   2xx+3xx+4xx, or any) is debatable.  I think 3xx responses have to be
   OK (see below), 4xx responses should be, and 5xx responses could be
   although I don't think I would trust them.

   If all HTTP responses can carry believable Link: headers, matters
   are greatly simplified because you can just say that you can always
   try the HTTP method - it is not limited in any way.

- In TAG discussion the question arose as to why all three methods had
   to produce the same descriptor resource location.  Another design
   choice suggested was that you might get different information via
   the three different channels; I said that I thought the intent was
   that a consumer should be allowed to stop at any one of them without
   missing out on any information, and that since the information was
   all supposed to be "authorized" by the same source anyhow (the URI
   owner), there would be no reason to put different information in
   different places.

   However, someone (Henry?) wondered why the requirement is so strict.
   One way to relax it that seems harmless is to just say that the URIs
   must all *name* the same DR.  Even more relaxed (but more
   complicated) would be to say that the resources named by the DR URIs
   only have to carry the same information.  (I guess you'd also have
   to say that all three resources have the same portfolio of
   representations, or some applications would miss out on the
   representation they need if they used the wrong method.)  How would
   we run into trouble by just saying you have to get to the same DR?

- Anywhere you mention 301 and 302 you should also add 307.

- The algorithm in 8.2 is one I strongly object to, as it does not
   permit Link: on 30x responses, which IMO is a central Semantic Web
   use case.
   Consider, for example, a "value added" URI for a document where a
   301 response provides a Link: to useful metadata, and redirects to
   the actual document.

   Here's how it ought to work:

   1. Obtain a response to a GET or HEAD request (perhaps from a cache
      or proxy, etc.)

   2. If the response is a 5xx, fail.  (Having a Link: with many 40x
      statuses might be not only perfectly meaningful, but extremely
      useful - consider a DR for a 403, 405, or 410 that contains
      adequate information to enable a user or application to proceed.)

   3. If the response has an applicable Link: (per your steps 2-4), the
      DR is the one named.

   4. If the response has no applicable Link: but is a 301, 302, or
      307, go back to step 1 using the URI given in the Location:
      header.  [Warning: You will find people who are strongly opposed
      to this, and others strongly in favor, so rationale will be
      needed.]

   In other words, follow the redirect (301, 302, 307) and stop when
   you get to the first Link: header.  In a chain of 30x redirects
   ending in a 200 or 303, each URI may have a different owner, and
   each owner may have different things to say about the resource
   and/or the nature of one or more of the URI/resource bindings.
   Since the authority, if any, of the DR is connected with control
   over the URI, not the resource, we should prefer a DR associated
   with a 30x response to a DR associated with a 200 response for the
   URI it redirects to.

   The case could be made that the DR for the first URI is the *only*
   one that should be recognized, as the fact of a redirect may only be
   authorizing the target to provide representations of the resource,
   not to provide descriptions of it.  But this strict position would
   be very inconvenient and might lead to (different) surprises.

   You and I are both overlooking additional aspects of the HTTP
   protocol, such as 100 and 305, but I don't see this as a big risk.
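
A minimal sketch of the four steps above, in Python, may make the
control flow easier to discuss.  Everything named here is an assumption
made for illustration: the fetch callable (which stands in for a GET or
HEAD request and returns a status code plus headers), the max_hops
guard against redirect loops, and the deliberately simplistic Link:
parsing, which checks only for rel="describedby" and ignores the
draft's other applicability checks.

```python
import re
from typing import Callable, Mapping, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical response shape: (status code, response headers).
Response = Tuple[int, Mapping[str, str]]

REDIRECTS = {301, 302, 307}
# Simplistic: one "<target>; params" entry per comma-separated item.
LINK_RE = re.compile(r'<([^>]*)>([^,]*)')


def find_descriptor_link(headers: Mapping[str, str], base: str) -> Optional[str]:
    """Return the target of the first rel="describedby" link, if any."""
    for target, params in LINK_RE.findall(headers.get("link", "")):
        rel = re.search(r'rel\s*=\s*"?([^";]*)"?', params)
        if rel and "describedby" in rel.group(1).split():
            return urljoin(base, target)  # resolve relative targets
    return None


def discover_descriptor(
    uri: str,
    fetch: Callable[[str], Response],
    max_hops: int = 10,  # assumption: guard against redirect loops
) -> Optional[str]:
    """Follow 301/302/307 redirects, stopping at the first Link: header."""
    for _ in range(max_hops):
        status, raw_headers = fetch(uri)                      # step 1
        headers = {k.lower(): v for k, v in raw_headers.items()}
        if 500 <= status <= 599:                              # step 2
            return None
        descriptor = find_descriptor_link(headers, uri)       # step 3
        if descriptor is not None:
            return descriptor
        if status in REDIRECTS and "location" in headers:     # step 4
            uri = urljoin(uri, headers["location"])
            continue
        return None  # no applicable Link: and nothing to follow
    return None
```

Because the first Link: found wins, a descriptor attached to a 30x
response takes precedence over one attached to the 200 response of the
redirect target, and 4xx responses may still carry a usable Link:.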

- Your proposal to specify URI-to-DR-URI rewrites as
   template="prefix{uri}suffix" is a good start, but I think that the
   additional ability to specify match conditions on the input URI will
   end up being important.  In one project I work on we're already
   using the rule

     http://host/path  ==>  http://host/about/path

   to map the original URI to the DR URI.  Similar rules would be

     http://host/path  ==>  http://about.host/path

     http://host/path.suffix  ==>  http://host/path.about

   If all you can do is add a prefix and/or suffix to the URI, rules
   like these aren't possible.

   The most elaborate fix would be to permit rules like the ones in
   Apache RedirectMatch directives, but I can see why you wouldn't go
   there.  The minimal mechanism required for the above would be to say
   what prefix and/or suffix should be matched and stripped off.  This
   makes the rule language nicely symmetric:

    <link-template
       pattern="prefix1{infix}suffix1"
       template="prefix2{infix}suffix2"
       ...>

   If you do this you may not have to get into more complex rule
   languages of the kind you suggest in your [[...]] remark; although
   if rich enough the extensions you suggest might work perfectly well
   to implement the rules I give above.
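
To make the symmetric pattern/template idea concrete, here is a small
Python sketch.  The function names and the choice to return None when a
URI does not match are assumptions for illustration only; the draft
defines no such API.

```python
from typing import Optional, Tuple

PLACEHOLDER = "{infix}"


def split_rule(rule: str) -> Tuple[str, str]:
    """Split "prefix{infix}suffix" into (prefix, suffix)."""
    prefix, found, suffix = rule.partition(PLACEHOLDER)
    if not found:
        raise ValueError("rule must contain " + PLACEHOLDER)
    return prefix, suffix


def rewrite(uri: str, pattern: str, template: str) -> Optional[str]:
    """Map a URI to a DR URI, or return None if the pattern does not match."""
    p_prefix, p_suffix = split_rule(pattern)
    t_prefix, t_suffix = split_rule(template)
    # Prefix and suffix must not overlap inside the URI.
    if len(uri) < len(p_prefix) + len(p_suffix):
        return None
    if not (uri.startswith(p_prefix) and uri.endswith(p_suffix)):
        return None
    # Strip the matched prefix and suffix, keep what lies between.
    infix = uri[len(p_prefix):len(uri) - len(p_suffix)]
    return t_prefix + infix + t_suffix


# The three rules from the text, expressed as pattern/template pairs.
print(rewrite("http://host/path", "http://host/{infix}", "http://host/about/{infix}"))
print(rewrite("http://host/path", "http://{infix}", "http://about.{infix}"))
print(rewrite("http://host/path.suffix", "{infix}.suffix", "{infix}.about"))
```

With an empty pattern prefix and suffix this reduces to the draft's
plain prefix{uri}suffix behaviour.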

- We need to be careful about quoting.  If a DR is meant to be found
   via a CGI script invoked via a query URI (the link-template prefix
   has a ? in it), and the original URI already contains significant
   CGI characters like &, then an application could get into big
   trouble.  This needs to be either handled directly somehow (I can't
   imagine how), or left as a combination of a big scary disclaimer and
   a security warning.
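
The hazard can be shown with a short Python sketch.  The template URL
and the {uri} placeholder handling below are hypothetical.  Percent-
encoding the whole original URI before substitution is one common
mitigation, shown only as an assumption; it is not something the draft
specifies, and it helps only if the receiving script decodes the
parameter exactly once.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical link-template whose prefix contains a "?".
TEMPLATE = "http://example.org/describe?uri={uri}"


def naive_expand(template: str, uri: str) -> str:
    """Substitute the original URI verbatim (the risky approach)."""
    return template.replace("{uri}", uri)


def quoted_expand(template: str, uri: str) -> str:
    """Percent-encode every reserved character before substituting."""
    return template.replace("{uri}", quote(uri, safe=""))


def query_params(url: str) -> dict:
    """Parse the query string the way a typical CGI script would."""
    return parse_qs(urlsplit(url).query)


original = "http://example.com/item?id=1&lang=en"

# Verbatim substitution: the "&" starts a second, unintended parameter.
print(query_params(naive_expand(TEMPLATE, original)))

# Encoded substitution: the script receives the original URI intact.
print(query_params(quoted_expand(TEMPLATE, original)))
```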

- I think you need to warn that this protocol should only be applied
   to URIs not containing a fragment id.  If you allow fragment ids
   you're going to get into serious problems with both quoting and
   semantics.
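
A consumer could enforce this restriction with a guard as small as the
Python sketch below.  The function name is hypothetical, and rejecting
the URI (rather than silently stripping the fragment) is an assumed
policy choice, not something the draft states.

```python
def require_no_fragment(uri: str) -> str:
    """Return the URI unchanged, or raise if it carries a fragment id."""
    if "#" in uri:
        raise ValueError("discovery is only defined for URIs without a fragment id")
    return uri
```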

Overall I think this is great progress and I like the way it's
structured.  Personally I'm eager to see this deployed, as I think it
will serve the semantic web, and in particular the Science Commons
data project's agenda, very well.

-Jonathan

[1] http://tools.ietf.org/html/draft-hammer-discovery-00

[*] Footnote (not relevant unless you care about how RDF might
interact with this discovery protocol): Suppose U1 and U2 both name
(denote, identify, refer to, are interpreted to be, etc.) some
resource R, and suppose that

    <U1> describedby <DR1>.
    <U2> describedby <DR2>.

Then necessarily

    <U1> describedby <DR2>.
    <U2> describedby <DR1>.

To limit DR1 to the URI U1, you would have to use a different
relation, say describesBindingOf:

    <DR1> describesBindingOf "U1"^^xsd:anyURI .
    <DR2> describesBindingOf "U2"^^xsd:anyURI .
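
The footnote's argument can be checked with a toy model in Python.
This is not an RDF library, only an illustration under the footnote's
own supposition that U1 and U2 denote the same resource R; all names
(DENOTES, denote, interpret, entails) are hypothetical.

```python
# Fixed interpretation: which thing each URI names.
DENOTES = {"U1": "R", "U2": "R", "DR1": "D1", "DR2": "D2"}


def denote(term):
    """A URI denotes its resource; a ("literal", value) pair denotes itself."""
    if isinstance(term, tuple) and term[0] == "literal":
        return term
    return DENOTES[term]


def interpret(triples):
    """Replace every term by what it denotes; the URIs themselves drop out."""
    return {(denote(s), p, denote(o)) for s, p, o in triples}


def entails(asserted, triple):
    """True if the triple holds under this interpretation of the assertions."""
    return interpret([triple]) <= interpret(asserted)


described = [("U1", "describedby", "DR1"), ("U2", "describedby", "DR2")]
bindings = [
    ("DR1", "describesBindingOf", ("literal", "U1")),
    ("DR2", "describesBindingOf", ("literal", "U2")),
]

# The "crossed" statement follows, because both subjects denote R.
print(entails(described, ("U1", "describedby", "DR2")))

# With literals naming the URIs, DR1 stays tied to U1 only.
print(entails(bindings, ("DR1", "describesBindingOf", ("literal", "U2"))))
```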
Received on Thursday, 29 January 2009 14:57:03 UTC