Re: Review of "Cool URIs for the Semantic Web" - please confirm from Richard Cyganiak on 2007-08-29 (www-tag@w3.org from August 2007)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 29 Aug 2007 12:42:11 +0200
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: "Leo Sauermann" <leo.sauermann@dfki.de>, "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>, "Susie Stephens" <susie.stephens@gmail.com>, "Ivan Herman" <ivan@w3.org>, <www-tag@w3.org>
Message-Id: <8248C705-5439-49A9-9660-2F38A6419BDF@cyganiak.de>
David,

Thanks for the careful review. Responses inline.

On 26 Aug 2007, at 04:44, Booth, David (HP Software - Boston) wrote:
> I looked at your latest version of
> http://www.dfki.uni-kl.de/~sauermann/2006/11/cooluris/
> and again, kudos for doing this.  It is a very helpful document, and
> very readable.  I do have a few suggestions, though.  Most are
> relatively minor, but one is quite important (the media type
> limitation of hash URIs).
>
>
> 1. I notice that you sometimes use the word "resource" to specifically
> mean "non-information resource":
> [[
> There should be no confusion between identifiers for documents (URLs)
> and resource identifiers.
> ]]
> and sometimes you use it more broadly:
> [[
> Another question is where to draw the line between traditional web
> documents and other, non-document resources.
> ]]
> I think it would be best to use the word "resource" only in the broad
> sense (consistent with the WebArch use of the term), and use a more
> specific term when you wish to refer specifically to "non-information
> resources".

You are right, we sometimes carelessly said “resource” to mean
“non-information resource”. I have fixed this in a couple of places.
Leo is travelling at the moment; the new version will be published
when he is back.

> The presentation does a nice job of motivating the
> difference between information resources and non-information resources,
> so I personally would suggest just using "non-information resource" when
> that is what you mean, even if it is rather cumbersome.

In my experience, “non-information resource” is a somewhat
unfortunate term. The “Cool URIs” document is somewhat informal in
style and intended as introductory material, and I think that using
this term would cause confusion. Instead we beat around the bush,
using terms like “non-document resource”, “other resource” and
“real-world thing”.

> 2. One piece of advice that I only see implicitly in Figure 2, but I
> think is worth stating more visibly: In setting up content negotiation
> to serve or redirect to RDF or HTML, the human-oriented version (HTML)
> should be the default.  This way naive dereferencing will yield
> something human oriented.  (It is much more reasonable to require
> Semantic Web apps to set the Accept headers properly to receive RDF than
> to require naive human users to set them properly to receive HTML.)

I agree with the advice. But “naive human users” will access URIs  
using a web browser, and AFAIK all browsers send Accept headers that  
indicate a preference for HTML. So I think it's not such an important  
question and doesn't have to be mentioned.

> Furthermore, the human oriented result should indicate both ways that
> the RDF version can be obtained: by setting the Accept headers properly;
> or via the RDF-specific URL, which should be provided.

I disagree. We recommend putting a <link> element that points to the
RDF into the HTML's <head>, and I think that's all that is really required.

I think that a visible link to the RDF is not a bad idea, to  
advertise the presence of RDF more clearly, but that's more a  
marketing device than a technical necessity.

And talking about Accept headers on HTML pages is certainly overkill  
and will be useless and confusing to the majority of readers.
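A sketch of the <link> element approach, using only Python's standard library: one helper writes an HTML <head> that advertises the RDF version, and a small parser shows how an RDF-aware client could discover it. The rel="alternate" type="application/rdf+xml" pattern is the common convention for this; the helper and class names, and the URL http://www.acme.com/data/alice, are illustrative assumptions.

```python
from html import escape
from html.parser import HTMLParser


def html_head_with_rdf_link(title, rdf_url):
    """Return an HTML <head> that advertises an RDF/XML alternate."""
    return (
        "<head>\n"
        f"  <title>{escape(title)}</title>\n"
        f'  <link rel="alternate" type="application/rdf+xml" href="{escape(rdf_url)}">\n'
        "</head>"
    )


class RdfLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)  # attribute names arrive lowercased from HTMLParser
        rels = (a.get("rel") or "").lower().split()
        media_type = (a.get("type") or "").lower()
        if "alternate" in rels and media_type == "application/rdf+xml":
            self.rdf_links.append(a.get("href"))
```

A client would feed the fetched HTML into `RdfLinkFinder` and then dereference whatever ends up in `rdf_links`.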

> 3.  Regarding this:
> [[
> According to W3C guidelines, we may have a web document if all its
> essential characteristics can be conveyed in a message. This is not a
> very precise definition. Our recommendation is to err on the side of
> caution: Whenever an object of interest is not clearly and obviously a
> document, then it's better to use two distinct URIs, one for the
> resource and another one for the document describing it.
> ]]
> I suggest *not* quoting the current WebArch definition of "information
> resource", because it is so badly flawed that I think it does more harm
> than good to repeat it.  I think the exposition prior to this point has
> already done quite a good job of explaining the difference between
> information resource and non-information resource, so I think it would
> be better to just change these sentences to something like:
> [[
> Since the current definition of "information resource" is not entirely
> clear, our recommendation is to err on the side of caution: Whenever an
> object of interest is not clearly and obviously a Web page, then it's
> better to use two distinct URIs, one for the resource and another one
> for the document describing it.
> ]]
> (I changed the word "document" to "Web page" to be more inclusive of
> dynamically generated content, but you might find still better ways to
> express this.)

Personally I agree that the AWWW definition is flawed, but the flaws
only become apparent in corner cases such as “robotic arms” and
“prayer services”.

The purpose of our document is to provide practical guidance on the
use of URIs on the Semantic Web, and for that purpose, the definition
from AWWW works well enough. I don't think that citing it will  
mislead our readers towards wrong decisions.

Furthermore, it is, at the moment, the W3C's official word on the  
subject, so I feel that it ought to be mentioned.

> 4. I note that using content negotiation with 303-redirect in the manner
> you describe means that there is no single URI that (abstractly)
> denotes the information resource as a whole that describes the
> resource in question.  (In other words, another way to do this would
> be to 303-redirect to a generic information resource URI, and then use
> content negotiation when that URI is accessed.)  I don't see a big
> harm in this, and it does eliminate an extra dereferencing step.

We follow http://www.w3.org/TR/swbp-vocab-pub/ here, which doesn't
use a generic information resource. A fourth URI would add little
practical benefit and would make the approach harder to sell.

> 5. These statements are not quite correct:
> [[
> This means a URI that includes a hash cannot be retrieved directly, and
> therefore cannot identify a web document. We can use them to identify
> other, non-document resources, without creating ambiguity.
> ]]
> The meaning of the fragment identifier is determined by the media type
> of the representation that is returned.  Depending on the media type,
> the URI with the fragment identifier *might* be able to identify other,
> non-document resources.  But for HTML, for example, the fragment
> identifier is used to identify a portion of the returned document.
> This has significant consequences for the use of hash URIs.  I suggest
> rewording this to something like:
> [[
> The part before the hash symbol ("#") is called the racine.  Thus, in
> some cases the URI with the fragment identifier can be used to identify
> a non-document resource, and clients can dereference the racine to find
> an associated web document that describes the non-document resource.
> However, there is a catch: the meaning of the fragment identifier
> depends on the media type that is returned when the racine is
> dereferenced.  If RDF is returned, then the fragment identifier can
> identify an arbitrary non-document resource.[ref:
> http://www.ietf.org/rfc/rfc3870.txt]  But if HTML is returned (for
> example), then the fragment identifier designates an element within the
> document,[ref: http://www.ietf.org/rfc/rfc2854.txt] and thus the URI
> with the fragment identifier cannot be used to identify an arbitrary
> non-document resource.
>
> Consequently there is a trade-off in using hash URIs to identify
> non-document resources, because the hash URI limits the future media
> types that you can serve.  If you only ever wish to serve RDF, then hash
> URIs will work fine.  But if you think you may someday wish to serve
> HTML in addition or instead, then you should use 303 URIs.
> ]]

I think you allege a limitation that does not exist. Let's take the  
example from the document, starting with the no-conneg solution.

     http://www.acme.com/about#alice

The racine serves only RDF/XML, thus /about#alice can identify a person.
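To make the racine concrete: an HTTP client never sends the fragment to the server. It requests the part before the "#" and interprets the fragment locally, relative to whatever representation comes back. A minimal Python illustration using the standard `urllib.parse.urldefrag` (the wrapper name is hypothetical):

```python
from urllib.parse import urldefrag


def racine_and_fragment(uri):
    """Split a hash URI into the part a client requests and the fragment."""
    result = urldefrag(uri)  # DefragResult(url, fragment)
    return result.url, result.fragment
```

For `http://www.acme.com/about#alice` this yields `("http://www.acme.com/about", "alice")`: the server only ever sees a request for /about.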

Now if we decide to add HTML later on, our document proposes to
303-redirect from the racine to either of these URIs

     http://www.acme.com/about.rdf
     http://www.acme.com/about.html

and describe /about#alice in these. In the new setup, no  
representation is served at /about (it's a 303 URI now), and thus  
there is no limitation on the meaning of /about#alice.

An earlier version of the document stated that you could use any  
redirect code, not just 303, in this scenario. Stuart pointed out  
that this was wrong, for the reason you state. The 303 redirect is  
necessary when you do conneg with hash URIs, but with this redirect,  
the problem you describe disappears.
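The redirecting setup described above can be sketched as a small WSGI application using only Python's standard library. It assumes the acme.com paths from the example; the document bodies are placeholders, and the Accept check is deliberately crude (a real deployment would honour q-values). What it shows is that /about itself only ever answers with 303 See Other, so no representation served at the racine constrains what /about#alice means.

```python
BASE = "http://www.acme.com"

# Placeholder documents; both describe http://www.acme.com/about#alice.
DOCUMENTS = {
    "/about.rdf": (
        "application/rdf+xml",
        b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
        b' xmlns:foaf="http://xmlns.com/foaf/0.1/">'
        b'<foaf:Person rdf:about="http://www.acme.com/about#alice">'
        b"<foaf:name>Alice</foaf:name></foaf:Person></rdf:RDF>",
    ),
    "/about.html": (
        "text/html; charset=utf-8",
        b"<html><head><title>About Alice</title></head>"
        b"<body><h1>Alice</h1></body></html>",
    ),
}


def about_app(environ, start_response):
    """Hypothetical WSGI app for the acme.com example."""
    path = environ.get("PATH_INFO", "")
    if path == "/about":
        # The racine: never serve a representation, only 303 See Other.
        accept = environ.get("HTTP_ACCEPT", "")
        # Crude check: RDF only if asked for and HTML is not mentioned.
        wants_rdf = "application/rdf+xml" in accept and "text/html" not in accept
        target = BASE + ("/about.rdf" if wants_rdf else "/about.html")
        start_response("303 See Other", [("Location", target), ("Vary", "Accept")])
        return [b""]
    if path in DOCUMENTS:
        content_type, body = DOCUMENTS[path]
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(len(body)))])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def call_app(app, path, accept=""):
    """Invoke a WSGI app directly and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"PATH_INFO": path, "HTTP_ACCEPT": accept}, start_response))
    return captured["status"], captured["headers"], body
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, about_app).serve_forever()` serves the app on port 8000. The `Vary: Accept` header tells caches that the redirect target depends on the request's Accept header.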

> 6. In your example of 303 URIs, http://www.acme.com/id/alice
> 303-redirects either to http://www.acme.com/data/alice (for RDF) or
> http://www.acme.com/people/alice (for HTML).  Since the RDF and HTML
> versions are in some sense intended to provide different
> representations of the same information (either for machine or human
> consumption), I would think it would be administratively more natural
> to instead use something like:
> 	http://www.acme.com/description/alice.rdf  and
> 	http://www.acme.com/description/alice.html
> for the two document URIs.  Obviously the choice is up to the
> administrator, but I thought I would mention it, because the example
> looks a little odd to my eyes in this way.

I sympathize with this. But file extensions in URIs are somewhat  
frowned upon: “You may not be using HTML for that page in 20 years  
time, but you might want today's links to it to still be valid.” This  
quote is from Tim's “Cool URIs don't change” piece, which is the  
inspiration for the title of our document. So it didn't feel right to  
advocate the use of file extensions in our example scenario.

We would have liked to include a site that uses .rdf and .html  
extensions in our list of deployed examples from the web, but we  
couldn't find one at the time of writing. (There are several such  
sites now.)
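Since the choice between the two naming schemes is a server-side mapping from the thing URI to its describing documents, it can be kept in one place. A Python sketch, assuming the /id/ prefix from the example and a hypothetical `scheme` switch between the extensionless URIs used in the document (/data/alice, /people/alice) and David's /description/alice.rdf and /description/alice.html:

```python
from urllib.parse import urlsplit, urlunsplit

RDF = "application/rdf+xml"
HTML = "text/html"


def document_uri(thing_uri, media_type, scheme="extensionless"):
    """Map a 303 thing URI such as http://www.acme.com/id/alice to the
    URI of the document the server would redirect to (hypothetical)."""
    parts = urlsplit(thing_uri)
    prefix = "/id/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a thing URI: {thing_uri}")
    if media_type not in (RDF, HTML):
        raise ValueError(f"unsupported media type: {media_type}")
    name = parts.path[len(prefix):]
    if scheme == "extensionless":
        # Naming used in the Cool URIs example: /data/... and /people/...
        folder = "data" if media_type == RDF else "people"
        path = f"/{folder}/{name}"
    elif scheme == "extensions":
        # The alternative suggested in the review: /description/<name>.<ext>
        ext = "rdf" if media_type == RDF else "html"
        path = f"/description/{name}.{ext}"
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Either way the /id/alice URI that other data links to stays the same; only the redirect targets differ, which is in the spirit of “Cool URIs don't change”. Switching schemes on a live site would still break existing links to the old document URIs, so the choice is best made up front.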

> 7. Section 4.3 ("Choosing between 303 and hash") needs to mention the
> media type limitation of hash URIs explained in comment #5 above.

See above.



>
> Again, thank you for this very valuable contribution to the community.
>
>
> David Booth, Ph.D.
> HP Software
> +1 617 629 8881 office  |  dbooth@hp.com
> http://www.hp.com/go/software
>
> Opinions expressed herein are those of the author and do not represent
> the official views of HP unless explicitly stated otherwise.
>
>
Received on Wednesday, 29 August 2007 10:43:15 UTC