RE: Review of "Cool URIs for the Semantic Web" - please confirm from Booth, David (HP Software - Boston) on 2007-08-26 (www-tag@w3.org from August 2007)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Sat, 25 Aug 2007 22:44:04 -0400
To: "Leo Sauermann" <leo.sauermann@dfki.de>, "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
Cc: "Richard Cyganiak" <richard@cyganiak.de>, "Susie Stephens" <susie.stephens@gmail.com>, "Ivan Herman" <ivan@w3.org>, <www-tag@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C203150158@tayexc19.americas.cpqcorp.net>
Leo & Richard,

I looked at your latest version of 
http://www.dfki.uni-kl.de/~sauermann/2006/11/cooluris/
and again, kudos for doing this.  It is a very helpful document, and
very readable.  I do have a few suggestions though.  Most are relatively
minor, but one is quite important (the media type limitation of hash
URIs).


1. I notice that you sometimes use the word "resource" to specifically
mean "non-information resource":
[[
There should be no confusion between identifiers for documents (URLs)
and resource identifiers.
]]
and sometimes you use it more broadly:
[[
Another question is where to draw the line between traditional web
documents and other, non-document resources.
]]
I think it would be best to use the word "resource" only in the broad
sense (consistent with the WebArch use of the term), and use a more
specific term when you wish to refer specifically to "non-information
resources".  The presentation does a nice job of motivating the
difference between information resources and non-information resources,
so I personally would suggest just using "non-information resource" when
that is what you mean, even if it is rather cumbersome.  

2. One piece of advice that I only see implicitly in Figure 2, but I
think is worth stating more visibly: In setting up content negotiation
to serve or redirect to RDF or HTML, the human-oriented version (HTML)
should be the default.  This way naive dereferencing will yield
something human oriented.  (It is much more reasonable to require
Semantic Web apps to set the Accept headers properly to receive RDF than
to require naive human users to set them properly to receive HTML.)

Furthermore, the human oriented result should indicate both ways that
the RDF version can be obtained: by setting the Accept headers properly;
or via the RDF-specific URL, which should be provided.
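
To make the "HTML by default" rule concrete, here is a minimal sketch in
Python.  The function name pick_variant and the simplified Accept
parsing are my own assumptions for illustration, not anything taken from
your document or from a particular server.  The only point is that RDF
is chosen solely when the client explicitly prefers it; a missing
header, "*/*", or anything unrecognized falls through to HTML.

```python
def pick_variant(accept_header):
    """Return "rdf" only if the client explicitly prefers RDF/XML, else "html".

    Hypothetical helper: a simplified reading of the Accept header that
    ignores wildcards, so naive clients always get the human-oriented page.
    """
    if not accept_header:
        return "html"
    best = {"html": 0.0, "rdf": 0.0}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality value when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type == "application/rdf+xml":
            best["rdf"] = max(best["rdf"], q)
        elif media_type in ("text/html", "application/xhtml+xml"):
            best["html"] = max(best["html"], q)
    # Ties (including "nothing recognized") go to the HTML default.
    return "rdf" if best["rdf"] > best["html"] else "html"
```

The HTML page that is served by default would then carry a visible link
to the RDF-specific URL, as suggested above.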

3.  Regarding this:
[[
According to W3C guidelines, we may have a web document if all its
essential characteristics can be conveyed in a message. This is not a
very precise definition. Our recommendation is to err on the side of
caution: Whenever an object of interest is not clearly and obviously a
document, then it's better to use two distinct URIs, one for the
resource and another one for the document describing it.
]]
I suggest *not* quoting the current WebArch definition of "information
resource", because it is so badly flawed that I think it does more harm
than good to repeat it.  I think the exposition prior to this point has
already done quite a good job of explaining the difference between
information resource and non-information resource, so I think it would
be better to just change these sentences to something like:
[[
Since the current definition of "information resource" is not entirely
clear, our recommendation is to err on the side of caution: Whenever an
object of interest is not clearly and obviously a Web page, then it's
better to use two distinct URIs, one for the resource and another one
for the document describing it.
]]
(I changed the word "document" to "Web page" to be more inclusive of
dynamically generated content, but you might find still better ways to
express this.)


4. I note that using content negotiation with 303-redirect in the manner you
describe means that there is no single URI that (abstractly) denotes the
information resource as a whole that describes the resource in question.
(In other words, another way to do this would be to 303-redirect to a
generic information resource URI, and then use content negotiation when
that URI is accessed.)  I don't see a big harm in this, and it does
eliminate an extra dereferencing step.
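
For comparison, the two arrangements can be sketched as follows.  This
is only an illustration in Python: the /id/, /data/ and /people/ URIs
are the ones from your 303 example, while the generic document URI
http://www.acme.com/doc/alice, the function names, and the crude
substring check on the Accept header are assumptions of mine.

```python
ID_URI = "http://www.acme.com/id/alice"
RDF_DOC = "http://www.acme.com/data/alice"
HTML_DOC = "http://www.acme.com/people/alice"
GENERIC_DOC = "http://www.acme.com/doc/alice"  # hypothetical generic document URI


def wants_rdf(accept):
    # Deliberately crude: a real server would weigh q-values.
    return "application/rdf+xml" in (accept or "").lower()


def negotiate_then_redirect(uri, accept):
    """Arrangement in the document: negotiate at the 303 step itself."""
    if uri == ID_URI:
        target = RDF_DOC if wants_rdf(accept) else HTML_DOC
        return 303, {"Location": target, "Vary": "Accept"}
    if uri == RDF_DOC:
        return 200, {"Content-Type": "application/rdf+xml"}
    if uri == HTML_DOC:
        return 200, {"Content-Type": "text/html"}
    return 404, {}


def redirect_then_negotiate(uri, accept):
    """Alternative: 303 to one generic document URI, negotiate there."""
    if uri == ID_URI:
        return 303, {"Location": GENERIC_DOC}
    if uri == GENERIC_DOC:
        if wants_rdf(accept):
            headers = {"Content-Type": "application/rdf+xml",
                       "Content-Location": RDF_DOC}
        else:
            headers = {"Content-Type": "text/html",
                       "Content-Location": HTML_DOC}
        headers["Vary"] = "Accept"
        return 200, headers
    return 404, {}
```

In the first function no URI stands for the description as a whole; in
the second, the generic document URI plays that role.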

5. These statements are not quite correct:
[[
This means a URI that includes a hash cannot be retrieved directly, and
therefore cannot identify a web document. We can use them to identify
other, non-document resources, without creating ambiguity.
]]
The meaning of the fragment identifier is determined by the media type
of the representation that is returned.  Depending on the media type,
the URI with the fragment identifier *might* be able to identify other,
non-document resources.  But for HTML, for example, the fragment
identifier is used to identify a portion of the returned document.  This
has significant consequences for the use of hash URIs.  I suggest
rewording this to something like:
[[
The part before the hash symbol ("#") is called the racine.  Thus, in
some cases the URI with the fragment identifier can be used to identify
a non-document resource, and clients can dereference the racine to find
an associated web document that describes the non-document resource.
However, there is a catch: the meaning of the fragment identifier
depends on the media type that is returned when the racine is
dereferenced.  If RDF is returned, then the fragment identifier can
identify an arbitrary non-document resource.[ref:
http://www.ietf.org/rfc/rfc3870.txt]  But if HTML is returned (for
example), then the fragment identifier designates an element within the
document,[ref: http://www.ietf.org/rfc/rfc2854.txt] and thus the URI
with the fragment identifier cannot be used to identify an arbitrary
non-document resource.  

Consequently there is a trade-off in using hash URIs to identify
non-document resources, because the hash URI limits the future media
types that you can serve.  If you only ever wish to serve RDF, then hash
URIs will work fine.  But if you think you may someday wish to serve
HTML in addition or instead, then you should use 303 URIs.
]]
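
The dependence on media type can be illustrated with a small Python
sketch.  The example URI http://www.acme.com/about#alice and the
function name fragment_role are hypothetical; the two media types
handled are just the ones discussed above, and anything else is left to
its own media type registration.

```python
from urllib.parse import urldefrag


def fragment_role(uri, media_type):
    """Split a hash URI into (racine, role of the fragment identifier).

    The role depends on the media type returned when the racine is
    dereferenced, not on the URI itself.
    """
    racine, fragment = urldefrag(uri)
    if not fragment:
        return racine, "no fragment identifier"
    # Drop parameters such as "; charset=utf-8" before comparing.
    base_type = media_type.split(";")[0].strip().lower()
    if base_type == "application/rdf+xml":
        return racine, "may identify an arbitrary non-document resource"
    if base_type == "text/html":
        return racine, "designates an element within the document"
    return racine, "defined by the media type registration"
```

The same hash URI therefore yields two different answers depending on
whether the server returns RDF or HTML for the racine, which is exactly
the constraint on future media types described above.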

6. In your example of 303 URIs, http://www.acme.com/id/alice
303-redirects either to http://www.acme.com/data/alice (for RDF) or
http://www.acme.com/people/alice (for HTML).  Since the RDF and HTML
versions are in some sense intended to provide different representations
of the same information (either for machine or human consumption), I
would think it would be administratively more natural to instead use
something like:
	http://www.acme.com/description/alice.rdf  and
	http://www.acme.com/description/alice.html
for the two document URIs.  Obviously the choice is up to the
administrator, but I thought I would mention it, because the example
looks a little odd to my eyes in this way.

7. Section 4.3 ("Choosing between 303 and hash") needs to mention the
media type limitation of hash URIs explained in comment #5 above.

Again, thank you for this very valuable contribution to the community.


David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent
the official views of HP unless explicitly stated otherwise.
Received on Sunday, 26 August 2007 02:48:19 UTC