Re: Clarifying what a URL identifies (Four Uses of a URL) from David Booth on 2003-01-24 (www-tag@w3.org from January 2003)

From: David Booth <dbooth@w3.org>
Date: Thu, 23 Jan 2003 21:23:07 -0500
To: "Jonathan Borden" <jonathan@openhealth.org>
Cc: Tim Berners-Lee <timbl@w3.org>, Sandro Hawke <sandro@w3.org>, www-tag@w3.org
Message-Id: <5.1.0.14.2.20030123131549.0a150008@localhost>
At 10:56 PM 1/22/2003 -0500, you wrote:

> >          http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm .
>
>Is this attempt at clarifying what a URL identifies intended to shed a
>glimmer of light on what a URI is intended to identify? Seriously, since you
>are attempting to be more precise, what exactly are you talking about?

It is an attempt to document reality, to explicitly acknowledge that when 
you use a URL to identify an abstract concept (such as a particular concept 
of love), it is common to use that same URL in conjunction with identifying 
four kinds of things: the name of that concept (i.e., the URL string 
itself); the concept; a Web location from which a description of that 
concept might be retrieved; and a document instance that is retrieved from 
that Web location.  This is not only common, it is very helpful, because it 
provides a powerful "view source" effect[1].  It is also "good practice" as 
recognized by the TAG.[2]

On the other hand, ambiguity about what a URI denotes is a Bad Thing, as 
the TAG has stated[3].  To prevent ambiguity, it is necessary to either use 
"different names"[4] or "different context"[5].

These observations have helped me (at least) mentally reconcile the 
positions that I think I've heard on the httpRange-14 issue[6], so I'm 
hoping they will help others.  Tim Berners-Lee describes[6] the issue using 
an example in which a URL is used to identify an actual car, but the 
document instance that can be retrieved from that URL is a picture of the car:

>The issue only arises when, in the semantic web, [. . .] we ask ourselves 
>what exactly is the thing we should say is identified by some http URI - 
>the picture of the car, or the car? [. . .] I want to use the URI to 
>identify the picture. Roy has always felt it identifies the car.

In particular, if you believe that it's adequate to use "different 
context"[5] to distinguish the different uses, then there is no need for 
the TAG to definitively say whether the URI identifies the car or the 
picture of the car.  On the other hand, if you believe that it's important 
to use "different names"[4] to distinguish the different uses, then there 
is a need for the TAG to decide which thing the URI is supposed to identify 
-- the car or the picture of the car.

As far as I can tell, either approach can work fine for the Semantic 
Web.  The benefit of the "different names" approach is that it's easier to 
know what the URI denotes.  The benefit of the "different context" approach 
is that it doesn't require everyone to agree on this question of whether 
URIs should denote cars or pictures of cars.  If a "#" is a part of a URI, 
then Sandro's proposal[7] is a mixture of the two approaches.  If a "#" is 
*not* a part of a URI, then Sandro's proposal[7] is an example of the 
"different context" approach.

In either case, syntactic conventions are necessary for distinguishing 
between the car and the picture of the car, so that if you are given a 
statement X that refers to the car, you can easily convert it to a 
statement X' that instead refers to the picture of the car.  (Example: if 
"THE http://x.org/mycar" refers to the car, then "GET http://x.org/mycar" 
could refer to the picture of the car.  The conversion rule is to change 
"THE" to "GET".)  Syntactic conventions that permit simple conversions from 
one reference to the other are important in order to achieve the "view 
source" effect[1], which the TAG has recommended[2] as "good practice".
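
To make the THE/GET example concrete, here is a minimal sketch in Python.  The 
function names are my own invention for illustration, and the "THE"/"GET" 
keywords are only the hypothetical convention from the example above, not 
anything defined by the TAG or by any specification:

```python
# Hypothetical convention from the example above: a leading keyword says
# which thing the URL is being used to refer to.
CONCEPT = "THE"   # the thing itself (the car)
DOCUMENT = "GET"  # what is retrieved from the URL (the picture of the car)


def _swap_keyword(statement: str, expected: str, replacement: str) -> str:
    """Replace the leading keyword of a 'KEYWORD url' reference."""
    keyword, url = statement.split(" ", 1)
    if keyword != expected:
        raise ValueError(f"expected a {expected!r} reference, got {keyword!r}")
    return f"{replacement} {url}"


def to_document_reference(statement: str) -> str:
    """Convert a reference to the car into a reference to the picture."""
    return _swap_keyword(statement, CONCEPT, DOCUMENT)


def to_concept_reference(statement: str) -> str:
    """Convert a reference to the picture back into a reference to the car."""
    return _swap_keyword(statement, DOCUMENT, CONCEPT)
```

The point of the sketch is only that the conversion is mechanical in both 
directions: the URL itself is untouched, and the context around it carries 
the distinction.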

>If you are going to use the (capitalized) term "Semantic Web", can you limit
>yourselves to discussing the layers of the SW that have been concretely
>defined i.e. the set of WDs produced by the RDF Core WG and the set of WDs
>produced by the WebOnt WG?
>
>In none of these documents do the problems you suggest with ambiguities in
>URIs exist. For example, in none of these documents is there a shred of
>confusion between a URI e.g.
>
>http://www.w3.org/
>
>and the string of characters that forms the URI i.e. 'h' 't' 't' 'p' ':' '/'
>'/' ...

AFAIK, you're correct.  I think they tend to use the "different 
context"[5] approach to prevent the ambiguity.  But if they're using "#" to 
distinguish between the car and the picture of the car, *and* you consider 
the FragID -- the # part -- to be a part of the URI, then they're using the 
"different names"[4] approach.
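
As a rough illustration of the "#" case, assume (hypothetically) that 
http://x.org/mycar#car names the car and http://x.org/mycar names the 
picture.  Neither that fragment nor the function name below comes from the 
documents under discussion.  The conversion from one name to the other is 
then just a matter of dropping the FragID:

```python
from urllib.parse import urldefrag


def document_uri(thing_uri: str) -> str:
    """Drop the FragID to get the URI of the retrievable document.

    Assumes the hypothetical convention that the URI *with* a fragment
    names the thing (the car) and the URI *without* it names the document
    (the picture of the car).
    """
    url, fragment = urldefrag(thing_uri)
    if not fragment:
        raise ValueError("no FragID: under this convention the URI already names the document")
    return url
```

Whether this counts as "different names" or "different context" depends, as 
noted above, on whether you consider the FragID to be part of the URI.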

1. http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm#ViewSourceEffect
2. http://www.w3.org/TR/webarch/#uri-use (section 2.2.3)
3. http://www.w3.org/TR/webarch/#uri-use (section 2.2.5)
4. http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm#DifferentNames
5. http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm#DifferentContext
6. http://lists.w3.org/Archives/Public/www-tag/2003Jan/0287.html
7. http://www.w3.org/2002/12/rdf-identifiers/


-- 
David Booth
W3C Fellow / Hewlett-Packard
Telephone: +1.617.253.1273
Received on Thursday, 23 January 2003 21:23:38 UTC