Re: Clarifying what a URL identifies (Four Uses of a URL) from David Booth on 2003-01-27 (www-tag@w3.org from January 2003)

From: David Booth <dbooth@w3.org>
Date: Mon, 27 Jan 2003 00:07:05 -0500
To: Sandro Hawke <sandro@w3.org>
Cc: "Jonathan Borden" <jonathan@openhealth.org>, Tim Berners-Lee <timbl@w3.org>, www-tag@w3.org
Message-Id: <5.1.0.14.2.20030124183148.037c2418@localhost>
At 12:02 AM 1/24/2003 -0500, Sandro Hawke wrote:

> > URIs should denote cars or pictures of cars.  If a "#" is a part of a URI,
> > then Sandro's proposal[7] is a mixture of the two approaches.  If a "#" is
> > *not* a part of a URI, then Sandro's proposal[7] is an example of the
> > "different context" approach.
>
>I consider "#" to be part of a URI, so yes, my proposal is an odd
>mixture.  In general, I favor the context approach;

I do too, because it has the crucial benefit of not requiring everyone to 
agree on which of the two kinds of things (e.g., the car or the picture of 
the car) a URI is supposed to denote.  (Though I will say, I find the 
mixture of the "different names"[4] and "different context"[5] approach to 
be aesthetically displeasing!)

Here's a fuller explanation.  Suppose the TAG or a standards body were to 
take the "different names"[4] approach and "clarify" RFC2396 by declaring 
something to the effect of:

         "a URI always identifies only one thing, and in TimBL's car example,
         that thing would be the picture of the car, rather than the car itself"

Therefore, anyone who writes a language that claims to use true URIs and 
properly conforms to RFC2396 would be implicitly agreeing to those 
semantics.  Fine so far.

However, you can never control how someone might (legitimately) *refer* to 
that specification.  It is *impossible* to stop someone from:
(a) defining a useful language L1 that uses URL-like strings that are 
required to conform to the URL *syntax* defined in RFC2396; and
(b) clearly stating that the URL-like strings used in L1 are not true URLs 
because they do not have the *semantics* of URLs as defined in RFC2396; and
(c) declaring that in language L1, such unadorned strings are used to 
identify the *car*, rather than the *picture* of the car, because, in the 
applications for which L1 was designed, those are the things that far more 
often need to be identified; and
(d) providing some other convention (using either the "different names"[4] 
or "different context"[5] approach) for referring to the *picture* of the car.
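As a rough illustration of points (a) through (d), here is a minimal sketch of what such a language L1 might look like.  Everything in it is an assumption made for illustration: the "picture-of:" prefix is a hypothetical stand-in for a "different context" convention, and the syntax check is only a crude approximation of the RFC2396 grammar (it merely requires a scheme and a host).

```python
from urllib.parse import urlsplit

# Hypothetical marker used by L1 to say "the picture, not the car".
PICTURE_PREFIX = "picture-of:"


def has_url_syntax(s):
    """Crude stand-in for an RFC2396 syntax check: scheme and host required."""
    parts = urlsplit(s)
    return bool(parts.scheme) and bool(parts.netloc)


def l1_denotes(term):
    """Return (kind, url) for a term written in the hypothetical language L1."""
    if term.startswith(PICTURE_PREFIX):
        # (d) a separate convention for referring to the picture
        kind, url = "picture", term[len(PICTURE_PREFIX):]
    else:
        # (c) an unadorned string identifies the car
        kind, url = "car", term
    # (a) the string must still look like a URL
    if not has_url_syntax(url):
        raise ValueError(f"not URL-like: {url!r}")
    return kind, url


print(l1_denotes("http://x.org/mycar"))
print(l1_denotes("picture-of:http://x.org/mycar"))
```

Point (b) is not something code can enforce: it is a statement in L1's own specification that these strings borrow only the syntax of URLs, not their semantics.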

Such a language would not in any way be anti-social, in error, or in 
violation of RFC2396.  Nonetheless, it would mean that if you came across a 
statement like:

         http://x.org/mycar #is #beautiful

you would have no way of knowing whether the statement was talking about 
the car or the picture of the car, UNLESS you knew what language the 
statement was written in.  I.e., if you knew that the statement was written 
in language L1, then you would know that http://x.org/mycar was referring 
to the *car*.  If it wasn't, then you might guess that the statement was 
referring to the *picture* of the car, according to RFC2396.

On the other hand, if the TAG or a standards body were to make the 
*opposite* choice, and declare that an RFC2396 URI always identifies the 
*car* itself, then you have the same problem all over again, the other way 
around.  Someone could define a useful language L2 that uses URL-like 
strings to identify the pictures of cars instead of the cars.  Thus, once 
again if you came across a statement like:

         http://x.org/mycar #is #beautiful

you would have no way of knowing whether the statement was talking about 
the car or the picture of the car, UNLESS you knew what language the 
statement was written in.

But is this really a problem?  No.  If you know that the statement is 
written in L1, then http://x.org/mycar clearly refers to the car; but if 
the statement is written in L2, then http://x.org/mycar clearly refers to 
the *picture* of the car.  It is only a problem if you do NOT know what 
language the statement is written in.  But since you ALWAYS have to know 
what language the statements are written in anyway (if you wish to 
determine the semantics), the only problem is that you will simply have to 
remember that different languages do things differently -- which you have 
to do anyway.
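
The point that the language, not the string, settles the referent can be sketched in a few lines.  This is only an illustration under stated assumptions: the DENOTES table and the subject_of function are hypothetical names, and the statement is split naively on whitespace.

```python
# What an unadorned URL-like string denotes in each hypothetical language.
DENOTES = {"L1": "car", "L2": "picture of the car"}


def subject_of(statement, language):
    """Describe what the first term of `statement` refers to in `language`."""
    if language not in DENOTES:
        # Without knowing the language, the semantics cannot be determined.
        raise LookupError(f"unknown language {language!r}")
    term = statement.split()[0]
    return f"{DENOTES[language]} named by {term}"


stmt = "http://x.org/mycar #is #beautiful"
print(subject_of(stmt, "L1"))
print(subject_of(stmt, "L2"))
```

The same string yields a different answer per language, and an unknown language yields no answer at all, which is exactly the situation a reader is in for any statement whose language is not known.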

In short, if the TAG were to follow the "different names" approach, and 
were to declare which kind of thing a URI *always* denotes (i.e., the car 
or the picture of the car), people could still *legitimately* use strings 
that look and smell like URIs but have different semantics from what 
RFC2396 or any other standard might say.  That's a *good* thing.

>the proposal [7]
>is a hopefully-clever hack to allow RDF to have disambiguated
>identifiers without any change to the syntax or formal semantics.
>(The only change is to the social meaning of identifiers.)
>
> > 7. http://www.w3.org/2002/12/rdf-identifiers/
>
>     -- sandro

1. http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm#ViewSourceEffect
2. http://www.w3.org/TR/webarch/#uri-use (section 2.2.3)
3. http://www.w3.org/TR/webarch/#uri-use (section 2.2.5)
4. http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm#DifferentNames
5. http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm#DifferentContext
6. http://lists.w3.org/Archives/Public/www-tag/2003Jan/0287.html
7. http://www.w3.org/2002/12/rdf-identifiers/


-- 
David Booth
W3C Fellow / Hewlett-Packard
Telephone: +1.617.253.1273
Received on Monday, 27 January 2003 00:07:31 UTC