Re: RFC2396bis wording, opinions? from Pat Hayes on 2004-06-03 (uri@w3.org from June 2004)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 3 Jun 2004 13:35:30 -0500
To: Tim Berners-Lee <timbl@w3.org>
Cc: uri@w3.org, "'Tim Berners-Lee'" <timbl@w3.org>, Dan Connolly <connolly@w3.org>, Larry Masinter <LMM@acm.org>
Message-Id: <p06001f07bce51562ebea@[10.0.100.76]>
>On May 29, 2004, at 13:21, Larry Masinter wrote:
>
>>Roy's latest:
>>
>>      Resource
>>>        This document doesn't limit the scope of what might be a resource;
>>>        rather, the term "resource" is used in a general sense for whatever
>>>        might be assigned a URI for the sake of later identification.
>>
>>I'm unhappy with 'assigned', because the notion of 'assignment'
>>of a URI as if it were an act, performed by some authority.
>
>- Well, in most cases I can think of it is.
>- When someone publishes a document in HTTP space,
>- When someone writes an RDF document published in HTTP space, they 
>assign meaning to the URIs which are local identifiers within that 
>document.

I really think this way of saying it is misleading. Publishing some 
RDF(/S/OWL) really is not *assigning* anything to the URIs used in 
there. It is *saying something about* what they mean, OK, but that's 
not an *assignment* of a meaning. For example, it may well be the 
case (being strict, will *always* be the case) that no amount of RDF 
will uniquely pin down the referent of a URI.  The URI may well not 
identify anything in particular, nor is it required to in order for 
RDF engines to work properly.

Also, doing this, i.e. publishing some RDF, is not an act *done to the
URIs*, any more than my writing this paragraph is an act done to
some English words. (OK, you COULD say that it was, in some very 
stretched sense: but stretching senses that much is just begging to 
be misunderstood unless you are very careful to explain what you are 
trying to say.)

>- When someone mints a uuid and decides to use it for a software 
>interface, there is an assignment, if you like, by that person.
>
>>Perhaps some URIs are assigned, (URNs) but for many schemes, there is no
>>process of 'assignment'. No one 'assigns' the meaning of a
>>'data' URI, or an HTTP URI with a query parameter.
>
>On the contrary, the owner of a user of the space does assign a 
>meaning to an entire space of URIs and for example when publishing 
>an (HTTP GET action)  HTML form which points generically into that 
>space. From then on, the users of the form share an expectation of 
>what those URIs identify as a function of the parameters.

But, as Larry said, there is in this case no PROCESS. There seems to
be a very clear difference between the URN case, say, and this kind 
of a case. Like Larry, my own understanding of this kind of use of 
'assign' would naturally include the URN case but would definitely 
not include this kind of a case, unless of course someone were to 
carefully explain how they were using the word 'assign' in this very 
wide-ranging sense.

>
>'data' is a case in which the assignment for all URIs is done by the 
>scheme definition. Similarly, hash URIs.
>
>>  I think
>>rather than URIs are _used_, and that they _have_ meaning which
>>comes solely from the meaning assigned by interpretation by rules of
>>the URI scheme, and not from some other out-of-band communication
>>or knowledge.
>
>More or less true.  More: yes, the scheme defines everything.

Hey, you can't have it both ways. Just above you said that the RDF 
defined the meaning in some cases. Since RDF syntax can use any URI 
reference, it can't be entirely the URI scheme that determines 
meaning, if that meaning can be altered by including the URI inside 
some RDF.

>Less:  actually, the HTTP protocol itself doesn't define that one 
>expects to get the same (in some sense) thing when clicking on 
>something one has bookmarked. The expectation of persistence is set 
>out of band - it may be implicit or mentioned in the document, etc.
>
>>  Bringing in 'assignment' makes it seem like it's
>>possible to assign some meaning that is different than the one
>>that is naturally derived from the interpretation by the scheme,
>
>I hope not.  The ability to assign for URIs comes through the way the
>scheme works.
>
>>without any communication channel for sending that meaning.
>>
>>So I don't like this as much.
>
>I think one could go on removing more and more text, but the current text
>
>    Resource
>        This document doesn't limit the scope of what might be a
>        'resource'; rather, the term 'resource' is used for whatever it
>        is that a Uniform Resource Identifier identifies; each URI scheme
>        defines the range of things that are identified by
>        URIs using that scheme. Commonly, URIs are used to identify
>        Internet accessible objects or services; for example, an electronic
>        document, an image, a service (e.g., "today's weather report for
>        Los Angeles"), a collection of other resources. However,
>        a resource need not be accessible via the Internet; URIs might
>        be used to identify human beings, corporations, bound books in a
>        library, and even abstract concepts.
>
>is good for me.

The thing about this that has bothered me since day one is, HOW does 
a URI scheme (ANY URI scheme) identify things that are not accessible 
by the Internet? And what exactly does 'identify' mean in this case? 
Ray cited this:
------
   Identifier
       An identifier embodies the information required to distinguish
       what is being identified from all other things within its scope of
       identification. Our use of the terms "identify" and "identifying"
       refer to this process of distinguishing from many to one; they
       should not be mistaken as an assumption that the identifier
       defines the identity of what is referenced, though that may be the
       case for some identifiers.

-------
Now, taking these together, it seems that a URI which is in a URI 
scheme that identifies, say, numbers (to take an example of abstract 
concepts that we are all familiar with) must have the characteristics 
that the URI itself contains enough information to enable some 
process to distinguish a particular number from all other numbers. So 
take the number zero. Clearly zero needs to be distinguished from,
say, 137, and we have well-known ways to do that. But these 
well-known ways depend on a basic assumption that we are dealing with 
the natural numbers, where zero is unique. Until the 1940s or so this 
was unproblematic, but now we know that there are nonstandard models 
of arithmetic.  So there are in fact many zeros, strictly speaking. 
Does this zero need to be distinguished from the real number zero? 
The hexadecimal number zero? The complex number zero? The singleton 
set containing the empty set? A special element of a commutative 
ring? And so on. Suddenly the requirement to be an 'identifier',
instead of merely a denoting name, becomes a lot harder to live up 
to. And this is just for the numbers, remember. There is a URI in the 
OWL documentation which is something like 
vin:PastaWithRedSpicySauceCourse. It's fairly clear what it is
supposed to mean, in some relaxed sense of 'mean': it means, courses 
whose primary ingredient is pasta with a spicy red sauce.  But the 
RDF/OWL specs and the above text force us to be pickier than this; we 
need a much tighter sense. We need it to IDENTIFY something. 
Moreover, we need it to do this by including enough stuff in the URI 
itself ("embodies the information required") to enable a process 
("this process of distinguishing") to pick out a single, unique 
'meaning', which in this case has to be a particular class - in 
OWL-DL, a particular set - of courses, of which spaghetti marinara is 
probably a representative sample. But this is IMPOSSIBLE: there is no 
such single set of courses (whatever that means, which we leave aside
for now). There will always be fringe cases about which chefs and 
sommeliers might legitimately disagree: and in any case, how can 
anyone or any process get to this set of culinary things from the 
information contained in the URI all by itself?

Now, if your reaction at this point is that you don't want to know 
about this weird mathematical stuff, and don't even care much for 
spicy pasta, I sympathize. I don't, either: but BY INSISTING THAT 
URIs MUST IDENTIFY, y'all have FORCED us to think and worry about 
issues like this. Identifiers are REQUIRED, by definition, to 
identify UNIQUELY.  Names are not. It would be SUCH a relief if URIs 
when used as names could just be used as names, rather than 
identifiers. There is absolutely no need for them to be identifiers, 
unless you want to use them to actually get hold of a unique identified
thing. URLs need to identify what it is they locate, sure. URNs are 
declared to be identifiers by statute. But both of these are special 
cases, and most other uses of URI references (in particular, all the 
uses in RDF and OWL) use them as names, not as identifiers. 
Everything is a lot easier that way, which is probably why language 
evolved the way it did instead of turning out to be an ancient 
Mesopotamian programming language.

Pat

>Tim


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 3 June 2004 14:35:42 UTC