Re: Resources and URIs from pat hayes on 2003-04-23 (uri@w3.org from April 2003)

From: pat hayes <phayes@ai.uwf.edu>
Date: Wed, 23 Apr 2003 00:09:14 -0500
To: Tim Bray <tbray@textuality.com>
Cc: "Roy T. Fielding" <fielding@apache.org>, uri@w3c.org
Message-Id: <p05111b0bbacb5dc46191@[10.0.100.12]>
>Roy T. Fielding wrote:
>
>>If you have suggested wording to change, then please suggest it.
>>If you don't, then this is a redundant discussion and I have already
>>answered it before:
>
>I have a suggested wording change, because while I have been largely 
>unimpressed by the philosophical jargon being thrown around here 
>recently

It is sad when a carefully worded request for clarification can be 
dismissed as "philosophical jargon", but let it pass.

>, I do agree that the current definition "A resource can be anything 
>that has identity" offers significant room for improvement; among 
>other things it deserves to be called out and not sequestered in a 
><dd>.
>
>Here you go:
>
><h3>Resources and URIs</h3>
>
>Many different abstract, informational, and physical things may be 
>resources.  URIs exist to identify resources, but this "identity"

"Identity"/"identify" ?

>relationship has both social and technical dimensions.
>
>For example, it is incontrovertible that the URI 
>http://www.tbray.org/A0.png identifies a resource which is a 
>particular bitmapped graphic (I assert this, I control tbray.org, 
>and the assertion is verifiable via technical means)

Sorry, that is not incontrovertible, because you have not said what 
you mean by "identifies as a resource". (I know this must be 
infuriating, but please bear with me, as this gets to the heart of 
the matter.) In one sense, that phrase refers to a process involving 
HTTP protocols and the resulting document (or maybe a suitable 
abstraction of it: I know there are some delicate issues here) is the 
thing identified-1; and then what you say is obviously correct. In a 
different sense, however, a certain string of the form "xxx-xx-xxxx" 
identifies-2 me because it is my SS number; but there is absolutely 
no implication that anyone can use that string to access me, and no 
*technical* means to verify the claim.  It is just that according to 
certain socially accepted rules, a description like 'the person with 
SS number xxx-xx-xxxx' in fact refers to - denotes - me.  And in that 
second sense of "identify", what you say is just plain wrong.  Any 
control you have is irrelevant, since what the URI "identifies" 
depends on whatever is said using the URI together with the 
conventions and specifications which govern reference of expressions 
*of the language used to say it*, just as 'xxx-xx-xxxx' becomes 
meaningful when used in a certain kind of English description, and 
what is said about the referent may be anything the user chooses to 
say. ("The person with SS number xxx-xx-xxxx is an argumentative pain 
in the ass.")  Network transfer protocols are irrelevant to what is 
"identified" in this sense, but the meaning conventions of the 
'saying' language are central, so what the URI "identifies" in this 
sense depends on the surrounding linguistic (or other, eg pictorial) 
conventions, and is therefore most definitely controvertible.

My earlier request for clarification can be phrased as saying, which 
sense of "identify" do y'all mean?  And to observe that it is not 
sufficient to answer 'both' or 'it doesn't matter which', because the 
properties of the two senses differ radically, and it does matter. In 
the second sense, for example, one could rationally claim that 
http://www.tbray.org/A0.png in fact identifies the  the cardinality 
of the set of the natural numbers, which is what the symbol in that 
graphic is conventionally used to denote.

URI references are being used to "identify" in both these senses on 
the Web. Deployed technology depends on both senses being available. 
RDFS and DAML+OIL and OWL all use both senses, for example; they use 
URIrefs with fragIds to denote (the second sense) and they use URIs 
in conventional ways to locate web pages with RDF, OWL, etc, on them 
(which themselves contain URI references which might denote... and so 
on), as well as other web pages and documents. And in fact they could 
use URIs with no fragIDs to denote, as well, so a URI actually can 
have *both* kinds of "identify" used on it, and they needn't identify 
the same thing. The semantic rules for denotations of URI references 
in RDF (and hence in RDFS, OWL, etc.) explicitly make no reference to 
the HTTP protocols which determine what the URI 'identifies' in the 
first sense (or indeed to any social conventions): a decision which 
was taken, by the way, largely because this issue - of what a URI 
should be understood to *denote* - was seen as a larger issue than 
merely one for RDF to decide, as belonging more to a group like yours.

If I can make a suggestion, it might be a good idea to declare that 
the two senses SHOULD identify the same thing: that any use of a URI 
(without a fragID) as a denoting expression should always be 
understood to identify-2  - denote - whatever it identifies-1; if 
there is any such thing, at any rate. Tim B-L has argued this 
position (although it would seem to be at odds with W3C practice, 
which tends to put explanatory notes at the end of http URIs which 
tell you in English what they are supposed to denote, which is not 
usually the note itself.) This would be a kind of global constraint 
on the semantic conditions applied to all web formalisms which claim 
to have a referential semantics.  But the point is that this really 
does *need to be said*, if its supposed to be true: it's not 
necessarily true, or so blindingly obvious that it would be 
ridiculous to deny it.  Just being ambiguous about what you mean by 
"identify"  doesn't say it.

>and that the URI http://www.w3.org/1999/xhtml identifies a resource 
>which is a well-known markup vocabulary (established by social 
>convention).

Surely it is established by the meaning of the English assertion 
found on that web page, which says: "This is an XML namespace defined 
in the XHTML 1.0: The Extensible HyperText Markup Language 
specification ".  Nothing particularly social there, seems to me.

>  It is possible for ambiguity to enter this relationship; for 
>example, does http://www.w3.org/Consortium identify an organization 
>or a particular HTML page on its website?

If that can identify an organization referred to on the web page, why 
cannot your first example identify the cardinality of the set of the 
natural numbers?

But the issue cuts deeper than that, since when the stuff on that 
page is itself a referential language with its own rules (set by the 
W3C) for what gets "identified", the thing identified-2 might be 
anything that can be referred to according to the formal rules of the 
W3C spec that defines the meaning of the language used on the page 
identified-1 by that URI.  This is what allows 'semantic' markup on 
one page to use terminology defined on a different page; without 
this, the whole scheme would collapse.

In fact, I think you are using this kind of rule yourself, in your 
http://www.w3.org/Consortium  example above. Why would nobody even 
think that this might identify something totally unrelated, like the 
color red? Because the *English text* of the web page isn't about the 
color red, right? It is the *English meaning of the symbols on the 
web page*, using nontechnical conventions that have nothing to do 
with transfer protocols, which identify the referent. Similarly, but 
using different conventions for reference, it's because the jpeg file 
at http://www.coginst.uwf.edu/images/people/phayes is *a picture of 
me* that the URI can be ambiguously understood as referring to either 
the graphic or to me, but not, say, to Idi Amin. It is because 
http://www.coginst.uwf.edu/~phayes  is *my home page* that this can 
be ambiguously understood as referring to my home page or to me. The 
conventions for determining  reference depend on the nature of the 
representations (in a sufficiently broad sense) found on the page. 
Pages like http://www.w3.org/1999/xhtml seem to obey a similar 
convention: they refer to abstractions which the document you find 
there *says* they refer to, using English enriched with a specialized 
technical W3C terminology. You might call these conventions 'social', 
but that seems to me to just be an escape clause; and in any case, it 
doesn't handle cases where there are no existing 'social' conventions 
to fall back on, as with formalisms designed for use by software 
agents on the Web. Nevertheless, the same kind of story seems to 
apply: the meanings of URIrefs on a page full of RDF is determined in 
the first instance by what the enclosing RDF asserts about them, 
according to the RDF semantic conditions.

But now, here's the problem. If we acknowledge that denotations of 
URIs are determined, or even influenced, by the enclosing language, 
how can anyone - even the owner of the URI - specify what it denotes 
in all other languages, including other formal languages? Suppose one 
WG assigns a new URI to some entity and publishes a document, using 
the language of RFC 2119, that the URI MUST denote that thing; even 
so, without some global conventions on denotations of URIs, another 
language's spec may declare that in *this* language it will denote 
something else; after all, it is up to that spec to define the 
meanings - denotations - of its own expressions, surely; particularly 
if it is a formal notation with no associated 'social' conventions to 
predetermine meanings.  Apart from the obvious problems of getting 
software agents to read specs written in English, this seems to be a 
basic fault line in the existing conventions, and problems are 
avoided only by people agreeing to try to dance in step with one 
another, as it were. Just talking about "identifies" as a kind of 
fuzzily defined relation between URIs and resources doesn't resolve 
this issue. Appealing to network exchange protocols is irrelevant 
when we are discussing denotations; and social conventions are 
irrelevant when we have a potential mismatch between formal 
specifications.  What we seem to need, in fact, is precisely 
something analogous to 'social conventions' for URIs used in a formal 
web context where English, pictorial and other existing conventions 
do not apply; and preferably, in a form that can be read by, or 
incorporated into the code of, software agents.

I have phrased this in deliberately provocative language for 
emphasis, but these kind of issues  already arise, if only in small 
ways. For one example, some of the XSD datatypes don't measure up to 
the requirements of an RDF datatype, so the relevant URIrefs don't 
denote datatypes in RDF, no matter what the XML schema spec says 
about them.  Nothing to do with social conventions or network 
protocols, note.  Again, many uses of RDFS in fact impose more 
meaning on the rdfs: vocabulary than RDFS itself does, so that the 
referent of, say, rdfs:range (the property which relates a property 
to a class restriction on its value) is different when that URI 
reference is used in OWL from its meaning in RDFS; in fact, that 
particular URIref has at least three meanings, depending on whether 
it is used in RDFS, OWL-DL and OWL-Full.  There seems to be no way 
that this can be prevented from happening, even if one wished to 
prevent it.  None of these examples are fatal, but they do at the 
least put some severe strain on your account of the relationship 
between URIs and resources. None of this is particular to RDF, by the 
way: these kind of things will inevitably happen in any system of 
notations with formally defined meaning conditions.

If resources can be anything (with an identity), and if URIs really 
did identify things incontrovertibly or by technically specifiable 
means, then this situation wouldn't arise; but it arises all the 
time.  For example, where *exactly* is it specified that 
http://www.w3.org/2001/XMLSchema#string denotes the particular 
datatype described at http://www.w3.org/TR/xmlschema-2/#string ? Is 
that covered by what one reads at http://www.w3.org/2001/XMLSchema ? 
Or is implicit in what is said at 
http://www.w3.org/TR/xmlschema-2/#namespaces ? What conventions, 
social or otherwise, decide questions like this?

>A few principles apply:
>- While the definitions of URI and Resource are somewhat circular, 
>the existence of a URI does not imply the existence of a resource. 
>For example, the URI http://example.com/386751531 identifies no 
>resource.

True.

>- Formally, resources could exist without URIs - for example, there 
>is a picture of my cat somewhere on http://www.tbray.org but I'm not 
>publishing a URI.  However, such resources have no practical import 
>or utility.

No, they have enormous practical import and utility. For example 
consider a web service offering books for sale, using markup in an 
ontology language which supports simple class reasoning. An order is 
received for three copies of a certain book, and is accepted, and 
payment is made. This entire transaction may be done without any URIs 
being assigned to the particular copies of the book, but the 
reasoners are able to establish that three books exist (eg by 
reasoning about the number of things in the class of books attached 
to the order) and to draw conclusions about them, eg that they weigh 
enough to require a certain packaging method. The weight itself may 
have a URI assigned to it, as may the order, but there is no reason 
why the books must have; but surely the books exist, and indeed this 
conclusion can be reached by software. Similar examples can be given 
in almost any web-services scenario; or consider a web-accessible 
database of employees which uses a non-URI format for representing 
employees. So certainly things can exist which have no URI assigned 
to them, but can be referred to from Web pages, and be important to 
Web software.

One could argue that they should not be considered to be *resources* 
until they are identified-1 by a URI; but that decision, while 
coherent, seems to me to be arbitrary and unjustified, to yield no 
tangible benefits and likely to cause enormous trouble. It would 
require reasoners to distinguish things like the books in the example 
from things that do have URIs, and probably to keep track of the 
times at which URIs got assigned to the existing but not-yet-resource 
thingies that became resources when they were baptized with a URI, 
and so forth; and as far as I can see, all to no purpose.

>- URI schemes may impose constraints on the types of resource they 
>identify; for example, ftp: URIs identify files and directories 
>accessible using the FTP protocol.

In the first sense of "identify", yes.

>- Ambiguity in the characterization of what resource a URI 
>identifies is always undesirable and reduces the utility of both the 
>resource and the URI.

Again, this is not necessarily true for the second sense of 
"identify".  In fact, in the case of formal assertional languages, it 
is *inevitable* that there will be URIs which are ambiguous in some 
sense, since the semantic conditions of a formal model theory only 
impose necessary conditions on meanings, rather then specifying 
absolute, fixed, referents. For example, it is meaningless to ask 
what resource is 'identified' by, say, rdfs:range. The semantics of 
RDFS imposes restrictions on what this can refer to, but its exact 
referent will vary between alternative formal interpretations. There 
is no single 'resource' for it to denote.  Apart from this rather 
fundamental point, there is practical utility in for example allowing 
an ambiguously referring symbol to be traded between software agents 
whose primary function is to disambiguate it in a suitable context; 
such negotiations can be used to decide permissions inside agent 
domains, for example.  In another example, a URI might be generated 
to denote an entity known to exist which satisfies a query, but there 
may be no way for either the querier or the answerer to unambiguously 
determine the referent.  Nevertheless, such URIs may be of great 
utility in the querying process by allowing further queries to be 
made; and it may come about that as more information is obtained, the 
referent can be determined later.

Pat Hayes

PS.
Summary. The normal Web protocols are being used in the semantic web 
to transmit information which itself uses URI references to refer to 
things, *according to formal conventions set by W3C specs* (This is 
the new part: its not just English any more.). So URI references have 
two rather precise jobs to do instead of just one; they are needed to 
hyperlink the information together - to bind together descriptions, 
just as they bind together the HTML - but also they are used to refer 
to the things being *described by* the hyperlinked text.  Words like 
"identify" can be understood to refer to either or both of these 
uses, and so are inherently ambiguous for this reason; and moreover 
the properties of these two senses of "identify" are very different, 
so the spec needs to be clearer about which sense is intended; and if 
both are intended, it needs to spell out, or at least indicate 
roughly, the relationships expected to hold between them.


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola              			(850)202 4440   fax
FL 32501           				(850)291 0667    cell
phayes@ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu   for spam
Received on Wednesday, 23 April 2003 01:09:19 UTC