- From: pat hayes <phayes@ai.uwf.edu>
- Date: Wed, 23 Apr 2003 00:09:14 -0500
- To: Tim Bray <tbray@textuality.com>
- Cc: "Roy T. Fielding" <fielding@apache.org>, uri@w3c.org
- Message-Id: <p05111b0bbacb5dc46191@[10.0.100.12]>
>Roy T. Fielding wrote:
>
>>If you have suggested wording to change, then please suggest it.
>>If you don't, then this is a redundant discussion and I have already
>>answered it before:
>
>I have a suggested wording change, because while I have been largely
>unimpressed by the philosophical jargon being thrown around here
>recently
It is sad when a carefully worded request for clarification can be
dismissed as "philosophical jargon", but let it pass.
>, I do agree that the current definition "A resource can be anything
>that has identity" offers significant room for improvement; among
>other things it deserves to be called out and not sequestered in a
><dd>.
>
>Here you go:
>
><h3>Resources and URIs</h3>
>
>Many different abstract, informational, and physical things may be
>resources. URIs exist to identify resources, but this "identity"
"Identity" or "identify"?
>relationship has both social and technical dimensions.
>
>For example, it is incontrovertible that the URI
>http://www.tbray.org/A0.png identifies a resource which is a
>particular bitmapped graphic (I assert this, I control tbray.org,
>and the assertion is verifiable via technical means)
Sorry, that is not incontrovertible, because you have not said what
you mean by "identifies a resource". (I know this must be
infuriating, but please bear with me, as this gets to the heart of
the matter.) In one sense, that phrase refers to a process involving
HTTP protocols and the resulting document (or maybe a suitable
abstraction of it: I know there are some delicate issues here) is the
thing identified-1; and then what you say is obviously correct. In a
different sense, however, a certain string of the form "xxx-xx-xxxx"
identifies-2 me because it is my SS number; but there is absolutely
no implication that anyone can use that string to access me, and no
*technical* means to verify the claim. It is just that according to
certain socially accepted rules, a description like 'the person with
SS number xxx-xx-xxxx' in fact refers to - denotes - me. And in that
second sense of "identify", what you say is just plain wrong. Any
control you have is irrelevant, since what the URI "identifies"
depends on whatever is said using the URI together with the
conventions and specifications which govern reference of expressions
*of the language used to say it*, just as 'xxx-xx-xxxx' becomes
meaningful when used in a certain kind of English description, and
what is said about the referent may be anything the user chooses to
say. ("The person with SS number xxx-xx-xxxx is an argumentative pain
in the ass.") Network transfer protocols are irrelevant to what is
"identified" in this sense, but the meaning conventions of the
'saying' language are central, so what the URI "identifies" in this
sense depends on the surrounding linguistic (or other, eg pictorial)
conventions, and is therefore most definitely controvertible.
My earlier request for clarification can be phrased as asking: which
sense of "identify" do y'all mean? It also observed that it is not
sufficient to answer 'both' or 'it doesn't matter which', because the
properties of the two senses differ radically, and it does matter. In
the second sense, for example, one could rationally claim that
http://www.tbray.org/A0.png in fact identifies the cardinality
of the set of the natural numbers, which is what the symbol in that
graphic is conventionally used to denote.
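To make the two senses concrete, here is a toy Python sketch. It makes no network requests: the table RETRIEVED is a hypothetical stand-in for HTTP dereferencing (identify-1), and INTERPRETATIONS is a hypothetical set of per-language denotation rules (identify-2). The language names are invented for illustration.

```python
# Hypothetical stand-in for identify-1: what dereferencing a URI hands back.
# (No network access; the entry is illustrative only.)
RETRIEVED = {
    "http://www.tbray.org/A0.png": "a bitmap of the symbol aleph-0",
}

# Hypothetical stand-ins for identify-2: each language fixes denotations by its own rules.
INTERPRETATIONS = {
    "graphics-catalogue": {
        "http://www.tbray.org/A0.png": "a bitmap of the symbol aleph-0",
    },
    "english-about-the-picture": {
        "http://www.tbray.org/A0.png": "the cardinality of the set of natural numbers",
    },
}


def identifies_1(uri):
    """Sense 1: the thing obtained by following the URI (None if nothing is retrieved)."""
    return RETRIEVED.get(uri)


def identifies_2(uri, language):
    """Sense 2: what the URI denotes when used in the given language (None if unspecified)."""
    return INTERPRETATIONS[language].get(uri)


uri = "http://www.tbray.org/A0.png"
for lang in INTERPRETATIONS:
    print(lang, "->", identifies_2(uri, lang), "| same as sense 1:",
          identifies_2(uri, lang) == identifies_1(uri))
```

The only point of the sketch is its signatures: sense 1 is a function of the URI alone, while sense 2 needs the language as a second argument.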
URI references are being used to "identify" in both these senses on
the Web. Deployed technology depends on both senses being available.
RDFS and DAML+OIL and OWL all use both senses, for example; they use
URIrefs with fragIds to denote (the second sense) and they use URIs
in conventional ways to locate web pages with RDF, OWL, etc, on them
(which themselves contain URI references which might denote... and so
on), as well as other web pages and documents. And in fact they could
use URIs with no fragIDs to denote, as well, so a URI actually can
have *both* kinds of "identify" used on it, and they needn't identify
the same thing. The semantic rules for denotations of URI references
in RDF (and hence in RDFS, OWL, etc.) explicitly make no reference to
the HTTP protocols which determine what the URI 'identifies' in the
first sense (or indeed to any social conventions): a decision which
was taken, by the way, largely because this issue - of what a URI
should be understood to *denote* - was seen as a larger issue than
merely one for RDF to decide, as belonging more to a group like yours.
If I can make a suggestion, it might be a good idea to declare that
the two senses SHOULD identify the same thing: that any use of a URI
(without a fragID) as a denoting expression should always be
understood to identify-2 - denote - whatever it identifies-1; if
there is any such thing, at any rate. Tim B-L has argued this
position (although it would seem to be at odds with W3C practice,
which tends to put explanatory notes at the end of http URIs which
tell you in English what they are supposed to denote, which is not
usually the note itself). This would be a kind of global constraint
on the semantic conditions applied to all web formalisms which claim
to have a referential semantics. But the point is that this really
does *need to be said*, if it's supposed to be true: it's not
necessarily true, or so blindingly obvious that it would be
ridiculous to deny it. Just being ambiguous about what you mean by
"identify" doesn't say it.
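The suggested global constraint can be stated as a check. This is a hypothetical illustration: the dictionaries `denotes` (standing in for identify-2) and `retrieves` (standing in for identify-1) and the example.org URIs are invented, and only `urllib.parse.urldefrag` is a real library call. A URI with a fragID, or one that retrieves nothing, is exempt, matching "if there is any such thing".

```python
from urllib.parse import urldefrag


def violates_constraint(uri, denotes, retrieves):
    """True if a fragment-free URI, used to denote, denotes something other than what it retrieves.

    denotes   -- hypothetical map standing in for identify-2 (denotation)
    retrieves -- hypothetical map standing in for identify-1 (what dereferencing yields)
    """
    if urldefrag(uri).fragment:          # the constraint only covers URIs without a fragID
        return False
    if uri not in denotes or uri not in retrieves:
        return False                     # not used to denote, or nothing is retrieved
    return denotes[uri] != retrieves[uri]


retrieves = {
    "http://example.org/vocab/Person": "an English note explaining what this URI denotes",
    "http://example.org/home": "a home page",
}
denotes = {
    "http://example.org/vocab/Person": "the class of people",   # note-at-the-URI practice
    "http://example.org/home": "a home page",
    "http://example.org/vocab#Person": "the class of people",
}
```

Under this sketch, the practice of putting an explanatory note at an http URI that denotes something other than the note is flagged as a violation, which is the tension with W3C practice mentioned above.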
>and that the URI http://www.w3.org/1999/xhtml identifies a resource
>which is a well-known markup vocabulary (established by social
>convention).
Surely it is established by the meaning of the English assertion
found on that web page, which says: "This is an XML namespace defined
in the XHTML 1.0: The Extensible HyperText Markup Language
specification". Nothing particularly social there, seems to me.
> It is possible for ambiguity to enter this relationship; for
>example, does http://www.w3.org/Consortium identify an organization
>or a particular HTML page on its website?
If that can identify an organization referred to on the web page, why
cannot your first example identify the cardinality of the set of the
natural numbers?
But the issue cuts deeper than that, since when the stuff on that
page is itself a referential language with its own rules (set by the
W3C) for what gets "identified", the thing identified-2 might be
anything that can be referred to according to the formal rules of the
W3C spec that defines the meaning of the language used on the page
identified-1 by that URI. This is what allows 'semantic' markup on
one page to use terminology defined on a different page; without
this, the whole scheme would collapse.
In fact, I think you are using this kind of rule yourself, in your
http://www.w3.org/Consortium example above. Why would nobody even
think that this might identify something totally unrelated, like the
color red? Because the *English text* of the web page isn't about the
color red, right? It is the *English meaning of the symbols on the
web page*, using nontechnical conventions that have nothing to do
with transfer protocols, which identify the referent. Similarly, but
using different conventions for reference, it's because the jpeg file
at http://www.coginst.uwf.edu/images/people/phayes is *a picture of
me* that the URI can be ambiguously understood as referring to either
the graphic or to me, but not, say, to Idi Amin. It is because
http://www.coginst.uwf.edu/~phayes is *my home page* that this can
be ambiguously understood as referring to my home page or to me. The
conventions for determining reference depend on the nature of the
representations (in a sufficiently broad sense) found on the page.
Pages like http://www.w3.org/1999/xhtml seem to obey a similar
convention: they refer to abstractions which the document you find
there *says* they refer to, using English enriched with a specialized
technical W3C terminology. You might call these conventions 'social',
but that seems to me to just be an escape clause; and in any case, it
doesn't handle cases where there are no existing 'social' conventions
to fall back on, as with formalisms designed for use by software
agents on the Web. Nevertheless, the same kind of story seems to
apply: the meanings of URIrefs on a page full of RDF are determined in
the first instance by what the enclosing RDF asserts about them,
according to the RDF semantic conditions.
But now, here's the problem. If we acknowledge that denotations of
URIs are determined, or even influenced, by the enclosing language,
how can anyone - even the owner of the URI - specify what it denotes
in all other languages, including other formal languages? Suppose one
WG assigns a new URI to some entity and publishes a document, using
the language of RFC 2119, that the URI MUST denote that thing; even
so, without some global conventions on denotations of URIs, another
language's spec may declare that in *this* language it will denote
something else; after all, it is up to that spec to define the
meanings - denotations - of its own expressions, surely; particularly
if it is a formal notation with no associated 'social' conventions to
predetermine meanings. Apart from the obvious problems of getting
software agents to read specs written in English, this seems to be a
basic fault line in the existing conventions, and problems are
avoided only by people agreeing to try to dance in step with one
another, as it were. Just talking about "identifies" as a kind of
fuzzily defined relation between URIs and resources doesn't resolve
this issue. Appealing to network exchange protocols is irrelevant
when we are discussing denotations; and social conventions are
irrelevant when we have a potential mismatch between formal
specifications. What we seem to need, in fact, is precisely
something analogous to 'social conventions' for URIs used in a formal
web context where English, pictorial and other existing conventions
do not apply; and preferably, in a form that can be read by, or
incorporated into the code of, software agents.
I have phrased this in deliberately provocative language for
emphasis, but these kinds of issues already arise, if only in small
ways. For one example, some of the XSD datatypes don't measure up to
the requirements of an RDF datatype, so the relevant URIrefs don't
denote datatypes in RDF, no matter what the XML schema spec says
about them. Nothing to do with social conventions or network
protocols, note. Again, many uses of RDFS in fact impose more
meaning on the rdfs: vocabulary than RDFS itself does, so that the
referent of, say, rdfs:range (the property which relates a property
to a class restriction on its value) is different when that URI
reference is used in OWL from its meaning in RDFS; in fact, that
particular URIref has at least three meanings, depending on whether
it is used in RDFS, OWL-DL or OWL-Full. There seems to be no way
that this can be prevented from happening, even if one wished to
prevent it. None of these examples are fatal, but they do at the
least put some severe strain on your account of the relationship
between URIs and resources. None of this is particular to RDF, by the
way: these kinds of things will inevitably happen in any system of
notations with formally defined meaning conditions.
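One way such divergence can arise is sketched below. RDFS states its rdfs:range condition in one direction only: if a range triple holds, every value of the property lies in the class. A richer language can add meaning by reading the condition as "iff", so that the range triple is forced whenever every value happens to lie in the class. The toy contrasts the two readings over given extensions; the property and class names are hypothetical, and treating "iff" as the whole of what OWL adds would be an oversimplification.

```python
from itertools import product


def range_condition(prop_ext, class_ext):
    """Every value of the property (second element of each pair) lies in the class."""
    return all(value in class_ext for _, value in prop_ext)


def forced_range_triples(prop_exts, class_exts, reading):
    """Range triples (p, C) that the semantic condition forces, given the extensions.

    reading == "if":  the condition is only necessary, so extensions alone force nothing.
    reading == "iff": the triple must hold whenever the range condition is satisfied.
    """
    if reading == "if":
        return set()
    if reading != "iff":
        raise ValueError(f"unknown reading: {reading!r}")
    return {(p, c) for p, c in product(prop_exts, class_exts)
            if range_condition(prop_exts[p], class_exts[c])}


# Hypothetical extensions for one property and two classes.
prop_exts = {"ex:author": {("ex:book1", "ex:alice")}}
class_exts = {"ex:Person": {"ex:alice"}, "ex:Book": {"ex:book1"}}
```

So the same URIref licenses different conclusions from the same data, depending on which language's semantic conditions are in force.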
If resources can be anything (with an identity), and if URIs really
did identify things incontrovertibly or by technically specifiable
means, then this situation wouldn't arise; but it arises all the
time. For example, where *exactly* is it specified that
http://www.w3.org/2001/XMLSchema#string denotes the particular
datatype described at http://www.w3.org/TR/xmlschema-2/#string ? Is
that covered by what one reads at http://www.w3.org/2001/XMLSchema ?
Or is it implicit in what is said at
http://www.w3.org/TR/xmlschema-2/#namespaces ? What conventions,
social or otherwise, decide questions like this?
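The question is sharpened by a protocol detail: an HTTP client strips the fragment before sending a request, so dereferencing http://www.w3.org/2001/XMLSchema#string only ever fetches http://www.w3.org/2001/XMLSchema. Whatever "#string" denotes must therefore be settled by conventions applied to the retrieved representation, or by something else entirely. A minimal sketch using Python's `urllib.parse.urlsplit`; the function name is hypothetical and no request is actually sent.

```python
from urllib.parse import urlsplit


def request_for(uri_ref):
    """Return (HTTP/1.1 request text, fragment) for a GET on uri_ref.

    The fragment is kept by the client and never placed in the request.
    """
    parts = urlsplit(uri_ref)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    request = f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
    return request, parts.fragment


request, fragment = request_for("http://www.w3.org/2001/XMLSchema#string")
print(repr(request))
print("kept client-side:", fragment)
```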
>A few principles apply:
>- While the definitions of URI and Resource are somewhat circular,
>the existence of a URI does not imply the existence of a resource.
>For example, the URI http://example.com/386751531 identifies no
>resource.
True.
>- Formally, resources could exist without URIs - for example, there
>is a picture of my cat somewhere on http://www.tbray.org but I'm not
>publishing a URI. However, such resources have no practical import
>or utility.
No, they have enormous practical import and utility. For example
consider a web service offering books for sale, using markup in an
ontology language which supports simple class reasoning. An order is
received for three copies of a certain book, and is accepted, and
payment is made. This entire transaction may be done without any URIs
being assigned to the particular copies of the book, but the
reasoners are able to establish that three books exist (eg by
reasoning about the number of things in the class of books attached
to the order) and to draw conclusions about them, eg that they weigh
enough to require a certain packaging method. The weight itself may
have a URI assigned to it, as may the order, but there is no reason
why the books must have; but surely the books exist, and indeed this
conclusion can be reached by software. Similar examples can be given
in almost any web-services scenario; or consider a web-accessible
database of employees which uses a non-URI format for representing
employees. So certainly things can exist which have no URI assigned
to them, but can be referred to from Web pages, and be important to
Web software.
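The bookshop example can be sketched as follows. The copies are ordinary objects carrying only local labels in the `_:` blank-node style that RDF syntaxes such as N-Triples use, not URIs; the title URI, per-copy weight, threshold and packaging names are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

_labels = count(1)


@dataclass
class BookCopy:
    """A copy that exists and can be reasoned about, but has no URI.

    `label` is a document-local name in blank-node style, never published as a URI.
    """
    title_uri: str                      # hypothetical: the title may have a URI even if copies do not
    weight_kg: float
    label: str = field(default_factory=lambda: f"_:b{next(_labels)}")


def copies_for_order(title_uri, quantity, weight_kg):
    """Infer the individual copies implied by an order line of `quantity` items."""
    return [BookCopy(title_uri, weight_kg) for _ in range(quantity)]


def packaging_method(copies, heavy_kg=1.0):
    """Hypothetical rule: total weight at or above heavy_kg needs a box."""
    total = sum(c.weight_kg for c in copies)
    return "box" if total >= heavy_kg else "envelope"


order = copies_for_order("http://example.org/titles/some-book", 3, 0.5)
print(len(order), "copies:", [c.label for c in order], "->", packaging_method(order))
```

Nothing in the packaging decision depends on the copies having URIs; it depends only on knowing that three of them exist.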
One could argue that they should not be considered to be *resources*
until they are identified-1 by a URI; but that decision, while
coherent, seems to me to be arbitrary and unjustified, to yield no
tangible benefits and likely to cause enormous trouble. It would
require reasoners to distinguish things like the books in the example
from things that do have URIs, and probably to keep track of the
times at which URIs got assigned to the existing but not-yet-resource
thingies that became resources when they were baptized with a URI,
and so forth; and as far as I can see, all to no purpose.
>- URI schemes may impose constraints on the types of resource they
>identify; for example, ftp: URIs identify files and directories
>accessible using the FTP protocol.
In the first sense of "identify", yes.
>- Ambiguity in the characterization of what resource a URI
>identifies is always undesirable and reduces the utility of both the
>resource and the URI.
Again, this is not necessarily true for the second sense of
"identify". In fact, in the case of formal assertional languages, it
is *inevitable* that there will be URIs which are ambiguous in some
sense, since the semantic conditions of a formal model theory only
impose necessary conditions on meanings, rather than specifying
absolute, fixed, referents. For example, it is meaningless to ask
what resource is 'identified' by, say, rdfs:range. The semantics of
RDFS imposes restrictions on what this can refer to, but its exact
referent will vary between alternative formal interpretations. There
is no single 'resource' for it to denote. Apart from this rather
fundamental point, there is practical utility in, for example, allowing
an ambiguously referring symbol to be traded between software agents
whose primary function is to disambiguate it in a suitable context;
such negotiations can be used to decide permissions inside agent
domains, for example. In another example, a URI might be generated
to denote an entity known to exist which satisfies a query, but there
may be no way for either the querier or the answerer to unambiguously
determine the referent. Nevertheless, such URIs may be of great
utility in the querying process by allowing further queries to be
made; and it may come about that as more information is obtained, the
referent can be determined later.
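The claim that a model theory imposes only necessary conditions can be checked by brute force on a toy case. The sketch enumerates every interpretation of a one-triple graph over a two-element domain and collects what ex:a may denote. It simplifies RDF semantics (predicates are mapped straight to sets of pairs, skipping the separate property-extension step), and the graph and domain are hypothetical.

```python
from itertools import product

# Hypothetical toy graph and domain.
GRAPH = [("ex:a", "ex:p", "ex:b")]
DOMAIN = [0, 1]


def interpretations(graph, domain):
    """Yield every (names -> element, predicate -> set of pairs) pair that satisfies graph."""
    names = sorted({s for s, _, _ in graph} | {o for _, _, o in graph})
    preds = sorted({p for _, p, _ in graph})
    pairs = list(product(domain, repeat=2))
    # Every possible extension for a predicate: any subset of domain x domain.
    extensions = [frozenset(pr for pr, keep in zip(pairs, mask) if keep)
                  for mask in product([False, True], repeat=len(pairs))]
    for values in product(domain, repeat=len(names)):
        interp = dict(zip(names, values))
        for chosen in product(extensions, repeat=len(preds)):
            ext = dict(zip(preds, chosen))
            if all((interp[s], interp[o]) in ext[p] for s, p, o in graph):
                yield interp, ext


def possible_referents(name, graph, domain):
    """Everything the name denotes in at least one satisfying interpretation."""
    return {interp[name] for interp, _ in interpretations(graph, domain)}


print(possible_referents("ex:a", GRAPH, DOMAIN))
```

Every satisfying interpretation is equally legitimate as far as the graph is concerned, so asking which single thing ex:a "identifies" has no answer here; adding triples can only shrink the set of satisfying interpretations, never enlarge it.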
Pat Hayes
PS.
Summary. The normal Web protocols are being used in the semantic web
to transmit information which itself uses URI references to refer to
things, *according to formal conventions set by W3C specs* (This is
the new part: it's not just English any more.) So URI references have
two rather precise jobs to do instead of just one; they are needed to
hyperlink the information together - to bind together descriptions,
just as they bind together the HTML - but also they are used to refer
to the things being *described by* the hyperlinked text. Words like
"identify" can be understood to refer to either or both of these
uses, and so are inherently ambiguous for this reason; and moreover
the properties of these two senses of "identify" are very different,
so the spec needs to be clearer about which sense is intended; and if
both are intended, it needs to spell out, or at least indicate
roughly, the relationships expected to hold between them.
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32501 (850)291 0667 cell
phayes@ai.uwf.edu http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu for spam
Received on Wednesday, 23 April 2003 01:09:19 UTC