- From: pat hayes <phayes@ai.uwf.edu>
- Date: Wed, 23 Apr 2003 00:09:14 -0500
- To: Tim Bray <tbray@textuality.com>
- Cc: "Roy T. Fielding" <fielding@apache.org>, uri@w3c.org
- Message-Id: <p05111b0bbacb5dc46191@[10.0.100.12]>
>Roy T. Fielding wrote:
>
>>If you have suggested wording to change, then please suggest it.
>>If you don't, then this is a redundant discussion and I have already
>>answered it before:
>
>I have a suggested wording change, because while I have been largely
>unimpressed by the philosophical jargon being thrown around here
>recently

It is sad when a carefully worded request for clarification can be dismissed as "philosophical jargon", but let it pass.

>, I do agree that the current definition "A resource can be anything
>that has identity" offers significant room for improvement; among
>other things it deserves to be called out and not sequestered in a
><dd>.
>
>Here you go:
>
><h3>Resources and URIs</h3>
>
>Many different abstract, informational, and physical things may be
>resources. URIs exist to identify resources, but this "identity"

"Identity"/"identify"?

>relationship has both social and technical dimensions.
>
>For example, it is incontrovertible that the URI
>http://www.tbray.org/A0.png identifies a resource which is a
>particular bitmapped graphic (I assert this, I control tbray.org,
>and the assertion is verifiable via technical means)

Sorry, that is not incontrovertible, because you have not said what you mean by "identifies a resource". (I know this must be infuriating, but please bear with me, as this gets to the heart of the matter.)

In one sense, that phrase refers to a process involving HTTP protocols, and the resulting document (or maybe a suitable abstraction of it: I know there are some delicate issues here) is the thing identified-1; and then what you say is obviously correct.

In a different sense, however, a certain string of the form "xxx-xx-xxxx" identifies-2 me because it is my SS number; but there is absolutely no implication that anyone can use that string to access me, and no *technical* means to verify the claim.
It is just that according to certain socially accepted rules, a description like 'the person with SS number xxx-xx-xxxx' in fact refers to - denotes - me. And in that second sense of "identify", what you say is just plain wrong. Any control you have is irrelevant, since what the URI "identifies" depends on whatever is said using the URI, together with the conventions and specifications which govern reference of expressions *of the language used to say it*, just as 'xxx-xx-xxxx' becomes meaningful when used in a certain kind of English description; and what is said about the referent may be anything the user chooses to say. ("The person with SS number xxx-xx-xxxx is an argumentative pain in the ass.") Network transfer protocols are irrelevant to what is "identified" in this sense, but the meaning conventions of the 'saying' language are central, so what the URI "identifies" in this sense depends on the surrounding linguistic (or other, eg pictorial) conventions, and is therefore most definitely controvertible.

My earlier request for clarification can be phrased as asking: which sense of "identify" do y'all mean? It was also meant to point out that it is not sufficient to answer 'both' or 'it doesn't matter which', because the properties of the two senses differ radically, and it does matter. In the second sense, for example, one could rationally claim that http://www.tbray.org/A0.png in fact identifies the cardinality of the set of the natural numbers, which is what the symbol in that graphic is conventionally used to denote.

URI references are being used to "identify" in both these senses on the Web, and deployed technology depends on both senses being available. RDFS, DAML+OIL and OWL all use both senses, for example: they use URIrefs with fragIDs to denote (the second sense), and they use URIs in conventional ways to locate web pages with RDF, OWL, etc. on them (which themselves contain URI references which might denote... and so on), as well as other web pages and documents.
And in fact they could use URIs with no fragIDs to denote as well, so a URI actually can have *both* kinds of "identify" used on it, and the two needn't identify the same thing.

The semantic rules for denotations of URI references in RDF (and hence in RDFS, OWL, etc.) explicitly make no reference to the HTTP protocols which determine what the URI 'identifies' in the first sense (or indeed to any social conventions). That decision was taken, by the way, largely because this issue - of what a URI should be understood to *denote* - was seen as larger than one for RDF alone to decide, and as belonging more to a group like yours.

If I can make a suggestion, it might be a good idea to declare that the two senses SHOULD identify the same thing: that any use of a URI (without a fragID) as a denoting expression should always be understood to identify-2 - denote - whatever it identifies-1, if there is any such thing, at any rate. Tim B-L has argued this position (although it would seem to be at odds with W3C practice, which tends to put explanatory notes at the end of http URIs which tell you in English what they are supposed to denote, which is not usually the note itself). This would be a kind of global constraint on the semantic conditions applied to all web formalisms which claim to have a referential semantics. But the point is that this really does *need to be said*, if it's supposed to be true: it is neither necessarily true nor so blindingly obvious that it would be ridiculous to deny it. Just being ambiguous about what you mean by "identify" doesn't say it.

>and that the URI http://www.w3.org/1999/xhtml identifies a resource
>which is a well-known markup vocabulary (established by social
>convention).

Surely it is established by the meaning of the English assertion found on that web page, which says: "This is an XML namespace defined in the XHTML 1.0: The Extensible HyperText Markup Language specification". Nothing particularly social there, it seems to me.
>It is possible for ambiguity to enter this relationship; for
>example, does http://www.w3.org/Consortium identify an organization
>or a particular HTML page on its website?

If that can identify an organization referred to on the web page, why can't your first example identify the cardinality of the set of the natural numbers? But the issue cuts deeper than that. When the stuff on that page is itself a referential language with its own rules (set by the W3C) for what gets "identified", the thing identified-2 might be anything that can be referred to according to the formal rules of the W3C spec that defines the meaning of the language used on the page identified-1 by that URI. This is what allows 'semantic' markup on one page to use terminology defined on a different page; without this, the whole scheme would collapse.

In fact, I think you are using this kind of rule yourself in your http://www.w3.org/Consortium example above. Why would nobody even think that this might identify something totally unrelated, like the color red? Because the *English text* of the web page isn't about the color red, right? It is the *English meaning of the symbols on the web page*, using nontechnical conventions that have nothing to do with transfer protocols, which identifies the referent. Similarly, but using different conventions for reference, it's because the JPEG file at http://www.coginst.uwf.edu/images/people/phayes is *a picture of me* that the URI can be ambiguously understood as referring either to the graphic or to me, but not, say, to Idi Amin. It is because http://www.coginst.uwf.edu/~phayes is *my home page* that this URI can be ambiguously understood as referring to my home page or to me. The conventions for determining reference depend on the nature of the representations (in a sufficiently broad sense) found on the page.
Pages like http://www.w3.org/1999/xhtml seem to obey a similar convention: they refer to abstractions which the document you find there *says* they refer to, using English enriched with specialized W3C technical terminology. You might call these conventions 'social', but that seems to me to be just an escape clause; and in any case, it doesn't handle cases where there are no existing 'social' conventions to fall back on, as with formalisms designed for use by software agents on the Web. Nevertheless, the same kind of story seems to apply: the meanings of URIrefs on a page full of RDF are determined in the first instance by what the enclosing RDF asserts about them, according to the RDF semantic conditions.

But now, here's the problem. If we acknowledge that denotations of URIs are determined, or even influenced, by the enclosing language, how can anyone - even the owner of the URI - specify what it denotes in all other languages, including other formal languages? Suppose one WG assigns a new URI to some entity and publishes a document stating, in the language of RFC 2119, that the URI MUST denote that thing. Even so, without some global conventions on denotations of URIs, another language's spec may declare that in *this* language it will denote something else; after all, it is surely up to that spec to define the meanings - denotations - of its own expressions, particularly if it is a formal notation with no associated 'social' conventions to predetermine meanings. Apart from the obvious problems of getting software agents to read specs written in English, this seems to be a basic fault line in the existing conventions, and problems are avoided only by people agreeing to try to dance in step with one another, as it were. Just talking about "identifies" as a kind of fuzzily defined relation between URIs and resources doesn't resolve this issue.
Appealing to network exchange protocols is irrelevant when we are discussing denotations, and social conventions are irrelevant when we have a potential mismatch between formal specifications. What we seem to need, in fact, is precisely something analogous to 'social conventions' for URIs used in a formal web context where English, pictorial and other existing conventions do not apply; and preferably, in a form that can be read by, or incorporated into the code of, software agents.

I have phrased this in deliberately provocative language for emphasis, but these kinds of issues already arise, if only in small ways. For one example, some of the XSD datatypes don't measure up to the requirements of an RDF datatype, so the relevant URIrefs don't denote datatypes in RDF, no matter what the XML Schema spec says about them. Nothing to do with social conventions or network protocols, note. Again, many uses of RDFS in fact impose more meaning on the rdfs: vocabulary than RDFS itself does, so that the referent of, say, rdfs:range (the property which relates a property to a class restriction on its value) is different when that URI reference is used in OWL from its meaning in RDFS; in fact, that particular URIref has at least three meanings, depending on whether it is used in RDFS, OWL-DL or OWL-Full. There seems to be no way to prevent this from happening, even if one wished to. None of these examples is fatal, but they do at least put some severe strain on your account of the relationship between URIs and resources.

None of this is particular to RDF, by the way: these kinds of things will inevitably happen in any system of notations with formally defined meaning conditions. If resources can be anything (with an identity), and if URIs really did identify things incontrovertibly or by technically specifiable means, then this situation wouldn't arise; but it arises all the time.
For example, where *exactly* is it specified that http://www.w3.org/2001/XMLSchema#string denotes the particular datatype described at http://www.w3.org/TR/xmlschema-2/#string ? Is that covered by what one reads at http://www.w3.org/2001/XMLSchema ? Or is it implicit in what is said at http://www.w3.org/TR/xmlschema-2/#namespaces ? What conventions, social or otherwise, decide questions like this?

>A few principles apply:
>- While the definitions of URI and Resource are somewhat circular,
>the existence of a URI does not imply the existence of a resource.
>For example, the URI http://example.com/386751531 identifies no
>resource.

True.

>- Formally, resources could exist without URIs - for example, there
>is a picture of my cat somewhere on http://www.tbray.org but I'm not
>publishing a URI. However, such resources have no practical import
>or utility.

No, they have enormous practical import and utility. For example, consider a web service offering books for sale, using markup in an ontology language which supports simple class reasoning. An order is received for three copies of a certain book, it is accepted, and payment is made. This entire transaction may be done without any URIs being assigned to the particular copies of the book, but the reasoners are able to establish that three books exist (eg by reasoning about the number of things in the class of books attached to the order) and to draw conclusions about them, eg that they weigh enough to require a certain packaging method. The weight itself may have a URI assigned to it, as may the order, but there is no reason why the books must have one; and surely the books exist, and indeed this conclusion can be reached by software. Similar examples can be given in almost any web-services scenario; or consider a web-accessible database of employees which uses a non-URI format for representing employees.
So certainly things can exist which have no URI assigned to them, but which can be referred to from Web pages and be important to Web software. One could argue that they should not be considered to be *resources* until they are identified-1 by a URI; but that decision, while coherent, seems to me arbitrary and unjustified, likely to yield no tangible benefits and to cause enormous trouble. It would require reasoners to distinguish things like the books in the example from things that do have URIs, and probably to keep track of the times at which URIs got assigned to the existing-but-not-yet-resource thingies that became resources when they were baptized with a URI, and so forth; and as far as I can see, all to no purpose.

>- URI schemes may impose constraints on the types of resource they
>identify; for example, ftp: URIs identify files and directories
>accessible using the FTP protocol.

In the first sense of "identify", yes.

>- Ambiguity in the characterization of what resource a URI
>identifies is always undesirable and reduces the utility of both the
>resource and the URI.

Again, this is not necessarily true for the second sense of "identify". In fact, in the case of formal assertional languages, it is *inevitable* that there will be URIs which are ambiguous in some sense, since the semantic conditions of a formal model theory only impose necessary conditions on meanings, rather than specifying absolute, fixed referents. For example, it is meaningless to ask what resource is 'identified' by, say, rdfs:range. The semantics of RDFS imposes restrictions on what this can refer to, but its exact referent will vary between alternative formal interpretations. There is no single 'resource' for it to denote.
Apart from this rather fundamental point, there is practical utility in, for example, allowing an ambiguously referring symbol to be traded between software agents whose primary function is to disambiguate it in a suitable context; such negotiations can be used to decide permissions inside agent domains, for instance. As another example, a URI might be generated to denote an entity known to exist which satisfies a query, even though there may be no way for either the querier or the answerer to unambiguously determine the referent. Nevertheless, such URIs may be of great utility in the querying process by allowing further queries to be made; and as more information is obtained, it may become possible to determine the referent later.

Pat Hayes

PS. Summary. The normal Web protocols are being used in the semantic web to transmit information which itself uses URI references to refer to things, *according to formal conventions set by W3C specs*. (This is the new part: it's not just English any more.) So URI references have two rather precise jobs to do instead of just one: they are needed to hyperlink the information together - to bind together descriptions, just as they bind together the HTML - but they are also used to refer to the things being *described by* the hyperlinked text. Words like "identify" can be understood to refer to either or both of these uses, and so are inherently ambiguous; moreover, the properties of these two senses of "identify" are very different, so the spec needs to be clearer about which sense is intended. If both are intended, it needs to spell out, or at least indicate roughly, the relationships expected to hold between them.

--
---------------------------------------------------------------------
IHMC                        (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.        (850)202 4416   office
Pensacola                   (850)202 4440   fax
FL 32501                    (850)291 0667   cell
phayes@ai.uwf.edu           http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu   for spam
Received on Wednesday, 23 April 2003 01:09:19 UTC