- From: Sandro Hawke <sandro@w3.org>
- Date: Sat, 08 Feb 2003 13:42:06 -0500
- To: www-archive@w3.org
Have I got this right yet?

In the view of RFC 2396 and RFC 2616, each URI directly identifies one thing, its identified "resource". These specifications keep the definition of "resource" and the nature of the identification relationship very abstract in order to keep the field open for unforeseeable future protocols and applications. While it may have been tempting to define URIs more narrowly, saying perhaps that they directly point to living documents, such an approach might have prohibited novel applications such as those involving streaming media, mobile code, cookies, and web services. So an abstract definition was used and the web has kept evolving.

Unfortunately, the abstractness in the definition of "resource" has led some people to think it was reasonable to identify people, products, organizations, physical objects, etc., with http URIs. RFC 2396 makes it clear that URIs in general can be used like this, but RFC 2616 and the HTTP protocol are not meant to be used this way. HTTP URIs are intended for use with the HTTP protocol, which is a particular data transfer protocol. The reason to use an HTTP URI is that it can be used with the HTTP protocol.

It is tempting to say that an HTTP URI like "http://www.w3.org/People/EM" can identify a person. In a loose, natural language sense, this string does identify Eric Miller. Similarly, in a loose, natural language sense, the MIME entity returned in a successful HTTP GET transaction "represents" Eric Miller. It has his picture, and Merriam-Webster's first definition of "representation" is "an artistic likeness or image". But these meanings of "identify" and "represent" are not the technical senses meant by the HTTP specifications.

The temptation is strong, because if you identify Eric with a URI like "http://www.w3.org/People/EM", you can easily use HTTP to get information about him.
By the same token, if you identify the RDF type property with a URI like "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", you can easily use HTTP to get ontological information about it.

Unfortunately, this use of URIs is not in keeping with existing web technologies. Everything from bookmark editors and search engines to human institutions like ad agencies treats HTTP URIs as identifying something like a source of information, a virtual place to visit, or something you can talk to. These uses of URIs are at the heart of HTTP -- we wanted to use HTTP URIs for identifying Eric Miller and the RDF type property so we could easily obtain some information about them! -- and they are fundamentally different from the use of URIs to identify things like physical objects.

The dictionary notion of representation might tempt people to argue that even if "http://www.w3.org/People/EM" does not identify Eric Miller, at least "http://www.w3.org/People/EM/e_miller.jpg" does. Clearly, that URI leads to a picture which represents him. But this approach breaks down when you consider that HTTP GET of "http://www.w3.org/Team/EM/s000782" also gives you a picture of him. They both represent Eric, but they are different pictures. They show him in different poses, and (as you may have noticed) they are published on the web differently; access to the latter one is tightly controlled. If the resource identified by each of these URIs were Eric Miller, we could not use the URIs as identifiers to talk about the differences here. The correct view is that the URIs identify systems which offer a pattern of HTTP responses; the second resource uses access control while the first does not. Those systems and those responses bear an interesting relationship to Eric Miller, but they are themselves things worthy of discussion and so of being identified.

TimBL has argued that this view is correct for non-fragment URIs, but that URIs with a fragment part are different.
I disagree, because I think HTTP fragment URIs, even though unused in HTTP itself, still identify information sources. We often footnote discussions with fragment URIs to say, in effect, my point is supported by _this_ _part_ of some document. That part of the document is a source of information, like a whole document, which may be bookmarked, linked to, indexed, and even used in advertisements.

So how can we identify Eric Miller and still have ready access to his web page? The answer is that when we use a URI as a name for something, we should be clear whether we mean it to operate directly as a web address or indirectly as (in topic maps terminology) a subject indicator.

When a URI appears as an xmlns value, an HTML profile identifier, an HTTP extensions identifier (RFC 2774), or as an RDF predicate, it clearly is operating as a subject indicator. We know this because, among other things, the application operates normally even when HTTP access to the resource is impossible. Another sign is that implementations compare such URIs on a character-by-character basis, not even folding case in the scheme name.

In fact, about the only time this dual use of URIs as web addresses and subject indicators is even noticeable is in RDF node labels. When we have RDF triples like this example in the RDF Primer [1]

   <http://www.example.org/index.html>
   <http://purl.org/dc/elements/1.1/creator>
   <http://www.example.org/staffid/85740> .

we can guess the first URI is being used as a web address while the third is being used as a subject indicator, but such a determination is not always possible.

I suspect, given its PICS heritage, that in early uses of RDF the node labels were always intended as web addresses -- this was information about the relationships between web pages -- but I don't know. Somewhere along the line, people got tempted by the wording of RFC 2396 and their own desire to make a more useful system, and they started to lose the distinction.
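To make the character-by-character point concrete, here is a minimal Python sketch. It is not taken from any of the specifications: the function names same_name and same_address are hypothetical, and the address comparison is only a rough approximation of URI equivalence (it folds case in the scheme and authority and does nothing else).

```python
from urllib.parse import urlsplit


def same_name(a: str, b: str) -> bool:
    """Compare two URIs as opaque names (the subject-indicator style).

    Plain string equality: no case folding, not even in the scheme.
    """
    return a == b


def same_address(a: str, b: str) -> bool:
    """Compare two URIs roughly as web addresses (hypothetical helper).

    urlsplit() lowercases the scheme; the authority is lowercased here by
    hand. This is an illustrative assumption, not full URI normalization.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    key_a = (pa.scheme, pa.netloc.lower(), pa.path, pa.query, pa.fragment)
    key_b = (pb.scheme, pb.netloc.lower(), pb.path, pb.query, pb.fragment)
    return key_a == key_b


if __name__ == "__main__":
    a = "http://www.example.org/staffid/85740"
    b = "HTTP://www.example.org/staffid/85740"
    print(same_name(a, b))     # False: different strings, so different names
    print(same_address(a, b))  # True: same place to send an HTTP request
```

The two functions can disagree about the same pair of strings, which is one way to see that "used as a name" and "used as an address" are different modes of use.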
It has been suggested that type inference can serve to disambiguate triples like the one above. If the range of dc:creator were Person, then we would know "http://www.example.org/staffid/85740" was being used as a subject indicator. That might work, sometimes.

But type inference cannot always help. Imagine a work of art, a sculpture with a URL engraved in its base. At that address, the sculptor maintains a website about the work. If that URL is "http://www.example.org/index.html", then does the above triple tell us about the creator of the sculpture or the creator of the website? If we defined dc:websiteCreator, which could only be used to tell us about the creator of a website, we would be in the same mess if we came across a website about a website. In some cases, no amount of information about a URI can tell us whether, in a given occurrence in an RDF triple, it is meant to be used as a subject indicator or a web address.

RDF documentation should be clear: it should use the word "resource" only when talking about the thing immediately identified in an RFC 2396/2616 sense. A physical object cannot be an HTTP resource, in this sense. It could be a resource using some other URI scheme/URN NID, like urn:oid, urn:uuid, or tag:.

In terms of the actual syntax and semantics, some solutions for RDF include:

- In the abstract syntax, say that URIs label nodes in one of two ways (web address and subject indicator); in the concrete syntax imagine rdf:about and rdf:resource being combined into one linking attribute, but then split that into rdf:webAddress and rdf:subjectIndicator. (Or perhaps something like aboutWebPage, aboutIndicatedSubject, identifiedResource, and indicatedSubject. The names will take a little work.)

- Alternatively, deprecate URI node labels in one or both modes of identification, while introducing RDF properties webAddress and subjectIndicator.
- A third option is my http://www.w3.org/2002/12/rdf-identifiers proposal, where *in* *RDF* the "#" is seen as a flag indicating which style of URI use is intended. This is a backwards-compatibility hack to avoid needing to change or deprecate current uses.

[1] http://www.w3.org/TR/rdf-primer/#rdfmodel
Received on Saturday, 8 February 2003 13:43:50 UTC