- From: Sandro Hawke <sandro@w3.org>
- Date: Sat, 08 Feb 2003 13:42:06 -0500
- To: www-archive@w3.org
Have I got this right yet?
In the view of RFC 2396 and RFC 2616, each URI directly identifies
one thing, its identified "resource". These specifications keep
the definition of "resource" and the nature of the identification
relationship very abstract in order to keep the field open for
unforeseeable future protocols and applications. While it may have
been tempting to define URIs more narrowly, saying perhaps that
they directly point to living documents, such an approach might
have prohibited novel applications such as those involving
streaming media, mobile code, cookies, and web services. So an
abstract definition was used and the web has kept evolving.
Unfortunately, the abstractness in the definition of "resource" has
led some people to think it was reasonable to identify people,
products, organizations, physical objects, etc., with HTTP URIs.
RFC 2396 makes it clear that URIs in general can be used like this,
but RFC 2616 and the HTTP protocol are not meant to be used this
way. HTTP URIs are intended for use with the HTTP protocol, which
is a particular data transfer protocol. The reason to use an HTTP
URI is that it can be used with the HTTP protocol.
It is tempting to say that an HTTP URI like
"http://www.w3.org/People/EM" can identify a person. In a loose,
natural language sense, this string does identify Eric Miller.
Similarly, in a loose natural language sense, the MIME entity
returned in a successful HTTP GET transaction "represents" Eric
Miller. It has his picture, and Merriam-Webster's first definition of
"representation" is "an artistic likeness or image". But these
meanings of "identify" and "represent" are not the technical
senses intended by the HTTP specifications.
The temptation is strong, because if you identify Eric with a URI
like "http://www.w3.org/People/EM", you can easily use HTTP to get
information about him. By the same token, if you identify the RDF
type property with a URI like
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type", you can easily
use HTTP to get ontological information about it.
Unfortunately, this use of URIs is not in keeping with existing web
technologies. Everything from bookmark editors and search engines
to human institutions like ad agencies treats HTTP URIs as
identifying something like a source of information, a virtual place
to visit, or something you can talk to. These uses of URIs are at
the heart of HTTP -- we wanted to use HTTP URIs for identifying
Eric Miller and the RDF type property so we could easily obtain
some information about them! -- and they are fundamentally different
from the use of URIs to identify things like physical objects.
The dictionary notion of representation might tempt people to argue
that even if "http://www.w3.org/People/EM" does not identify Eric
Miller, at least "http://www.w3.org/People/EM/e_miller.jpg" does.
Clearly, that URI leads to a picture which represents him. But
this approach breaks down when you consider that HTTP GET of
"http://www.w3.org/Team/EM/s000782" also gives you a picture of
him. They both represent Eric, but they are different pictures.
They show him in different poses, and (as you may have noticed)
they are published on the web differently; access to the latter one
is tightly controlled. If the resource identified by each of these
URIs were Eric Miller, we could not use the URIs as identifiers to
talk about the differences here. The correct view is that the URIs
identify systems which offer a pattern of HTTP responses; the
second resource uses access control while the first does not.
Those systems and those responses bear an interesting relationship
to Eric Miller, but they are themselves things worthy of discussion
and so of being identified.
TimBL has argued that this view is correct for non-fragment URIs,
but that URIs with a fragment part are different. I disagree,
because I think HTTP fragment URIs, even though unused in HTTP
itself, still identify information sources. We often footnote
discussions with fragment URIs to say, in effect, my point is
supported by _this_ _part_ of some document. That part of the
document is a source of information, like a whole document, which
may be bookmarked, linked-to, indexed, and even used in
advertisements.
So how can we identify Eric Miller and still have ready access to
his web page? The answer is that when we use a URI as a name for
something, we should be clear whether we mean it to operate
directly as a web address or indirectly as (in topic maps
terminology) a subject indicator. When a URI appears as an xmlns
value, an HTML profile identifier, an HTTP extensions identifier
(RFC 2774), or as an RDF predicate, it clearly is operating as a
subject indicator. We know this because, among other things, the
application operates normally even when HTTP access to the resource
is impossible. Another sign is that implementations compare such
URIs on a character-by-character basis, not even folding case in
the scheme name.
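To make the difference between the two comparison styles concrete, here is a minimal Python sketch using only the standard library. `web_address_key` and `same_subject_indicator` are hypothetical names. The first applies two case-insensitivities that hold for web addresses (scheme and host name); the second compares strings exactly, the way XML namespace names are compared. The normalization is deliberately partial: it ignores percent-encoding, default ports, and userinfo.

```python
from urllib.parse import urlsplit, urlunsplit


def web_address_key(uri: str) -> str:
    """Partially normalize a URI that is being used as a web address.

    Only the scheme and host are case-folded. The whole netloc is
    lower-cased, so this assumes the URI carries no userinfo.
    """
    parts = urlsplit(uri)  # urlsplit already lower-cases the scheme
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))


def same_subject_indicator(a: str, b: str) -> bool:
    """Subject indicators (xmlns values, RDF predicates) compare character by character."""
    return a == b


a = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
b = "HTTP://WWW.W3.ORG/1999/02/22-rdf-syntax-ns#type"

print(web_address_key(a) == web_address_key(b))  # True: same place on the web
print(same_subject_indicator(a, b))              # False: two different names
```

The same pair of strings is "equal" or "not equal" depending on which role the URI is playing, which is one more sign that the two roles are distinct.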
In fact, about the only time this dual use of URIs as web addresses
and subject indicators is even noticeable is in RDF node labels.
When we have RDF triples like this example from the RDF Primer [1]:
<http://www.example.org/index.html>
<http://purl.org/dc/elements/1.1/creator>
<http://www.example.org/staffid/85740> .
we can guess the first URI is being used as a web address while
the third is being used as a subject indicator, but such a
determination is not always possible. I suspect, given its PICS
heritage, that in early uses of RDF the node labels were always
intended as web addresses -- this was information about the
relationships between web pages -- but I don't know. Somewhere
along the line, people were tempted by the wording of RFC 2396 and
by their own desire to make a more useful system, and they
started to lose the distinction.
It has been suggested that type inference can serve to
disambiguate triples like the one above. If the range of
dc:creator were Person, then we would know
"http://www.example.org/staffid/85740" was being used as a subject
indicator. That might work, sometimes. But type inference cannot
always help. Imagine a work of art, a sculpture with a URL
engraved in its base. At that address, the sculptor maintains a
website about the work. If that URL is
"http://www.example.org/index.html", then does the above triple
tell us about the creator of the sculpture or the creator of the
website? If we defined dc:websiteCreator, which could only be used
to tell us about the creator of a website, we would be in the same
mess if we came across a website about a website. In some cases,
no amount of information about a URI can tell us whether, in a
given occurrence in an RDF triple, it is meant to be used as a subject
indicator or a web address.
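The limits of type inference can be sketched in a few lines of Python. Everything in the schema below is an assumption for illustration: the range of dc:creator is taken to be Person (the "if" above), `WEBSITE_CREATOR` is a hypothetical URI standing in for the imagined dc:websiteCreator, and `NON_WEB_CLASSES`, `mode_from_class`, and `infer_modes` are made-up names. Inference settles the object position when its class rules out a web address, but it says nothing about the subject, even when the domain is Website.

```python
from typing import Optional, Tuple

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
# Hypothetical stand-in for the imagined dc:websiteCreator property.
WEBSITE_CREATOR = "http://example.org/hypothetical#websiteCreator"

# Hypothetical schema: predicate -> (domain, range); None = unconstrained.
SCHEMA = {
    DC_CREATOR: (None, "Person"),
    WEBSITE_CREATOR: ("Website", "Person"),
}

# Classes assumed, for illustration, never to be things reachable by HTTP.
NON_WEB_CLASSES = {"Person", "Sculpture"}


def mode_from_class(cls: Optional[str]) -> Optional[str]:
    """Return 'subject indicator' if the class rules out a web address, else None."""
    if cls in NON_WEB_CLASSES:
        return "subject indicator"
    # A Website-typed URI could be the site's own address *or* indicate
    # some other site (a website about a website), so the class alone
    # does not settle it.
    return None


def infer_modes(predicate: str) -> Tuple[Optional[str], Optional[str]]:
    """Guess (subject mode, object mode) from the predicate's domain and range."""
    domain, rng = SCHEMA.get(predicate, (None, None))
    return mode_from_class(domain), mode_from_class(rng)


print(infer_modes(DC_CREATOR))       # (None, 'subject indicator')
print(infer_modes(WEBSITE_CREATOR))  # (None, 'subject indicator')
```

In both cases the subject's mode stays unknown, which is the sculpture problem in miniature.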
RDF documentation should be clear: it should use the word
"resource" only when talking about the thing immediately identified
in the RFC 2396/2616 sense. A physical object cannot be an HTTP
resource, in this sense. It could be a resource using some other
URI scheme/URN NID, like urn:oid, urn:uuid, or tag:.
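As a small illustration of the urn:uuid route, this Python sketch mints a name for the (hypothetical) engraved sculpture using the standard `uuid` module. `sculpture_id` is an illustrative name; the point is only that the resulting URI has no HTTP address behind it, so it cannot be confused with a web address.

```python
import uuid
from urllib.parse import urlsplit

# Hypothetical: mint a name for the engraved sculpture. uuid4() is random,
# so each run yields a different identifier of the form "urn:uuid:<36 chars>".
sculpture_id = uuid.uuid4().urn

# The scheme is "urn", not "http": there is nothing here to GET.
print(urlsplit(sculpture_id).scheme)  # urn
print(sculpture_id)
```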
In terms of the actual syntax and semantics, some solutions for
RDF include:
- In the abstract syntax, say that URIs label nodes in one of two
ways (web address and subject indicator); in the concrete
syntax imagine rdf:about and rdf:resource being combined into
one linking attribute, but then split that into rdf:webAddress
and rdf:subjectIndicator. (Or perhaps something like
aboutWebPage, aboutIndicatedSubject, identifiedResource, and
indicatedSubject. The names will take a little work.)
- Alternatively, deprecate URI node labels in one or both modes
of identification, while introducing RDF properties webAddress
and subjectIndicator.
- A third option is my http://www.w3.org/2002/12/rdf-identifiers
proposal, in which, *in* *RDF*, the "#" is seen as a flag indicating
which style of URI use is intended. This is a
backwards-compatibility hack to avoid needing to change or
deprecate current uses.
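The first option amounts to making every node label carry its mode explicitly. Here is a minimal Python sketch of that abstract syntax. `Mode`, `Node`, and `Triple` are hypothetical names, not part of any RDF library, and the two mode values only echo the tentative rdf:webAddress / rdf:subjectIndicator names above. Predicates are kept as plain strings because, as noted earlier, they always operate as subject indicators. Re-encoding the sculpture case this way gives two distinct triples that share exactly the same URI strings.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    WEB_ADDRESS = "web address"
    SUBJECT_INDICATOR = "subject indicator"


@dataclass(frozen=True)
class Node:
    uri: str
    mode: Mode  # how this occurrence of the URI is meant to be read


@dataclass(frozen=True)
class Triple:
    subject: Node
    predicate: str  # predicates always operate as subject indicators
    obj: Node


CREATOR = "http://purl.org/dc/elements/1.1/creator"
PAGE = "http://www.example.org/index.html"
STAFF = Node("http://www.example.org/staffid/85740", Mode.SUBJECT_INDICATOR)

# Who made the website at PAGE?
about_website = Triple(Node(PAGE, Mode.WEB_ADDRESS), CREATOR, STAFF)
# Who made the thing PAGE is about (the sculpture)?
about_sculpture = Triple(Node(PAGE, Mode.SUBJECT_INDICATOR), CREATOR, STAFF)

print(about_website == about_sculpture)  # False: same URIs, different claims
```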
[1] http://www.w3.org/TR/rdf-primer/#rdfmodel
Received on Saturday, 8 February 2003 13:43:50 UTC