URIs as names-for-reference vs locations-for-access

Ah, a title might be courteous....

Again, the usual questions about the SemWeb seem to be popping up,
and in particular http-range-14. There also doesn't seem to be much progress on 
these issues. Here are some notes that I think may be helpful,
which basically try to distinguish between URIs as names for reference versus 
URIs as locations for physical access, as well as try to define the elusive 
term "on the Web" as applying to something that, if the Web was destroyed, 
would also be destroyed. I also distinguish between the use of representation 
in REST versus representation in AI/philosophy, which are not always the same. 
I think these distinctions, and taking them seriously, are clearly very 
important to http-range-14.

The full text is here, and benefited from some discussion with Pat Hayes:

http://www.ibiblio.org/hhalpin/homepage/notes/uri.html

Text version below:
-----------------------------------------------------------------------
URIs as Names for Reference and as Locations for Access
httpRange-14 notes
By Harry Halpin
Thanks to Pat Hayes for some examples and commentary, although any errors are 
due to me of course!


What do URIs identify?

In essence, one reason the Web works is that, using a web protocol like 
HTTP (Hypertext Transfer Protocol), a client can send a request to a 
server to perform an operation such as HTTP GET on a given URI and dereference 
something, often a web-page. However, this very basic feature of the Web is 
bedeviled by a question: "What is the range of the HTTP dereference function?" 
In other words, what do URIs identify? In theory this question has been solved 
by the W3C TAG's AWWW: URIs refer to anything. Upon inspection, the official 
definition is actually circular: "We do not limit the scope of what might be a 
resource...it is used in a general sense for whatever might be identified by a 
URI." The question then arises: if a resource is just anything that could 
theoretically be identified with a URI, is there anything that cannot be 
identified? It would seem not. This view is given by the AWWW as "our use of 
the term resource is intentionally more broad. Other things, such as cars and 
dogs ... are resources too." However, referring to a web-page and referring to 
the car in my garage are similar, but not exactly the same. The essential 
difference is this: in the first case we have physical, connected access to the 
web-page on the Web, while in the second case, if we are using Semantic Web 
logic to refer to my car, we only have the ability to refer to my car by a URI 
name, and this gives no direct, connected, or physical access. When one uses a 
URI as a name there is a disconnect, as the thing named may not be on the Web.

The division between representation and resource existed but was not explicitly 
stated to, and was definitely not noticed by, most of the users of the original 
hypertext Web. URLs seem to have been originally meant to identify the location 
of representations, such as HTML web-pages, or possibly sets of 
representations, such as when, through content negotiation, a news website 
figures out where you live and then serves you your local news. With the advent 
of the Semantic Web, the 
problem of httpRange-14 comes up precisely because a URI can be used to refer 
to anything, not just web pages. To be more precise, the issue comes up because 
URIs can refer to things that are not "on the Web" and so do not necessarily 
have a Web-accessible representation. Despite this, these things that are 
"not on the Web" are fundamentally "on the Web" in another sense, since they 
can be reasoned about by the Semantic Web. The crucial question is: what does 
"on the Web" mean? To answer that question we must pursue the historical chain 
of events from URL to URN to URI.

Locations

Uniform Resource Locators (URLs) did not suffer from the httpRange-14 issue, 
unlike their nearly identical brethren, URIs. Unlike URIs, URLs identified a 
specific type of thing: a location, which is a physical place. This location 
was assumed to be on the Web. By "on the Web," we mean something that is 
physically connected to the Web. A URL denotes a location on some web-server 
which serves representations (an HTML document, a music file to download, 
whatever) to visiting web clients. A location can be connected to the Web 
because it is, even after endless redirection, in a physical place.

Take a mundane example: my address. An address is just a location that has a 
thing that can (usually) be found at that location, and there exists a 
specified system for finding the location of an address. This allows multiple 
locations to be ordered in a way that humans (in the case of street addresses) 
or machines (in the case of IP addresses) can navigate easily. In the case of 
my address, if someone wants to find me, they can try to look for me at the 
location of my address. I'm sometimes not there, so my address can give the 
person trying to find me a metaphysical 404 error. A location can, and should, 
give you direct, connected, physical access to the thing at the location. URLs 
are used as names of locations, and sending an HTTP GET (or POST, or HEAD, and 
so on) to a server requires the server, if possible, to go to the location and 
physically access the thing at the location, usually by copying it and sending 
a copy to your computer. Or sending a very real 404 error.
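
The access relation just described can be sketched in a few lines of Python. 
This is only an illustrative toy, not how any real server works: the in-memory 
STORE, the path names, and the function http_get are all assumptions made up 
for the example.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a web server's storage: location (path) -> bytes.
STORE: Dict[str, bytes] = {
    "/home.html": b"<html><body>A homepage</body></html>",
}

def http_get(path: str, store: Dict[str, bytes] = STORE) -> Tuple[int, bytes]:
    """Go to the location and return (status, body)."""
    if path in store:
        # The thing is at the location: hand back its bytes. Bytes are
        # immutable, so this is as good as sending a copy.
        return 200, store[path]
    # Nothing is at the location: the "very real 404 error".
    return 404, b"Not Found"

status, body = http_get("/home.html")
missing_status, _ = http_get("/not-at-home.html")
```

The point of the sketch is that the caller either ends up physically holding 
the bytes or gets told that nothing was there; in neither case is the path 
merely "referring" to something.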

On the Web

Something can be found on the Web if it is physically and causally connected to 
the Web. This means that whatever is "on the Web" can be encoded into bits and 
transferred over the Web. However, this is only being "on the Web" in the 
strongest sense: as in always on the Web. A thing can be on the Web only 
sometimes, or only partially on the Web, or only rarely on the Web. By our 
definition, a thing is on the Web if it could not be removed from the Web 
without loss of its functionality. One can imagine a whole range of 
possibilities, from being "strongly" on the Web (all the time) to "weakly" on 
the Web (occasionally). Thus, both documents and servers are "on the Web", and 
humans are "on the Web" only in a weak sense, since they interact with the Web 
only indirectly, through typing on keyboards. Things like the Eiffel Tower or 
Louis XVI are definitely "not on the Web", since Louis XVI is long gone and 
cannot at any point directly connect physically to the Web, while the Eiffel 
Tower is only represented on the Web, and is not physically sending any bytes 
to anyone itself. The Eiffel Tower is composed not of bytes, but of steel.

This brings us to "representations" on the Web. What is the difference between 
something merely having a representation on the Web and something being fully 
on the Web? Rephrasing Brian Smith: some thing is on the Web if it is such that, 
if the Web itself was destroyed, that thing would also be destroyed. If not, 
it's not fully on the Web. If someone destroyed the Web, this would not damage 
me if I were being denoted by a URI, but my homepage at that URI would go up in 
smoke, if that is what people were using to refer to me. I am not on the Web in 
a strong sense, but my homepage sure is. There are lots of middling cases: my 
computer is weakly on the Web, more so than myself. If my HTTP daemon went down 
and my computer could no longer access the Web, or the Web itself collapsed, 
the computer qua computer would still exist, but the computer qua Web server 
would have gone up in smoke with the rest of the Web. One good question yet to 
be answered is: when are humans on the Web in a strong sense? Would it require 
our credit card details to be in a chip beneath our skin with a URI, and 
wireless internet monitoring us with a GPS that sent messages over the 
Internet? Those examples seem too simplistic and extreme.

Still, what is the difference between something being represented on the Web 
and being on the Web? One necessary but not nearly sufficient condition for 
"representation" would be that, if a thing X represents another thing Y, you 
can destroy thing X and thing Y remains unscathed. Representations qua 
representations are on the Web, and would be destroyed if the Web was 
destroyed. However, what they represent would not be destroyed, unless what the 
representation represented was also on the Web.

Representations: REST and AI

Before going any further, we have to distinguish two different uses of the word 
"representation." The first is the use of "representation" as it is used in 
artificial intelligence, cognitive science, and philosophy. In this use, a 
representation is something that "denotes" or "is about" something else, 
although often additional requirements are put on exactly what type of things 
the representation or its denotation may be. This will be called 
"representationAI." The second use is the use of "representation" as used by 
REST (The Representational State Transfer web architecture theory of Roy 
Fielding), where a representation can be whatever a URI returns from an 
HTTP request. This will be called a "representationREST". A representationREST, 
unlike a representationAI, does not necessarily refer to or denote any other 
thing - although it might! The two definitions are not the same, but not 
mutually exclusive either. So, the difference between "on the Web" and "not on 
the Web" is also a test of both types of representation. A representationAI can 
qua representationAI be entirely on the Web if what it represents is also on 
the Web. Lots of representations, such as an analog photo on my desk, are not on 
the Web at all. In another case, a picture of me on the Web is on the Web qua 
itself but not on the Web qua me, because it denotes me, not something on the 
Web. If the Web was destroyed, it would only destroy the bytes of the 
representationAI, not necessarily what the representation denoted. Also, 
representationsAI may have layers of representationAI, as one representation 
may denote other representationsAI, leading to all sorts of interesting chains 
of reference. However, representationsREST are by definition on the Web, and 
would be destroyed if the Web was destroyed, at least as the possible objects 
of HTTP operations. This is because representationsREST are defined precisely 
as the bytes that are sent over the Web. One could argue that copies of them 
archived to a computer might survive. However, those copies would no longer be 
representationsREST qua the Web, but just whatever they are without the Web 
being involved. This argument does reveal that both sorts of representation are 
functional categories that are dependent on their context, as something is 
never a representationREST without being on the Web (or in some parallel 
universe, another system that implements REST). Something is never a 
representationAI without something being represented.

Virtual Locations and Digitality

This idea of physically being on the Web can be abstracted from the concept of 
location. "Being on the Web" does not mean a thing has one URL or even physical 
location. Something could be on the Web and have multiple URLs, or multiple 
copies in different physical locations. A location can be a virtual location, 
an abstraction over a set of possible physical representations, as long as it 
really is a location. What exactly is the "thing" at a URL location? It's not 
just a particular server, nor is it some abstract resource. It is actually some 
bytes, a representationREST or set of representationsREST, which one has to 
actually GET with a web client to determine whether it is also a 
representationAI. The particular server where the actual representationREST 
lives is actually denoted by another type of location: wherever it is on the 
server, and the server has a very concrete IP address. A URL can be a name that 
denotes a virtual location, which is then forwarded to the place where the 
concrete bits are stored. These bits are usually on a server somewhere. When 
I access http://www.w3c.org, if I am in Japan I get the mirror of the W3C 
web-pages in Japan, if I'm in the US I get the one hosted at MIT, but I get the 
same "resource," regardless. Here the concept of resource as stated by the TAG 
starts making some sense. It's a concept about the contents of a 
representationREST. However, this resource is not identical to the thing 
physically received as bytes (that's the representationREST). A resource seems 
to be the abstract idea of the common information between all the possible 
representationsREST returned. To properly understand resource then one needs a 
thorough inspection of theories of information and content, which is beyond the 
scope of this little note. Still, what is physically returned by an HTTP GET is 
just the representationREST, which may differ between MIT and Kyoto, while it 
might not between INRIA and MIT. The fact that the Web is digital becomes 
crucially important: the "copyability" of the representationsREST, due to their 
digital nature, is crucial to why the Web works, just as crucial as a universal 
naming scheme. Yet, things not "on the Web" (Pat Hayes qua Pat Hayes, my dog, 
etc.) don't have this property of copyability. A picture on the Web of Pat Hayes 
is digital, but Pat Hayes is not, no matter how much time he spends online.
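
To make the mirror example a little more concrete, here is a rough Python 
sketch of one virtual location backed by several physical copies. Everything in 
it is assumed for illustration: the MIRRORS table, the region codes, the byte 
strings, and the crude shared_content function standing in for "the common 
information" are not taken from any real W3C setup.

```python
from typing import Dict

URL = "http://www.w3c.org"

# Hypothetical mirror table: one virtual location, several physical copies.
MIRRORS: Dict[str, Dict[str, bytes]] = {
    URL: {
        "JP": b"<!-- mirror in Japan --><html>W3C home page</html>",
        "US": b"<!-- hosted at MIT --><html>W3C home page</html>",
    }
}

def dereference(url: str, region: str) -> bytes:
    """Return the representationREST served to a client in the given region."""
    return MIRRORS[url][region]

def shared_content(body: bytes) -> bytes:
    """Crude stand-in for the 'resource': drop the mirror-specific comment."""
    return body.split(b"-->", 1)[1]

from_japan = dereference(URL, "JP")
from_us = dereference(URL, "US")
same_bytes = from_japan == from_us
same_resource = shared_content(from_japan) == shared_content(from_us)
```

Here same_bytes is False while same_resource is True: the bytes received 
differ by mirror, yet what they have in common is the same. A real account of 
that "common information" would of course need far more than stripping a 
comment.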

What's in a Name?

A name is entirely different from a location. Unlike a location, a name does 
not necessarily give you access to the thing named, and this thing named we will 
call the referent of the name. The set of all referents of a name (or 
denotations of a representation for that matter) we will call its 
interpretation. In fact, names are usually used when connected, physical access 
is impossible, and as such are place-holders for the physical thing precisely 
because there is no physical access. This concept of "names" is more in line 
with the URN effort, which essentially tries to provide rigid designators, in 
the Kripkean sense, for the Web. Since a name does not have any physical 
connection to a referent, putting a name on the Web via a URI (such as a URN) 
does absolutely nothing at all to the referent of the name. When anyone 
accesses the resource "Pat Hayes" from the URI 
http://www.ihmc.us/users/phayes/PatHayes.html, Pat Hayes does not magically 
appear next to them. What that URI currently can return from an HTTP GET is a 
representationREST: a Web-page in HTML, encoded as very physical bytes 
somewhere that get sent to me over a wire as very physical bytes, and then 
displayed by a very physical computer, showing the social security number of 
Pat Hayes and other defining details. It could even theoretically return a 
definition of Pat Hayes in RDF. Yet this particular URI's representationREST 
also 
serves double-duty as a representationAI, since it contains pictures of the 
actual Pat Hayes, relevant facts about him, and so on. Pat Hayes himself is not 
on the Web, since if the Web is destroyed Pat Hayes would merrily go along, and 
probably with more spare time.

So, the use of a URI as a "name" causes a URI to be used as a representationAI. 
However, what exactly the interpretation of a URI as a "name" actually is goes 
beyond the physics of transferring bytes. This interpretation is either the 
yet-to-come metaphysics of the Semantic Web, social meaning, or something else 
- who knows? But what is important is that it is a non-physical, non-causal, 
non-connected relationship, unlike the relationship of a location which is a 
physical, connected, causal relationship. Note that URIs used as 
names-for-reference are common in the Semantic Web, and the Semantic Web 
depends on there being names with interpretations to reason over. Because there 
is no direct access to the thing the URI-as-name identifies, unlike the use of 
a URI-as-location, the Semantic Web uses URIs without any necessary use of 
representationsREST. URIs in the Semantic Web are used more like 
"place-holders" or even (stretching it a bit) "keys," without any HTTP 
operation returning any bytes from a server in terms of representationREST. 
Thus, the Semantic Web uses URIs as representationsAI, while the Good-Old 
HyperText Web uses URIs as representationsREST.
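
As a small illustration of URIs being used as "keys" with no HTTP involved, 
consider the following Python sketch. The example.org URIs, the "knows" 
property, and the tiny GRAPH are hypothetical and invented for the example; 
nothing here is dereferenced, and no real RDF library is being used.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical URIs used purely as names; no request is ever sent to them.
PAT = "http://example.org/people#PatHayes"
HARRY = "http://example.org/people#HarryHalpin"
KNOWS = "http://example.org/vocab#knows"

# A toy graph of (subject, predicate, object) statements.
GRAPH: Set[Triple] = {(HARRY, KNOWS, PAT)}

def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

known_by_harry = objects(GRAPH, HARRY, KNOWS)
```

The lookup works by comparing strings, so it would work just as well if no 
server anywhere answered for those URIs.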

Double Lives as Names and Locations

The key to the confusion is that HTTP fundamentally will dereference whatever a 
URI refers to, and there are two distinct types of functional roles a URI can 
play: name and location. A URI can serve as an identifier-as-a-name, which is a 
non-physical relation of reference, and as an identifier of a location, which 
is a physical relation of access. Just naming something has no effect on the 
thing 
named: naming something does not bathe the thing named in any type of energy 
that we can detect via a physical radar. There is no way to build a detector to 
detect what exactly someone means by a URI, although we can guess from talking 
to them or accessing representations they give us. Locations give you physical, 
connected, access to a thing. If you go to a location to get something, if the 
thing is there you return with it physically in hand. A name might, but does 
not have to and usually does not, give one any sort of physical, connected 
access to the thing named.

The word "identifier" is even more vague than name or location, and here the 
problem of the "identity" crisis appears: how do we know if the URI is being 
used for something as a name or as a location? The URI itself does not tell us. 
Even worse, what does "identify" mean, and how can we tell if two things 
identify the same thing? With representationsAI that is sometimes very clear, 
as in photographs, and sometimes not so clear, as in abstract art. Even the 
integers have problems with identification: does "11" identify eleven in 
decimal or three in binary? We won't know - and can't know unless we are given 
some sort of decoding scheme. In the programming language tradition, "identifier" 
has a pretty secure meaning and in that context the access/reference 
distinction is theoretically important but not of great practical significance, 
since everything you can refer to is physically accessible by the computer and 
has an address in memory. This is not true of logic, and definitely not true of 
model-theoretic semantics. Importantly, the access and reference distinction 
holds on the Web with many things that have URIs. In an information space, 
things may be identified without being accessed via a physical connection. In 
terms of the AWWW, an "information resource" is probably similar to the use 
of URI-as-access, while the thing identified by a URI used for reference 
without access is what has been called a "non-information resource."
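
The "11" example can be made concrete with a tiny Python sketch, in which the 
decoding scheme is simply the numeric base. The function name decode is an 
assumption chosen for the example.

```python
def decode(token: str, base: int) -> int:
    """Interpret a string of digits under an explicit decoding scheme (the base)."""
    return int(token, base)

eleven = decode("11", 10)  # the same two characters read as decimal
three = decode("11", 2)    # the same two characters read as binary
```

The string itself never says which reading is intended; the base has to be 
supplied from outside, just as a URI by itself does not say whether it is being 
used as a name or as a location.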

Solving the Identity Crisis

Then there's the identity crisis: a single URI can actually play both roles 
(name with no access and location with access) at the same time, which gives us 
a powerful device for some applications. The official view, that 
representations are supposed to be interpreted by applications depending on 
MIME types, is clearly focused on the use of a URI as a location for access; yet 
nothing forbids a URI that returns a representationREST or some other data from 
being used to tell the web client that this URI is also a name for reference in 
addition to a location for access. In fact, for a URI used only as a name, 
MIME-types are clearly irrelevant. At least for the time being!
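
One way a client might act on this is sketched below in Python. It is only a 
guess at how such a convention could look: the marker media type 
"application/x.name-marker" is hypothetical and not registered anywhere, and 
the function roles_for and its rules are assumptions made for the example.

```python
from typing import Optional, Set

# Hypothetical marker media type; not a registered MIME type.
NAME_MEDIA_TYPE = "application/x.name-marker"

def roles_for(content_type: Optional[str]) -> Set[str]:
    """Guess which roles a URI plays from the Content-Type it returned."""
    if content_type is None:
        # Assumption: no bytes came back, so the URI can only act as a name.
        return {"name"}
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == NAME_MEDIA_TYPE:
        # Bytes came back and they announce that the URI is also a name.
        return {"name", "location"}
    return {"location"}

html_roles = roles_for("text/html; charset=utf-8")
marked_roles = roles_for("application/x.name-marker")
name_only_roles = roles_for(None)
```

An ordinary HTML response is treated as a plain location, the marker type 
signals the double role, and a URI that returns nothing is treated as a name 
only, which is the case where MIME types have nothing to say.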

It would be useful to distinguish when a URI is used as a "name" or as a 
"location," and whether some URIs can only be used as names or only as locations. 
In other words, this depends on whether the thing (which would be the 
"resource") identified by URI is on the Web or not. This already reduces to the 
"non-information resource" and "information resource" distinction on some 
level, and so is not a return to the historical Dark Ages of the Web. Since 
they share a common syntax, it does make sense to unite URLs and URNs on a 
level as URIs, and even to use URLs as "names." The identity crisis can be 
solved pretty easily, as shown by the Web Proper Names proposal. First, a 
separate URI scheme (wpn:// or tdb://) can distinguish the use of URIs as names 
for reference from URIs as locations for access. Second, to capitalise even 
further on the identity crisis, the two uses can be distinguished without a new 
URI scheme through the representationREST itself, by having a type of 
representation format which says that this URI is a "name" as opposed to a 
"location." In fact, one could even have a special MIME-type to distinguish 
names for things: imagine the "name" MIME-type, or the 
"application/xhtml+xml+name" type.

The Future...

However, one subject which needs more exploration is the "interpretation" of 
URIs as names. How does one tell, if a URI is used as a name for reference, what its 
interpretation is? All the RDF statements that apply to that URI? And if so, 
how do we get them in a decentralized system? SPARQL? URIQA? Magic? In other 
words, assuming the URI gave you machine-readable descriptions in some Semantic 
Web language, should the use of a URI-as-a-name really 
mean that this URI refers to (or denotes) whatever is necessary to satisfy the 
Semantic Web description? The Semantic Web allows one to build a number of 
roles and assertions, and one would assume that its interpretation is those 
other Semantic Web URIs that are satisfied by these roles and assertions. 
However, the SemWeb as it stands just has URIs as Semantic Web objects 
referring as names to other URIs as Semantic Web objects, and does not fulfill 
what the Semantic Web really needs: a way to move out of the Web and to the 
wide world beyond the Web. The Web needs to be integrated more into the world, 
and there lies the true holy grail of the Semantic Web. This is not just a 
problem for the Web, but the fundamental problem that proved to be the ultimate 
bane of AI. Indeed, it's easy to just attach a model theory to any formal 
system and say "We have semantics." Yes, that's strictly true - but let's not 
forget the adjective "model-theoretic." And models of the real world can be 
wrong, and often are. The real burden of the Semantic Web will lie on the 
ability of people and machines to produce models using SemWeb languages whose 
model-theoretic interpretations are relevant to the real world, and match them 
in interesting and useful ways that allow the Web to do things that are either 
impossible or very difficult on the current Web. Can people and machines do 
this in a large, decentralized manner? Are the SemWeb standards sufficient for 
the task? While the answers to those questions are unknown, the winds seem 
favorable.

Received on Tuesday, 5 April 2005 02:59:42 UTC