[www-tag] <none> from Harry Halpin on 2005-04-05 (www-tag@w3.org from April 2005)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Mon, 4 Apr 2005 22:56:22 -0400 (EDT)
To: www-tag@w3.org
Cc: www-rdf-interest@w3.org, semantic-web@w3.org
Message-ID: <Pine.LNX.4.61.0504042247120.2066@tribal.metalab.unc.edu>
Again, there seem to be the usual questions about the SemWeb popping up,
and in particular http-range-14. There also doesn't seem to be much 
progress on these issues. Here are some notes that I think may be helpful,
which basically try to distinguish between URIs as names for reference 
versus URIs as locations for physical access, as well as try to define the 
elusive term "on the Web" as applying to something that, if the Web were 
destroyed, would also be destroyed. I also distinguish between the use of 
representation in REST versus representation in AI/philosophy, which are 
not always the same. I think these distinctions, and taking them 
seriously, are clearly very important to http-range-14.

The full text is here, and benefited from some discussion with Pat Hayes:

http://www.ibiblio.org/hhalpin/homepage/notes/uri.html

Text version below:
-----------------------------------------------------------------------
URIs as Names for Reference and as Locations for Access
httpRange-14 notes
By Harry Halpin
Thanks to Pat Hayes for some examples and commentary, although any errors 
are due to me of course!


What do URIs identify?

In essence, one reason the Web works is that, using a web protocol like 
HTTP (Hypertext Transfer Protocol), a client can send a request to a 
server to perform an operation such as HTTP GET for a given URI and 
dereference something, often a web-page. However, this very basic feature 
of the Web is bedeviled by a question: "What is the range of the HTTP 
dereference function?" In other words, what do URIs identify? In theory 
this question has been solved by the W3C TAG's AWWW: URIs refer to 
anything. Upon inspection, the official definition is actually circular: 
"We do not limit the scope of what might be a resource...it is used in a 
general sense for whatever might be identified by a URI." The question 
then arises: if a resource is just anything that could theoretically 
be identified with a URI, is there anything that cannot be identified? It 
would seem not. This view is given by the AWWW as "our use of the term 
resource is intentionally more broad. Other things, such as cars and dogs 
... are resources too." However, referring to a web-page and referring to 
the car in my garage are similar, but not exactly the same. The essential 
difference is this: in the first case we have physical, connected access 
to the web-page on the Web, while in the second case, if we are using 
Semantic Web logic to refer to my car, we only have the ability to refer 
to my car by a URI name, and this gives no direct, connected, or physical 
access. When one uses a URI 
as a name there is a disconnect, as the thing named may not be on the Web.

The division between representation and resource existed but was not 
explicitly stated to, and definitely not noticed by, most of the users of 
the original hypertext Web. URLs seem to be originally meant to identify the 
location of representations, such as HTML web-pages, or possibly sets of 
representations, such as when through content negotiation a news website 
figures out where you live and then serves you your local news. With the 
advent of the Semantic Web, the problem of httpRange-14 comes up precisely 
because a URI can be used to refer to anything, not just web pages. To be 
more precise, the issue comes up because URIs can refer to things that are 
not "on the Web" and so do not necessarily have a Web-accessible 
representation. Despite this, these things that are "not on the Web" 
are fundamentally "on the Web" in another sense, since they can be 
reasoned about by the Semantic Web. The crucial question is: what does "on 
the Web" mean? To answer that question we must pursue the historical chain of 
events from URL to URN to URI.

Locations

Uniform Resource Locators (URLs) did not suffer from the httpRange-14 
issue, unlike their nearly identical brethren URIs. Unlike URIs, URLs 
identified a specific type of thing: a location, which is a physical 
place. This location was assumed to be on the Web. By "on the Web," I mean 
something that is physically connected to the Web. A URL denotes a 
location on some web-server which serves representations (HTML document, 
music file to download, whatever) to visiting web clients. A location can 
be connected to the Web because it is - even after endless redirection - in 
a physical place.

Take a mundane example: my address. An address is just a location that 
has a thing that can (usually) be found at that location, and there exists 
a specified system for finding the location of an address. This allows 
multiple locations to be ordered in a way that humans (in the case of 
street addresses) or machines (in the case of IP addresses) can navigate 
easily. In the case of my address, if one wants to find me, one can try to 
look for me at the location of my address - and I'm sometimes not there, 
so my address can give the person trying to find me a metaphysical 404 
error. A location can, and should, give you direct, connected, physical 
access to the thing at the location. URLs are used as names of locations, 
and sending an HTTP GET (or POST, or HEAD, and so on) to a server requires 
the server, if possible, to go to the location and physically access the 
thing 
at the location, usually by copying it and sending a copy to your 
computer. Or sending a very real 404 error.
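
As a rough sketch of a URL used as a location, here is a toy "server" in 
Python that either hands back the bytes stored at a location or a 404. The 
STORE dictionary, the paths in it, and the http_get function are all 
invented for illustration; no real HTTP is involved.

```python
from typing import Tuple

# Toy model: locations (paths) mapped to the bytes stored there.
# Every path and body here is made up for illustration.
STORE = {
    "/hhalpin/homepage/index.html": b"<html>Harry's homepage</html>",
}


def http_get(path: str) -> Tuple[int, bytes]:
    """Go to the location; return (status, bytes found there) or a 404."""
    thing = STORE.get(path)
    if thing is None:
        # Nothing at this location: the "very real 404 error".
        return 404, b"Not Found"
    # The client receives the bytes; the stored thing stays where it is.
    return 200, thing


print(http_get("/hhalpin/homepage/index.html"))
print(http_get("/hhalpin/garage/car"))
```

The second call fails not because the car lacks a name, but because there 
is nothing at that location for the server to copy and send.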

On the Web

Something can be found on the Web if it is physically and causally 
connected to the Web. This means that whatever is "on the Web" can be 
encoded into bits and transferred over the Web. However, this is only "on 
the Web" in the strongest sense: as in always on the Web. A thing can be 
only on the Web sometimes, or only partially on the Web, or only rarely on 
the Web. By our definition, a thing is on the Web if it could not be 
removed from the Web without loss of its functionality. One can imagine a 
whole range of possibilities, from being "strongly" on the Web (all the 
time) to "weakly" on the Web (occasionally). Thus, both documents and 
servers are "on the Web", while humans are "on the Web" only in a weak 
sense, since they interact with the Web only indirectly, through typing on 
keyboards. Things like the Eiffel Tower or Louis XVI are definitely "not 
on the Web", since Louis XVI is long gone and cannot at any point directly 
connect physically to the Web, while the Eiffel Tower is only represented 
on the Web, and is not physically sending any bytes to anyone itself. The 
Eiffel Tower is composed not of bytes, but of iron. This 
brings us to "representations" on the Web. What is the difference between 
something merely having a representation on the Web and something being 
fully on the Web? Rephrasing Brian Smith: something is on the Web if, 
were the Web itself destroyed, that thing would also be destroyed. 
If not, it's not fully on the Web. If someone destroyed the Web, this 
would not damage me if I were being denoted by a URI, but my homepage at 
that URI would go up in smoke if that's what people were using to refer 
to me. I am not on the Web in a strong sense, but my homepage sure is. 
There are lots of middling cases: my computer is weakly on the Web, more 
so than myself. If my httpd daemon went down and my computer could no 
longer access the Web, or the Web itself collapsed, the computer qua 
computer would still exist, but the computer qua Web server would go up in 
smoke with the rest of the Web. One good question yet to be answered is: 
when are humans on the Web in a strong sense? Would it require our credit 
card details to be in a chip beneath our skin with a URI, and wireless 
internet monitoring us with a GPS that sent messages over the Internet? 
Those examples also seem too simplistic and extreme. Still, what is the 
difference between something being represented on the Web and being on 
the Web? One necessary but not nearly sufficient condition for 
"representation" would be that a thing X represents another thing Y if you 
can destroy thing X and thing Y remains unscathed. Representations qua 
representations are on the Web, and would be destroyed if the Web was 
destroyed. However, what they represent would not be destroyed, unless 
what the representation represented also was on the Web.

Representations: REST and AI

Before going any further, we have to distinguish two different uses of the 
word "representation." The first is the use of "representation" as it is 
used in artificial intelligence, cognitive science, and philosophy. In this 
use, a representation is something that "denotes" or "is about" something 
else, although often additional requirements are put on exactly what type 
of things the representation or its denotation may be. This will be called 
"representationAI." The second use is the use of "representation" as used 
by REST (the Representational State Transfer web architecture theory of 
Roy Fielding), where a representation can be whatever a URI returns 
from an HTTP request. This will be called a "representationREST". A 
representationREST, unlike a representationAI, does not necessarily refer 
to or denote any other thing - although it might! The two definitions are 
not the same, but not mutually exclusive either. So, the difference 
between "on the Web" and "not on the Web" is also a test of both types of 
representation. A representationAI can qua representationAI be entirely on 
the Web if what it represents is also on the Web. Lots of representations, 
such as an analog photo on my desk, are not on the Web at all. In another 
case, a picture of me on the Web is on the Web qua itself but not on the 
Web qua me, because it denotes me, not something on the Web. If the Web 
was destroyed, it would only destroy the bytes of the representationAI, 
not necessarily what the representation denoted. Also, representationsAI 
may have layers of representationAI, as one representation may denote 
other representationsAI, leading to all sorts of interesting chains of 
reference. However, representationsREST are by definition on the Web, and 
would be destroyed if the Web was destroyed, at least as the possible 
objects of HTTP operations. This is because representationsREST are 
defined precisely as the bytes that are sent over the Web. One could argue 
that copies of them archived to a computer might survive. However, those 
copies would no longer be representationsREST qua the Web, but just 
whatever they are without the Web being involved. This argument does 
reveal that both sorts of representation are functional categories that 
are dependent on their context, as something is never a representationREST 
without being on the Web (or in some parallel universe, another system 
that implements REST). Something is never a representationAI without 
something being represented.

Virtual Locations and Digitality

This idea of physically being on the Web can be abstracted from the 
concept of location. "Being on the Web" does not mean a thing has one URL 
or even physical location. Something could be on the Web and have multiple 
URLs, or multiple copies in different physical locations. A location can 
be a virtual location, an abstraction over a set of possible physical 
representations, as long as it really is a location. What exactly is the 
"thing" at a URL location? It's not just a particular server, nor is it 
some abstract resource. It is actually some bytes, a representationREST or 
set of representationsREST, which one has to actually GET with a web 
client to determine whether it's a representationAI. The particular 
server where the actual representationREST lives is actually denoted by 
another type of location: wherever it is on the server, and the server has 
a very concrete IP address. A URL can be a name that denotes a virtual 
location, which is then forwarded to the place where the concrete bits are 
stored. These bits are usually on a server somewhere. When one accesses 
http://www.w3c.org, if I am in Japan I get the mirror of the W3C web-pages 
in Japan, if I'm in the US I get the one hosted at MIT, but I get the same 
"resource," regardless. Here the concept of resource as stated by TAG 
starts making some sense. It's a concept about the contents of a 
representationREST. However, this resource is not identical to the thing 
physically received as bytes (that's the representationREST). A resource 
seems to be the abstract idea of the common information between all the 
possible representationsREST returned. To properly understand resource 
then one needs a thorough inspection of theories of information and 
content, which is beyond the scope of this little note. Still, what is 
physically returned by an HTTP GET is just the representationREST, which 
may differ between MIT and Kyoto, while it might not between INRIA and 
MIT. The fact that the Web is digital becomes crucially important: the 
"copyability" of the representationsREST, due to their digital nature, is 
crucial to why the Web works, just as crucial as a universal naming 
scheme. Yet, things not "on the Web" (Pat Hayes qua Pat Hayes, my dog, 
etc) don't have this property of copyability. A picture on the Web of Pat 
Hayes is digital, but Pat Hayes is not, no matter how much time he spends 
online.
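
To make the virtual-location idea concrete, here is a small Python sketch 
in which one URL is forwarded to different mirrors, each returning 
different bytes. The mirror names, the page bodies, and the region codes 
are assumptions made up for the example; the real W3C mirroring setup is 
not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationREST:
    mirror: str   # which (hypothetical) physical server answered
    body: bytes   # the bytes actually sent over the wire


# Invented mirrors standing behind a single virtual location.
MIRRORS = {
    "JP": RepresentationREST("mirror-jp", b"<html><!-- Japan -->W3C home</html>"),
    "US": RepresentationREST("mirror-us", b"<html><!-- MIT -->W3C home</html>"),
}


def dereference(url: str, client_region: str) -> RepresentationREST:
    """Forward the virtual location to a mirror chosen by client region.

    The url argument is kept only to show that the same URL is asked for
    each time; unknown regions fall back to the US mirror.
    """
    return MIRRORS.get(client_region, MIRRORS["US"])


jp = dereference("http://www.w3c.org", "JP")
us = dereference("http://www.w3c.org", "US")
print(jp.body != us.body)       # the representationsREST differ byte for byte
print(bytes(jp.body) == jp.body)  # but any one of them copies exactly
```

The "resource" is whatever the two bodies have in common; nothing in the 
code holds it directly, which is rather the point.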

What's in a Name?

A name is entirely different from a location. Unlike a location, a name 
does not necessarily give you access to the thing named, and this thing 
named we will call the referent of the name. The set of all referents of a 
name (or denotations of a representation for that matter) we will call its 
interpretation. In fact, names are usually used when connected, physical 
access is impossible, and as such are place-holders for the physical thing 
precisely because there is no physical access. This concept of "names" is 
more in line with the URN effort, which essentially tries to provide 
rigid designators in the Kripkean sense for the Web. Since a name does not 
have any connection to a referent, putting a name on the Web via a URI 
(such as a URN) does absolutely nothing at all to the referent of the 
name. When anyone accesses the resource "Pat Hayes" from the URI 
http://www.ihmc.us/users/phayes/PatHayes.html, Pat Hayes does not magically 
appear next to them. What that URI currently can return from an HTTP GET is 
a representationREST: a Web-page in HTML encoded as very physical bytes 
somewhere that get sent to me over a wire as very physical bytes, and are 
then displayed by a very physical computer, showing the social security 
number of Pat Hayes and other defining details. It could even 
theoretically return a 
definition of Pat Hayes in RDF. Yet this particular URI representationREST 
also serves double-duty as a representationAI, since it contains pictures 
of the actual Pat Hayes, relevant facts about him, and so on. Pat Hayes 
himself is not on the Web, since if the Web is destroyed Pat Hayes would 
merrily go along, and probably with more spare time.

So, the use of a URI as a "name" causes a URI to be used as a 
representationAI. However, what exactly the interpretation of a URI as a 
"name" actually is goes beyond the physics of transferring bytes. This 
interpretation is either the yet-to-come metaphysics of the Semantic Web, 
social meaning, or something else - who knows? But what is important is 
that it is a non-physical, non-causal, non-connected relationship, unlike 
the relationship of a location which is a physical, connected, causal 
relationship. Note that URIs used as names-for-reference are common in the 
Semantic Web, and the Semantic Web depends on there being names with 
interpretations to reason over. Because there is no direct access to the 
thing the URI-as-name identifies, unlike the use of a URI-as-location, the 
Semantic Web uses URIs without any necessary use of representationsREST. 
URIs in the Semantic Web are used more like "place-holders" or even 
(stretching it a bit) "keys," without any HTTP operation returning any 
bytes from a server in terms of representationREST. Thus, the Semantic Web 
uses URIs as representationsAI, while the Good-Old HyperText Web uses URIs 
as representationsREST.
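
A minimal sketch of URIs used purely as "keys": the Python below reasons 
over a handful of triples without ever sending an HTTP request. The 
example.org URIs and the knows/name properties are invented for the 
example and are not taken from any real vocabulary.

```python
# URIs as opaque names: they are compared as strings, never dereferenced.
PAT = "http://example.org/people#PatHayes"
HARRY = "http://example.org/people#HarryHalpin"
KNOWS = "http://example.org/vocab#knows"
NAME = "http://example.org/vocab#name"

# A toy set of (subject, predicate, object) statements.
triples = {
    (HARRY, KNOWS, PAT),
    (PAT, NAME, "Pat Hayes"),
    (HARRY, NAME, "Harry Halpin"),
}


def objects(subject, predicate):
    """All o such that (subject, predicate, o) has been asserted."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def names_of_people_known_by(person):
    """Follow knows, then name: a two-step inference over names alone."""
    return {n for friend in objects(person, KNOWS) for n in objects(friend, NAME)}


print(names_of_people_known_by(HARRY))
```

Whether anything at all is served at those URIs makes no difference to 
the result, which is what using a URI as a name rather than a location 
amounts to here.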

Double Lives as Names and Locations

The key to the confusion is that HTTP fundamentally will dereference 
whatever a URI refers to, and there are two distinct types of functional 
roles a URI can play: name and location. A URI can serve as an 
identifier-as-a-name, which is a non-physical relation of reference, and 
as an identifier of a location, which is a physical relation of access. 
Just naming something has no effect on the thing named: naming something 
does not bathe the thing named in any type of energy that we can detect 
via a physical radar. There is no way to build a detector to detect what 
exactly someone means by a URI, although we can guess from talking to them 
or accessing representations they give us. Locations give you physical, 
connected, access to a thing. If you go to a location to get something, if 
the thing is there you return with it physically in hand. A name might, 
but does not have to and usually does not, give one any sort of physical, 
connected access to the thing named.

The word "identifier" is even more vague than name or location, and here 
the problem of the "identity" crisis appears: how do we know if the URI is 
being used for something as a name or as a location? The URI itself does 
not tell us. Even worse, what does "identify" mean, and how can we tell if 
two things identify the same thing? With representationsAI that is 
sometimes very clear, as in photographs, and sometimes not so clear, as in 
abstract art. Even the integers have problems with identification: does 
"11" identify eleven in decimal or three in binary? We won't know - and 
can't know unless we are given some sort of decoding scheme. In the 
programming language tradition "identifier" has a pretty secure meaning 
and in that context the access/reference distinction is theoretically 
important but not of great practical significance, since everything you 
can refer to is physically accessible by the computer and has an address 
in memory. This is not true of logic, and definitely not true of 
model-theoretic semantics. Importantly, the access and reference 
distinction holds on the Web with many things that have URIs. In an 
information space, things may be identified without being accessed via a 
physical connection. In terms of the AWWW, an "information resource" is 
probably similar to the use of URI-as-access, while the use of a URI for 
reference without access corresponds to a "non-information resource."
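
The "11" example can be run directly. In this small Python sketch the 
decode helper is just a name chosen here; it wraps the built-in int(), 
which needs to be told the base before the same two characters identify 
anything.

```python
def decode(numeral: str, base: int) -> int:
    """Interpret a string of digits under an explicitly given base."""
    return int(numeral, base)


print(decode("11", 10))  # 11: eleven, read as decimal
print(decode("11", 2))   # 3: three, read as binary
```

The string alone settles nothing; the decoding scheme is supplied from 
outside, much as the role of a URI has to be.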

Solving the Identity Crisis

Then there's the identity crisis: a single URI can actually play both 
roles (name with no access and location with access) at the same time, 
which gives us a powerful device for some applications. The official view, 
that representations are supposed to be interpreted by applications 
depending on MIME types, is clearly focused on the use of a URI as a 
location for access; yet nothing forbids a URI that returns a 
representationREST or some other data from being used to tell the web 
client that this URI is also a name for reference in addition to a 
location for 
access. In fact, for a URI used only as a name, MIME-types are clearly 
irrelevant. At least for the time being!

It would be useful to distinguish when a URI is used as a "name" or as a 
"location," and whether some URIs can only be used as names or only as 
locations. In other words, this depends on whether the thing (which would 
be the "resource") identified by the URI is on the Web or not. This already 
reduces to the "non-information resource" and "information resource" 
distinction on some level, and so is not a return to the historical Dark 
Ages of the Web. Since they share a common syntax, it does make sense to 
unite URLs and URNs on a level as URIs, and even to use URLs as "names." 
The identity crisis can be solved pretty easily, as shown by the Web 
Proper Names proposal. First, a separate URI scheme (wpn:// or tdb://) can 
distinguish the use of URIs as names for reference from URIs as locations 
for access. Second, to capitalise even further on the identity crisis, the 
two uses can be distinguished without a new URI scheme through the 
representationREST itself, by having a type of representation format which 
says that this URI is a "name" as opposed to a "location." In fact, one 
could 
even have a special MIME-type to distinguish names for things: imagine the 
"name" MIME-type, or the "application/xhtml+xml+name" type.

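Both options can be sketched in a few lines of Python. Everything here is 
an assumption for illustration: the wpn and tdb schemes are treated as 
name-only, an http URI is assumed to be a location by default, and the 
"+name" suffix stands in for the imagined MIME-type, which is not a 
registered media type.

```python
from urllib.parse import urlsplit

NAME_SCHEMES = {"wpn", "tdb"}   # schemes mentioned in this note
NAME_MEDIA_SUFFIX = "+name"     # hypothetical, as in application/xhtml+xml+name


def roles(uri, content_type=None):
    """Guess whether a URI is acting as a name, a location, or both."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme in NAME_SCHEMES:
        # Option one: the scheme alone marks a name for reference.
        return {"name"}
    found = {"location"}        # assumption: other URIs give access
    if content_type is not None:
        # Option two: the returned media type also declares a name.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type.endswith(NAME_MEDIA_SUFFIX):
            found.add("name")
    return found


print(roles("wpn://example.org/PatHayes"))
print(roles("http://example.org/PatHayes", "application/xhtml+xml+name"))
print(roles("http://example.org/PatHayes", "text/html"))
```

The second case is the double life: one URI that is a location for access 
and, by its own declaration, a name for reference as well.
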
The Future...

However, one subject which needs more exploration is the "interpretation" 
of URIs as names. How does one tell, if a URI is a name for reference, 
what its interpretation is? All the RDF statements that apply to that URI? 
And if so, how do we get them in a decentralized system? SPARQL? URIQA? 
Magic? In other words, assuming the URI gave you machine-readable 
descriptions in some Semantic Web language, should 
the use of a URI-as-a-name really mean that this URI refers to (or 
denotes) whatever is necessary to satisfy the Semantic Web description? 
The Semantic Web allows one to build a number of roles and assertions, and 
one would assume that its interpretation is those other Semantic Web URIs 
that are satisfied by these roles and assertions. However, the SemWeb as 
it stands just has URIs as Semantic Web objects referring as names to 
other URIs as Semantic Web objects, and does not fulfill what the Semantic 
Web really needs: a way to move out of the Web and to the wide world 
beyond the Web. The Web needs to be integrated more into the world, and 
there lies the true holy grail of the Semantic Web. This is not just a 
problem for the Web, but the fundamental problem that proved to be the 
ultimate bane of AI. Indeed, it's easy to just attach a model theory to 
any formal system and say "We have semantics." Yes, that's strictly true - 
but let's not forget the adjective "model-theoretic." And models of the 
real world can be wrong, and often are. The real burden of the Semantic 
Web will lie on the ability of people and machines to produce models using 
SemWeb languages whose model-theoretic interpretations are relevant to the 
real world, and match them in interesting and useful ways that allow the 
Web to do things that are either impossible or very difficult on the 
current Web. Can people and machines do this in a large, decentralized 
manner? Are the SemWeb standards sufficient for the task? While the 
answers to those questions are unknown, the winds seem favorable.
Received on Tuesday, 5 April 2005 02:56:23 UTC