Re: URIs as names-for-reference vs locations-for-access from Harry Halpin on 2005-04-05 (semantic-web@w3.org from April 2005)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Tue, 5 Apr 2005 01:01:31 -0400 (EDT)
To: Jeremy Wong 黃泓量 <50263336@student.cityu.edu.hk>
Cc: www-tag@w3.org, www-rdf-interest@w3.org, semantic-web@w3.org
Message-ID: <Pine.LNX.4.61.0504050055070.2066@tribal.metalab.unc.edu>
I think you miss the point.

You're free to dereference a URI all you want, but obviously
http://www.ihmc.us/users/phayes/PatHayes.html
does not dereference to Pat Hayes in the same way that
http://www.ihmc.us/users/user.php?UserID=4
clearly dereferences to his web-page.

Even if
http://www.ihmc.us/users/phayes/PatHayes.html
gave you via content-negotiation/URIQA/whatever some nice RDF,
you get RDF, not Pat Hayes. And you can't owl:imports Pat Hayes, 
although you can probably get more RDF statements about him if you really 
want to (well, assuming someone has coded them somewhere).
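
A minimal sketch of the point, assuming Python's standard urllib and an
illustrative Accept value (application/rdf+xml); the helper name
build_rdf_request is hypothetical. It only builds the request and never
sends it. Whatever a server answered would be bytes describing Pat Hayes,
never the man himself.

```python
from urllib.request import Request

# The URI from the message above; the Accept value is an illustrative choice.
PAT_HAYES_URI = "http://www.ihmc.us/users/phayes/PatHayes.html"


def build_rdf_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"}, method="GET")


req = build_rdf_request(PAT_HAYES_URI)
# The request names a location and a preferred format, nothing more.
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```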

Confusing the RDF/web-page for Pat Hayes is basically confusing
the map for the territory. Not that doing that can't be useful quite
often.

It seems it's not a problem, it's a feature of the SemWeb :)

 				-harry

On Tue, 5 Apr 2005, Jeremy Wong 黃泓量 wrote:

> In the world of RDF, one is free to dereference a URI. It is really a problem
> because a representationREST may be temporarily inaccessible due to network
> failure or server maintenance. In the world of OWL, we have the vocabulary
> owl:imports. We have the concept of imports closure. It becomes not a 
> problem. Hence, semantic inference in the world of RDF should be done 
> manually by merging RDF graphs. Semantic inference in the world of OWL can be 
> done dynamically when the collection of ontologies and axioms and facts is 
> imports closed.
>
> It seems that it is not a problem in the semantic web.
>
>
> Jeremy
>
> ----- Original Message ----- From: "Harry Halpin" <hhalpin@ibiblio.org>
> To: <www-tag@w3.org>
> Cc: <www-rdf-interest@w3.org>; <semantic-web@w3.org>
> Sent: Tuesday, April 05, 2005 10:59 AM
> Subject: URIs as names-for-reference vs locations-for-access
>
>
>> 
>> Ah, a title might be courteous....
>> 
>> Again, there seem to be the usual questions about the SemWeb popping up,
>> and in particular http-range-14. There also doesn't seem to be much
>> progress on these issues. Here are some notes that I think may be helpful,
>> which basically try to distinguish between URIs as names for reference
>> versus URIs as locations for physical access, as well as try to define the
>> elusive term "on the Web" as being something that, if the Web was destroyed,
>> would also be destroyed. Also I distinguish between the use of
>> representation in REST versus representation in AI/philosophy, which are
>> not always the same. I think these distinctions, and taking them seriously,
>> are clearly very important to http-range-14.
>> 
>> The full text is here, and benefited from some discussion with Pat Hayes:
>> 
>> http://www.ibiblio.org/hhalpin/homepage/notes/uri.html
>> 
>> Text version below:
>> -----------------------------------------------------------------------
>> URIs as Names for Reference and as Locations for Access
>> httpRange-14 notes
>> By Harry Halpin
>> Thanks to Pat Hayes for some examples and commentary, although any errors 
>> are due to me of course!
>> 
>> 
>> What do URIs identify?
>> 
>> In essence, one reason the Web works is that, using a web protocol like
>> HTTP (Hypertext Transfer Protocol), a client can send a request to a
>> server to perform an operation such as HTTP GET on a given URI and
>> dereference something, often a web-page. However, this very basic feature
>> of the Web is bedeviled by a question: "What is the range of the HTTP 
>> dereference function?" In other words, what do URIs identify? In theory 
>> this question has been solved by the W3C TAG's AWWW: URIs refer to 
>> anything. Upon inspection, the official definition is actually circular: 
>> "We do not limit the scope of what might be a resource...it is used in a 
>> general sense for whatever might be identified by a URI." The question then 
>> arises that if a resource is just anything that could theoretically be
>> identified with a URI, is there anything that cannot be identified? It would
>> seem not. This view is given by the AWWW as "our use of the term resource 
>> is intentionally more broad. Other things, such as cars and dogs ... are 
>> resources too." However, referring to a web-page and referring to the car
>> in my garage are similar, but not exactly the same. The essential difference
>> is this: in
>> the first case on the Web we have physical, connected, access to the 
>> Web-page, while in the second case if we are using Semantic Web logic to 
>> refer to my car, we only have the ability to refer to my car by a URI name, and
>> this has no direct, connected, or physical access. When one uses a URI as a 
>> name there is a disconnect, as the thing named may not be on the Web.
>> 
>> The division between representation and resource existed but was not
>> explicitly stated, and was definitely not noticed by most of the users of
>> the original hypertext Web. URLs seem to be originally meant to identify the
>> location of representations, such as HTML web-pages, or possibly sets of
>> representations, such as when, through content negotiation, a news website
>> figures out where you live and then serves you your local news. With the
>> advent of the Semantic Web, the problem of httpRange-14 comes up precisely 
>> because a URI can be used to refer to anything, not just web pages. To be 
>> more precise, the issue comes up because URIs can refer to things that are 
>> not "on the Web" and so do not necessarily have a Web-accessible 
>> representation. Despite this, these things that are "not on the Web" are
>> fundamentally "on the Web" in another sense, since they can be reasoned
>> about by the Semantic Web. The crucial question is: what does "on the Web"
>> mean? To answer that question we must pursue the historical chain of events
>> from URL to URN to URI.
>> 
>> Locations
>> 
>> Uniform Resource Locators (URLs) did not suffer from the httpRange-14
>> issue, unlike their nearly identical brethren URIs. Unlike URIs, URLs
>> identified a specific type of thing: a location, which is a physical place.
>> This location was assumed to be on the Web. By "on the Web," we mean
>> something that is physically connected to the Web. A URL denotes a location
>> on some web-server which serves representations (HTML document, music file
>> to download, whatever) to visiting web clients. A location can be connected
>> to the Web because it is - even after endless redirection - in a physical
>> place.
>> 
>> Take a mundane example: my address. An address is just a location that
>> has a thing that can (usually) be found at that location, and there exists
>> a specified system for finding the location of an address. This allows
>> multiple locations to be ordered in a way that humans (in the case of
>> street addresses) or machines (in the case of IP addresses) can navigate
>> easily. In the case of my address, if one wants to find me, one can try
>> looking for me at the location of my address - and I'm sometimes not there,
>> so my address can give the person trying to find me a metaphysical 404
>> error. A
>> location can, and should, give you direct, connected, physical access to 
>> the thing at the location. URLs are used as names of locations, and sending
>> an HTTP GET (or POST, or HEAD, and so on) to a server requires the server
>> if possible to go to the location and physically access the thing at the 
>> location, usually by copying it and sending a copy to your computer. Or 
>> sending a very real 404 error.
>> 
>> On the Web
>> 
>> Something could be found on the Web if it is physically and causally
>> connected to the Web. This means that whatever was "on the Web" could be
>> encoded into bits and transferred over the Web. However, this is only "on
>> the Web" in the strongest sense: as in always on the Web. A thing can be
>> only on the Web sometimes, or only partially on the Web, or only rarely on
>> the Web. By our definition, a thing is on the Web if it could not be removed
>> from the Web without loss of its functionality. One can imagine a whole
>> range of
>> possibilities, from being "strongly" on the Web (all the time) to "weakly" 
>> on the Web (occasionally). Thus, both documents and servers are "on the
>> Web", and humans are "on the Web" only in a weak sense, since they interact
>> with the Web only indirectly, through typing on keyboards.
>> Things like the Eiffel Tower or Louis XVI are definitely "not on the Web,"
>> since Louis XVI is long gone and cannot at any point directly
>> connect physically to the Web, while the Eiffel Tower is only represented
>> on the Web, and is not physically sending any bytes to anyone itself. The
>> Eiffel Tower is composed not of bytes, but of steel. This brings us to
>> "representations" on the Web. What is the difference between something 
>> merely having a representation on the Web and something being fully on the 
>> Web? Rephrasing Brian Smith: Something is on the Web if, were the Web
>> itself destroyed, that thing would also be destroyed. If not, it's not
>> fully on the Web. If someone destroyed the Web, this would not damage me if
>> I were being denoted by a URI, but my homepage at that URI would go up in
>> smoke, if that's what people were using to refer to me. I am not on the
>> Web in a strong sense, but my homepage sure is. There are lots of middling 
>> cases: my computer is weakly on the Web, more so than myself. If my httpd 
>> daemon went down and my computer could no longer access the Web, or the Web 
>> itself collapsed, the computer qua computer still exists, but the computer 
>> qua Web server went up in smoke with the rest of the Web. One good question
>> yet to be answered is: when are humans on the Web in a strong sense? Would
>> it require our credit card details to be in a chip beneath our skin with a
>> URI, and wireless internet monitoring us with a GPS that sent messages over 
>> the Internet? Those examples seem also too simplistic and extreme. Still, 
>> what is the difference between something being represented on the Web and
>> being on the Web? One necessary but not nearly sufficient condition for 
>> "representation" would be that a thing X represents another thing Y if you 
>> can destroy thing X and thing Y remains unscathed. Representations qua 
>> representations are on the Web, and would be destroyed if the Web was 
>> destroyed. However, what they represent would not be destroyed, unless what 
>> the representation represented also was on the Web.
>> 
>> Representations: REST and AI
>> 
>> Before going any further, we have to distinguish two different uses of the 
>> word "representation." The first is the use of "representation" as it is
>> used in artificial intelligence, cognitive science, and philosophy. In this
>> use, a representation is something that "denotes" or "is about" something 
>> else, although often additional requirements are put on exactly what type 
>> of things the representation or its denotation may be. This will be called 
>> "representationAI." The second use is the use of "representation" as used 
>> by REST (The Representational State Transfer web architecture theory of Roy 
>> Fielding), where a representation can be whatever a URI returns from an
>> HTTP request. This will be called a "representationREST". A
>> representationREST, unlike a representationAI, does not necessarily refer 
>> to or denote any other thing - although it might! The two definitions are 
>> not the same, but not mutually exclusive either. So, the difference between 
>> "on the Web" and "not on the Web" is also a test of both types of 
>> representation. A representationAI can qua representationAI be entirely on 
>> the Web if what it represents is also on the Web. Lots of representations, 
>> such as an analog photo on my desk, are not on the Web at all. In another
>> case, a picture of me on the Web is on the Web qua itself but not on the 
>> Web qua me, because it denotes me, not something on the Web. If the Web was 
>> destroyed, it would only destroy the bytes of the representationAI, not 
>> necessarily what the representation denoted. Also, representationsAI may 
>> have layers of representationAI, as one representation may denote other 
>> representationsAI, leading to all sorts of interesting chains of reference. 
>> However, representationsREST are by definition on the Web, and would be 
>> destroyed if the Web was destroyed, at least as the possible objects of 
>> HTTP operations. This is because representationsREST are defined precisely 
>> as the bytes that are sent over the Web. One could argue that copies of 
>> them archived to a computer might survive. However, those copies would no 
>> longer be representationsREST qua the Web, but just whatever they are 
>> without the Web being involved. This argument does reveal that both sorts 
>> of representation are functional categories that are dependent on their 
>> context, as something is never a representationREST without being on the 
>> Web (or in some parallel universe, another system that implements REST). 
>> Something is never a representationAI without something being represented.
>> 
>> Virtual Locations and Digitality
>> 
>> This idea of physically being on the Web can be abstracted from the concept 
>> of location. "Being on the Web" does not mean a thing has one URL or even 
>> physical location. Something could be on the Web and have multiple URLs,
>> or multiple copies in different physical locations. A location can be a
>> virtual location, an abstraction over a set of possible physical 
>> representations, as long as it really is a location. What exactly is the 
>> "thing" at a URL location? It's not just a particular server, nor is it 
>> some abstract resource. It is actually some bytes, a representationREST or 
>> set of representationsREST, which one has to actually GET using a web
>> client to determine whether it's a representationAI. The particular
>> server where the actual representationREST lives is actually denoted by 
>> another type of location: wherever it is on the server, and the server has 
>> a very concrete IP address. A URL can be a name that denotes a virtual 
>> location, which is then forwarded to the place where the concrete bits are
>> stored. These bits are usually on a server somewhere. When one accesses 
>> http://www.w3c.org, if I am in Japan I get the mirror of the W3C web-pages 
>> in Japan, if I'm in the US I get the one hosted at MIT, but I get the same 
>> "resource," regardless. Here the concept of resource as stated by TAG 
>> starts making some sense. It's a concept about the contents of a 
>> representationREST. However, this resource is not identical to the thing 
>> physically received as bytes (that's the representationREST). A resource 
>> seems to be the abstract idea of the common information between all the 
>> possible representationsREST returned. To properly understand resource then 
>> one needs a thorough inspection of theories of information and content, 
>> which is beyond the scope of this little note. Still, what is physically 
>> returned by an HTTP GET is just the representationREST, which may differ
>> between MIT and Kyoto, while it might not between INRIA and MIT. The fact 
>> that the Web is digital becomes crucially important: the "copyability" of 
>> the representationsREST, due to their digital nature, is crucial to why the 
>> Web works, just as crucial as a universal naming scheme. Yet, things not 
>> "on the Web" (Pat Hayes qua Pat Hayes, my dog, etc) don't have this 
>> property of copyability. A picture on the Web of Pat Hayes is digital, but 
>> Pat Hayes is not, no matter how much time he spends online.
>> 
>> What's in a Name?
>> 
>> A name is entirely different from a location. Unlike a location, a name 
>> does not necessarily give you access to the thing named, and this thing
>> named we will call the referent of the name. The set of all referents of a
>> name (or denotations of a representation for that matter) we will call its 
>> interpretation. In fact, names are usually used when connected, physical 
>> access is impossible, and as such are place-holders for the physical thing 
>> precisely because there is no physical access. This concept of "names" is 
>> more in line with the URN effort, which essentially tries to serve as rigid 
>> designators in the Kripkean sense for the Web. Since a name does not have 
>> any connection to a referent, putting a name on the Web via a URI (such as 
>> a URN) does absolutely nothing at all to the referent of the name. When
>> anyone accesses the resource "Pat Hayes" from the URI
>> http://www.ihmc.us/users/phayes/PatHayes.html, Pat Hayes does not magically
>> appear next to them. What that URI currently can return from an HTTP GET is
>> a representationREST: a Web-page in HTML encoded as very physical bytes
>> somewhere that get sent to me over a wire as very physical bytes, and then
>> displayed by a very physical computer, showing the social security number
>> of Pat Hayes and other defining details. It could even theoretically return a
>> definition of Pat Hayes in RDF. Yet this particular URI representationREST 
>> also serves double-duty as a representationAI, since it contains pictures 
>> of the actual Pat Hayes, relevant facts about him, and so on. Pat Hayes 
>> himself is not on the Web, since if the Web is destroyed Pat Hayes would 
>> merrily go along, and probably with more spare time.
>> 
>> So, the use of a URI as a "name" causes a URI to be used as a 
>> representationAI. However, what exactly the interpretation of a URI as a 
>> "name" actually is goes beyond the physics of transferring bytes. This 
>> interpretation is either the yet-to-come metaphysics of the Semantic Web, 
>> social meaning, or something else - who knows? But what is important is 
>> that it is a non-physical, non-causal, non-connected relationship, unlike 
>> the relationship of a location which is a physical, connected, causal 
>> relationship. Note that URIs used as names-for-reference are common in the 
>> Semantic Web, and the Semantic Web depends on there being names with 
>> interpretations to reason over. Because there is no direct access to the 
>> thing the URI-as-name identifies, unlike the use of a URI-as-location, the 
>> Semantic Web uses URIs without any necessary use of representationsREST. A
>> URI in the Semantic Web is used more like a "place-holder" or even
>> (stretching it a bit) a "key," without any HTTP operation returning any
>> bytes from a server in terms of representationREST. Thus, the Semantic Web
>> uses URIs as representationsAI, while the Good-Old HyperText Web uses URIs 
>> as representationsREST.
>> 
>> Double Lives as Names and Locations
>> 
>> The key to the confusion is that HTTP fundamentally will dereference
>> whatever a URI refers to, and there are two distinct types of functional
>> roles a URI can play: name and location. A URI can serve as an
>> identifier-as-a-name, which is a non-physical relation of reference, and as
>> an identifier of a location, which is a physical relation of access. Just
>> naming something has no effect on the thing named: naming something does 
>> not bathe the thing named in any type of energy that we can detect via a 
>> physical radar. There is no way to build a detector to detect what exactly 
>> someone means by a URI, although we can guess from talking to them or 
>> accessing representations they give us. Locations give you physical, 
>> connected, access to a thing. If you go to a location to get something, if 
>> the thing is there you return with it physically in hand. A name might, but
>> does not have to and usually does not, give one any sort of physical,
>> connected, access to the thing named.
>> 
>> The word "identifier" is even more vague than name or location, and here 
>> the problem of the "identity" crisis appears: how do we know if the URI is 
>> being used for something as a name or as a location? The URI itself does 
>> not tell us. Even worse, what does "identify" mean, and how can we tell if 
>> two things identify the same thing? With representationsAI that is 
>> sometimes very clear, as in photographs, and sometimes not so clear, as in 
>> abstract art. Even the integers have problems with identification: does 
>> "11" identify eleven in decimal or three in binary? We won't know - and 
>> can't know unless we are given some sort of decoding scheme. In the programming
>> language tradition "identifier" has a pretty secure meaning and in that
>> context the access/reference distinction is theoretically important but not 
>> of great practical significance, since everything you can refer to is 
>> physically accessible by the computer and has an address in memory. This is 
>> not true of logic, and definitely not true of model-theoretic semantics. 
>> Importantly, the access and reference distinction holds on the Web with 
>> many things that have URIs. In an information space, things may be 
>> identified without being accessed via a physical connection. In terms of
>> the AWWW, an "information resource" is probably similar to the use of
>> URI-as-access, while the use of a URI for reference without access
>> corresponds to a "non-information resource."
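
The "11" example in the paragraph above can be made concrete. This is a
minimal sketch, assuming Python's built-in int() with an explicit base; the
function name decode is hypothetical. The same two characters identify
different integers depending on the decoding scheme supplied from outside
the string itself.

```python
def decode(numeral: str, base: int) -> int:
    """Interpret a string of digits under an explicitly supplied base."""
    return int(numeral, base)


# The identifier "11" alone does not say which reading is intended.
as_decimal = decode("11", 10)  # read as base 10
as_binary = decode("11", 2)    # read as base 2
print(as_decimal, as_binary)
```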
>> 
>> Solving the Identity Crisis
>> 
>> Then there's the identity crisis: a single URI can actually play both roles
>> (name with no access and location with access) at the same time, which
>> gives us a powerful device for some applications. The official view, that
>> representations are supposed to be interpreted by applications
>> depending on MIME types, is clearly focused on the use of a URI as a
>> location for access; yet nothing forbids a URI that returns a
>> representationREST or some other data to be used to tell the web client that
>> this URI is also a name for reference in addition to a location for access.
>> In fact, for a URI used only as a name, MIME-types are clearly irrelevant. 
>> At least for the time being!
>> 
>> It would be useful to distinguish when a URI is used as a "name" or as a
>> "location," and if some URIs can only be used as names or only as
>> locations. In other words, this depends on whether the thing (which would
>> be the "resource") identified by the URI is on the Web or not. This already
>> reduces to the "non-information resource" and "information resource" 
>> distinction on some level, and so is not a return to the historical Dark 
>> Ages of the Web. Since they share a common syntax, it does make sense to 
>> unite URLs and URNs on a level as URIs, and even to use URLs as "names." 
>> The identity crisis can be solved pretty easily, as shown by the Web Proper 
>> Names proposal. First, a separate URI scheme (wpn:// or tdb://) can
>> distinguish the use of URIs as names for reference from URIs as locations
>> for access. To capitalise even further on the identity crisis, the two uses
>> can be distinguished without a new URI scheme through the use of a
>> representationREST, by having a type of representation format which says
>> that this URI is a "name" as opposed to a "location." In fact, one could
>> even have a special MIME-type to distinguish names for things: imagine the 
>> "name" MIME-type, or the "application/xhtml+xml+name" type.
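
The two ways of telling names from locations described above might be
sketched as follows. This is only an illustration under stated assumptions:
the function uri_role, the "+name" suffix convention, and the example.org
URIs are hypothetical, and wpn/tdb are the proposed schemes mentioned in the
note, not registered ones.

```python
from typing import Optional
from urllib.parse import urlsplit

NAME_SCHEMES = {"wpn", "tdb"}  # schemes mentioned in the note
NAME_SUFFIX = "+name"          # hypothetical marker, after the imagined MIME-type


def uri_role(uri: str, content_type: Optional[str] = None) -> str:
    """Guess whether a URI is used as a 'name' or as a 'location'."""
    # First option: a dedicated URI scheme marks names for reference.
    if urlsplit(uri).scheme.lower() in NAME_SCHEMES:
        return "name"
    # Second option: the returned representation's media type says so.
    if content_type:
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == "name" or media_type.endswith(NAME_SUFFIX):
            return "name"
    # Otherwise fall back to the ordinary reading: a location for access.
    return "location"


print(uri_role("wpn://example.org/PatHayes"))
print(uri_role("http://example.org/PatHayes", "application/xhtml+xml+name"))
print(uri_role("http://example.org/PatHayes", "text/html; charset=utf-8"))
```

Note that the second check needs a response in hand, so in this sketch the
name/location question for an http URI cannot be settled without access.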
>> 
>> The Future...
>> 
>> However, one subject which needs more exploration is the "interpretation" 
>> of URIs as names. How does one tell, if a URI is a name for reference, what
>> its interpretation is? All the RDF statements that apply to that URI? And 
>> if so, how do we get them in a decentralized system? SPARQL? URIQA? Magic? 
>> In other words, assuming the URI gave you machine-readable descriptions in
>> some Semantic Web language, should the use of a
>> URI-as-a-name really mean that this URI refers to (or denotes) whatever is 
>> necessary to satisfy the Semantic Web description? The Semantic Web allows 
>> one to build a number of roles and assertions, and one would assume that 
>> its interpretation is those other Semantic Web URIs that are satisfied by 
>> these roles and assertions. However, the SemWeb as it stands just has URIs 
>> as Semantic Web objects referring as names to other URIs as Semantic Web 
>> objects, and does not fulfill what the Semantic Web really needs: a way to 
>> move out of the Web and to the wide world beyond the Web. The Web needs to 
>> be integrated more into the world, and there lies the true holy grail of 
>> the Semantic Web. This is not just a problem for the Web, but the 
>> fundamental problem that proved to be the ultimate bane of AI. Indeed, it's 
>> easy to just attach a model theory to any formal system and say "We have 
>> semantics." Yes, that's strictly true - but let's not forget the adjective 
>> "model-theoretic." And models of the real world can be wrong, and often 
>> are. The real burden of the Semantic Web will lie on the ability of people 
>> and machines to produce models using SemWeb languages whose model-theoretic 
>> interpretations are relevant to the real world, and match them in 
>> interesting and useful ways that allow the Web to do things that are either 
>> impossible or very difficult on the current Web. Can people and machines do 
>> this in a large, decentralized manner? Are the SemWeb standards sufficient
>> for the task? Yet, while the answers to those questions are unknown, the winds
>> seem favorable.
>> 
>> 
>> 
>
>

-- 
 				--harry

 	Harry Halpin
 	Informatics, University of Edinburgh
         http://www.ibiblio.org/hhalpin
Received on Tuesday, 5 April 2005 05:01:49 UTC