- From: Pat Hayes <phayes@ihmc.us>
- Date: Fri, 27 Jul 2007 14:04:45 -0500
- To: "Sandro Hawke" <sandro@w3.org>
- Cc: "John Black" <JohnBlack@kashori.com>, "'Linking Open Data'" <linking-open-data@simile.mit.edu>, "SW-forum" <semantic-web@w3.org>, www-tag@w3.org
> > And that makes it far less crisp, I'm afraid.
>
>Yes, sad, isn't it? For a minute there, it did seem nicely crisp.
>
>> Sorry, but I still find this incomprehensible, that the text of a book is
>> not an information resource, but if that same text is encoded in a computer
>> file on a web server, it now somehow becomes an information resource. Or
>> what I found just as surprising was when Tim said, "...a literal string is
>> not an information resource." [2] about a text string in a static web page
>> served by a web server, here http://kashori.com/ontology/MyURI. Honestly,
>> I'm not debating here, I just don't get it.
>
>I hear you. Sometimes I think I can kind of make sense of it; sometimes
>not. It's hard to come up with a simple and coherent model for a very
>messy and complex system.... (the real, deployed web is complex
>enough -- try to add the Semantic Web...?)

Oh, come on, the problem isn't one of complexity. Reference isn't a complicated idea. The problem here is simply one of being in a conceptual muddle.

Try this for size. Until recently, the Web (that's the "pre-semantic" Web, mostly constructed from HTML and HTTP with a dash of Flash and JavaScript added) was architecturally complicated but conceptually straightforward. What URIs did was go and locate some thing which was physically attached to the Internet and which, when suitably prodded, sent some information back to you. These things were called 'resources', and the information they sent back was called a 'representation' of the resource (at the time it was prodded, if one is being picky). For more details, see the REST architectural model. Now, one can describe all of this without talking about names referring to things at all: it is all about moving chunks of information around the Internet. And indeed, that is how hypertext was originally described by, for example, Engelbart and other pioneers, and how the Web was described in the early days of the W3C.
However, at some point in the mid-90s, I'm not sure exactly when, a very bad idea seems to have germinated in the collective mind of the TAG: that the relationship of URIs to resources is like the traditional relation of reference or denotation that is said to hold between a name and the thing it is a name of. The terminology of "universal identifiers" "identifying" resources was introduced (remember the apparently innocuous change from URL/URN to URI? See how Locating and Naming have been smurged into one idea?). Notice how 'identifies' can be understood either in the sense of a name referring to something OR in the sense of an identifier in a program providing access to a piece of information, or perhaps more generally to a computational entity which is a source of information. This critical ambiguity between reference and access seemed harmless at the time it was introduced, because in fact the 'reference' half of it was purely a theoretician's fantasy and had no operational consequences. (Roy Fielding, for example, is quite clear that reference and meaning play no role in his architectural REST model, and he is quite correct.)

Blurring naming and access in this way may have provided a warm fuzzy sense of creating a new Theory of Web Semiotics by uniting two ideas into one, but in fact it was simply a muddle of two ideas that should have been kept distinct, because they have fundamentally different properties. And that muddle has now, with the arrival of the Semantic Web, come back to haunt us: on the SWeb, reference is no longer just a theoretical gloss on access, but has become part of the actual Web machinery. And now the fact that reference is not the same as access starts to hurt. Since the TAG has been using the terminology of "identify" to be systematically ambiguous between access and reference, it has no way to even talk about the case where these two relationships diverge. I suspect that it has no way to even *think* about it, in fact.
It is rather hard to backtrack through almost a decade of use of a term like "identify", even if it was misuse, so rather than getting the foundations straight, the TAG has decided to insist that when you can access a resource, access and reference are to be identical, thereby preserving the valuable ambiguity of "identifies". But when you want to refer to something that cannot possibly be accessed (because it isn't the kind of thing one can send HTTP requests to: a book, say, or a galaxy, or a dead Roman emperor, or... well, just about anything, actually), then what is accessed, via a 303 redirect, is not the thing referred to (of course) but rather one of the kinds of thing that can be accessed, which should send you back some description of, or information about, the thing that the URI you started with is supposed to refer to.

Got that? When a URI refers to something inaccessible, then what it eventually accesses will send you back not a 'representation' of the referent, but a >>description<< of the referent. (We can't say 'representation of', which would seem to be the rational thing to say, because what it's a 'representation' of is, by TAG definition, the thing the URI eventually accesses, which has to be an HTTP endpoint of some kind.)

Trying to distinguish these two cases is what has given rise to the distinction between 'information resource' and the other kind. The TAG documents try to do this in a theoretically satisfying way by talking about information that completely characterizes it, or some such. But there's a much simpler and more down-to-earth way to characterize the distinction. An information resource is anything that can act as an HTTP (or, if you want to be more general, some Web transfer protocol xxTP) endpoint, i.e. can respond appropriately to xxTP requests by emitting xxTP responses. A non-information resource is anything else. That's it: end of story.
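The two cases above can be sketched as a toy server in Python. This is a minimal sketch under stated assumptions, not anything specified by the TAG: the paths /doc/moby-dick and /id/moby-dick, the DOCUMENTS and NON_INFORMATION tables, and the functions respond(), fetch() and is_information_resource() are all hypothetical. The only protocol details relied on are the real HTTP status codes 200 (OK) and 303 (See Other) and the Location header a 303 carries. The last function is one rough reading of the architectural criterion just proposed: a resource counts as an information resource if it answers a GET itself rather than redirecting.

```python
from http import HTTPStatus

# Hypothetical URI space on one server. Paths under /doc/ name things the
# server can actually emit (information resources, in the architectural sense).
DOCUMENTS = {
    "/doc/moby-dick": "A description of the novel Moby-Dick (1851).",
}

# Paths under /id/ name things the server cannot emit, such as a book.
# Each maps to the hypothetical document that describes the referent.
NON_INFORMATION = {
    "/id/moby-dick": "/doc/moby-dick",
}


def respond(path):
    """Return (status, headers, body) for a GET on `path`."""
    if path in DOCUMENTS:
        # The endpoint answers the GET itself: the body is a
        # 'representation' of the resource that was accessed.
        return HTTPStatus.OK, {"Content-Type": "text/plain"}, DOCUMENTS[path]
    if path in NON_INFORMATION:
        # The referent cannot answer a GET, so the server points the client
        # at a different, accessible resource that describes it.
        return HTTPStatus.SEE_OTHER, {"Location": NON_INFORMATION[path]}, ""
    return HTTPStatus.NOT_FOUND, {}, ""


def fetch(path, max_redirects=5):
    """Follow 303 redirects as a client would; return (final_path, status, body)."""
    for _ in range(max_redirects + 1):
        status, headers, body = respond(path)
        if status != HTTPStatus.SEE_OTHER:
            return path, status, body
        path = headers["Location"]
    raise RuntimeError("too many redirects")


def is_information_resource(path):
    """Rough reading of the criterion above: True only if a GET on `path`
    is answered directly with 200. Redirects and unknown paths give False."""
    status, _, _ = respond(path)
    return status == HTTPStatus.OK


if __name__ == "__main__":
    for start in ("/doc/moby-dick", "/id/moby-dick"):
        final, status, body = fetch(start)
        print(start, "->", final, int(status), body)
```

Run as a script, it prints one line per starting path. Both end at /doc/moby-dick with status 200, but only the /id/ path gets there through a 303, which is the point at which access and reference come apart.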
Note that this is an architectural kind of criterion, not a semantic/information-theoretic kind. I doubt that it is really possible to make the distinction in anything other than architectural terms. But in any case, this is a hell of a lot simpler (and, I suggest, more accurate) than the way the TAG currently tries to do it. And it makes it clear why writing on a wall isn't an information resource but the same writing on a Web server (not a Web page) is: because the wall can't respond to HTTP GET and the server can.

Pat

>
>> > So, in this metaphor, a URI is something you hand a guide, and the guide
>> > will show you the relevant spot on a wall. If you give a URI for a
>> > non-IR, then the best the guide can do is show you a spot on the wall
>> > which talks about that non-IR. (That is, it can do a 303.)
>>
>> When used by an agent in the context of the semantic web, that URI is a
>> name, used to refer to a resource. Either I know what that agent is
>> referring to by that URI or I don't. If I know what the agent denotes by
>> that name (URI), then I don't need to be taken to the wall at all. Or if I
>> don't know what resource the agent refers to by that URI, then it won't help
>> to be put in front of a stream of representations, because the
>> representations are not the resource, and unless you know the nature of the
>> resource, you can't know whether the received representations reveal that
>> nature or not. And this is true both in the case of information or
>> non-information resources, because the essence of neither can be transmitted
>> over a network, as I argue above. In either case, information or
>> non-information, what I really want to hear is a definite description of the
>> resource and to be told that other agents do associate that description with
>> the name (URI) used.
>
>Here, when we get close to what's actually happening, and can just talk
>about Semantic Web / Linked Data use cases, I disagree with you.
>
>Specifically (repeating):
>> Either I know what that agent is referring to by that URI or I don't.
>
>That's human-talk, not machine talk. Machines don't know what things
>are; they just know logical statements about them. [I happen to believe
>that's true for humans as well, but we don't need to go there.] So the
>question isn't whether I know what an agent is referring to but what
>logical statements I have (and believe) about the thing being referred
>to.
>
>The ability to go to the wall is the ability to (maybe) find out more
>information (statements), probably about that thing, and things it's
>related to.
>
>> In either case, information or
>> non-information, what I really want to hear is a definite description of the
>> resource and to be told that other agents do associate that description with
>> the name (URI) used.
>
>I'm not sure what a "definitive" description is. All you get is some
>statements. They may be "definitive" in the sense that they were chosen
>by the person who allocated the name. I'm not sure that's very
>definitive....
>
>As for the association between URIs and retrieved content, you find that
>out by doing a web retrieval....
>
>> By the way, in the current scheme, where am I supposed to go for a good
>> description of, rather than the direct experience of, an information
>> resource?
>
>There is no way to do that, with the current web. All you can do is go
>there and hope it tells you about itself. Lots of human-readable
>websites do. Some RDF graphs do.
>
>    -- Sandro
>
>> > Along this more sophisticated model, one of my preferred terms (instead
>> > of Information Resource) was "Response Point". But this is all
>> > pretty darn fuzzy, and a hard subject on which to reach consensus.
>> >
>> >    * * *
>> >
>> > Really, I think we should probably just call them "web pages". (I know
>> > some people have some ideas about Information Resources which are not
>> > Web Pages. I'm not convinced.)
>> >
>> > So:
>> >
>> >    Information Resource == Web Page.
>> >    Non-Information-Resource == Anything that's not a Web Page.
>> >
>> > (And while we're at it, call them "Web Addresses" not "URIs".)
>> >
>> > So, one of the funky Semantic Web ideas is to give Web Addresses (or
>> > Pseudo-Web-Addresses) to things which are *not* Web Pages. Huh? This
>> > sounds a little weird, especially if you try to call them real Web
>> > Addresses, but via some tricks it kind of works. It lets you talk about
>> > things in a way where the listener can find out more information if they
>> > want it.
>> >
>> > Humans are getting used to this with Google. If I hear a term I don't
>> > understand, I can often Google it faster than I can ask the speaker to
>> > explain it. Especially if it's in a written document. (Of course,
>> > Google just makes it faster and easier -- it's always been possible to
>> > do research.) Using URIs (pseudo-web-addresses) instead of search terms
>> > has some advantages and some disadvantages; I think it's a good plan,
>> > myself.
>> >
>> >    -- Sandro
>> >
>> 1. http://lists.w3.org/Archives/Public/www-tag/2007Jul/0112.html
>> 2. http://lists.w3.org/Archives/Public/semantic-web/2007Jun/0265.html
>>
>> John
>> www.kashori.com
>>
>>

--
---------------------------------------------------------------------
IHMC                  (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.  (850)202 4416   office
Pensacola             (850)202 4440   fax
FL 32502              (850)291 0667   cell
phayes@ihmc.us        http://www.ihmc.us/users/phayes
Received on Friday, 27 July 2007 19:05:12 UTC