Re: Terminology Question concerning Web Architecture and Linked Data from Pat Hayes on 2007-07-27 (semantic-web@w3.org from July 2007)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 27 Jul 2007 14:04:45 -0500
To: "Sandro Hawke" <sandro@w3.org>
Cc: "John Black" <JohnBlack@kashori.com>, "'Linking Open Data'" <linking-open-data@simile.mit.edu>, "SW-forum" <semantic-web@w3.org>, www-tag@w3.org
Message-Id: <p06230901c2cfe740b400@[10.100.0.67]>
>  > And that makes it far less crisp, I'm afraid.
>
>Yes, sad, isn't it?    For a minute there, it did seem nicely crisp.
>
>>  Sorry, but I still find this incomprehensible, that the text of a book is
>>  not an information resource, but if that same text is encoded in a computer
>>  file on a web server, it now somehow becomes an information resource. Or
>>  what I found just as surprising was when Tim said, "...a literal string is
>>  not an information resource." [2] about a text string in a static web page
>>  served by a web server, here http://kashori.com/ontology/MyURI. Honestly,
>>  I'm not debating here, I just don't get it.
>
>I hear you.   Sometimes I think I can kind of make sense of it; sometimes
>not.  It's hard to come up with a simple and coherent model for a very
>messy and complex system....    (the real, deployed web is complex
>enough -- try to add the Semantic Web...?)


Oh, come on, the problem isn't one of complexity. Reference isn't a 
complicated idea. The problem here is simply being in a conceptual 
muddle.

Try this for size.

Until recently, the Web (that's the "pre-semantic" Web, mostly 
constructed from HTML and HTTP with a dash of Flash and Javascript 
added) was architecturally complicated but conceptually 
straightforward. What URIs did was go and locate some thing which was 
physically attached to the Internet and which, when suitably prodded, 
sent back some information to you. These things were called 
'resources' and the information they sent back was called a 
'representation' of the resource (at the time it was prodded, if one 
is being picky). For more details, see the REST architectural model.

Now, one can describe all of this without talking about names 
referring to things at all: it all has to do with moving chunks of 
information around the internet. And indeed, that is how hypertext 
was originally described by, for example, Engelbart and other pioneers, 
and how the Web was described in the early days of the W3C. However, 
at some point in the mid-90s, I'm not sure exactly when, a very bad 
idea seems to have germinated in the collective mind of the TAG, that 
the relationship of URIs to resources is like the traditional 
relation of reference or denotation that is said to hold between a 
name and the thing it is a name of. The terminology of "universal 
identifiers" "identifying" resources was introduced (remember the 
apparently innocuous change from URL/URN to URI? See how Locating and 
Naming have been smurged into one idea?). Notice how 'identifies' can 
be understood either in the sense of a name referring to something OR 
in the sense of an identifier in a program providing access to a 
piece of information, or perhaps more generally to a computational 
entity which is a source of information. This critical ambiguity 
between reference and access seemed harmless at the time it was 
introduced, because in fact the 'reference' half of it was purely a 
theoretician's fantasy and had no operational consequences. (Roy for 
example is quite clear that reference and meaning play no role in his 
architectural REST model, and he is quite correct.) Making this 
blurring between naming and access may have provided a warm fuzzy 
sense of creating a new Theory of Web Semiotics by uniting two ideas 
into one, but in fact it was simply a muddle of two ideas that should 
have been kept distinct, because they have fundamentally different 
properties. And that muddle has now, with the arrival of the semantic 
Web, come back to haunt us: because on the SWeb, reference is no 
longer just a theoretical gloss on access, but has become part of the 
actual Web machinery. And now, the fact that reference is not the 
same as access starts to hurt.

Since the TAG has been using the terminology of "identify" to be 
systematically ambiguous between access and reference, it has no way 
to even talk about the case where these two relationships diverge. I 
suspect that it has no way to even *think* about it, in fact. It is 
rather hard to backtrack through almost a decade of use of a term 
like "identify", even if it was mis-use, so rather than getting the 
foundations straight, the TAG has decided to insist that when you can 
access a resource, then access and reference will be identical, 
thereby preserving the valuable ambiguity of "identifies". But when 
you want to refer to something that cannot possibly be accessed 
(because it isn't the kind of thing that one can send HTTP 
requests to: a book, say, or a galaxy, or a dead Roman emperor, 
or... well, just about anything, actually) then what is accessed, via 
a 303 redirect, is not the thing referred to (of course) but rather 
one of the kinds of thing that can be accessed, which should send you 
back some description of, or information about, the thing that the 
URI you started with is supposed to refer to. Got that? When a URI 
refers to something inaccessible, then what it eventually accesses 
will send you back not a 'representation' of the referent, but 
a >>description<< of the referent. (We can't say 'representation of', 
which would seem to be the rational thing to say, because what it's a 
'representation' of is, by TAG definition, the thing the URI 
eventually accesses, which has to be an HTTP endpoint of some kind.)
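
As a rough illustration of the 200-versus-303 convention just described, 
here is a minimal Python sketch. The names `Reading` and 
`interpret_response`, the result labels, and the example URI are all 
hypothetical, made up for this illustration; only the status codes come 
from HTTP. The sketch takes no position on what a URI "really" refers 
to. It only records what the convention lets a client conclude from one 
response.

```python
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    """What a client may conclude about a URI from one HTTP response."""
    kind: str                     # "representation", "description", or "unknown"
    follow: Optional[str] = None  # URI to fetch next, if any


def interpret_response(status: int, location: Optional[str] = None) -> Reading:
    """Apply the convention: 2xx means the URI accesses what it refers to;
    303 means the referent is elsewhere and Location points at a description."""
    if 200 <= status < 300:
        # Access and reference coincide: the body is a representation
        # of the thing the URI identifies.
        return Reading("representation")
    if status == 303 and location:
        # The referent cannot be accessed; what comes back from
        # Location is a description of it, not a representation.
        return Reading("description", follow=location)
    # Any other response tells us nothing about the referent.
    return Reading("unknown")


# Hypothetical URI, for illustration only.
print(interpret_response(200))
print(interpret_response(303, "http://example.org/about/hadrian"))
```

Note that the client never learns anything about the referent from the 
status code itself; the 303 case only tells it where to look for 
statements about the referent.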

Trying to distinguish these two cases is what has given rise to the 
distinction between 'information resource' and the other kind. The 
TAG documents try to do this in a theoretically satisfying way by 
talking about information that completely characterizes it, or some 
such. But there's a much simpler and more down-to-earth way to 
characterize the distinction. An information resource is anything 
that can act as an HTTP (or, if you want to be more general, some Web 
transfer protocol xxTP) endpoint, i.e. can respond appropriately to 
xxTP requests by emitting xxTP responses. A non-information resource 
is anything else. That's it: end of story.

Note that this is an architectural kind of criterion, not a 
semantic/information-theoretic kind. I doubt if it is really possible 
to make the distinction in other than architectural terms. But in any 
case, this is a hell of a lot simpler (and I suggest, more accurate) 
than the way the TAG currently tries to do it. And it makes it clear 
why writing on a wall isn't an information resource but the same 
writing on a Web server (not a Web page) is: because the wall can't 
respond to HTTP GET and the server can.
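
To make the architectural criterion concrete, here is a small Python 
sketch, offered as a toy model rather than a definition. Everything in 
it is an assumption made for illustration: the `XxTPEndpoint` protocol, 
the `WebServer` and `Wall` classes, and the `is_information_resource` 
function are hypothetical names, and the "request" is just a string. 
The point it tries to show is that the test looks only at whether a 
thing can respond, not at what writing it carries.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class XxTPEndpoint(Protocol):
    """Anything that can answer a transfer-protocol request."""

    def respond(self, request: str) -> str: ...


class WebServer:
    """Holds some writing and can emit it in response to a GET."""

    def __init__(self, text: str) -> None:
        self.text = text

    def respond(self, request: str) -> str:
        if request.startswith("GET "):
            return "HTTP/1.1 200 OK\r\n\r\n" + self.text
        return "HTTP/1.1 405 Method Not Allowed\r\n\r\n"


class Wall:
    """Holds the same writing but has no way to answer a request."""

    def __init__(self, text: str) -> None:
        self.text = text


def is_information_resource(thing: object) -> bool:
    # Architectural test only: can it respond to a request?
    # Nothing here inspects the content the thing carries.
    return isinstance(thing, XxTPEndpoint)


writing = "Kilroy was here"
print(is_information_resource(WebServer(writing)))
print(is_information_resource(Wall(writing)))
```

Both objects hold identical text; only the one with a `respond` method 
passes the test, which is the whole of the distinction on this account.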

Pat



>
>>  > So, in this metaphor, a URI is something you hand a guide, and the guide
>>  > will show you the relevant spot on a wall.   If you give a URI for a
>>  > non-IR, then the best the guide can do is show you a spot on the wall
>>  > which talks about that non-IR.  (That is, it can do a 303.)
>>
>>  When used by an agent in the context of the semantic web, that URI is a
>>  name, used to refer to a resource. Either I know what that agent is
>>  referring to by that URI or I don't. If I know what the agent denotes by
>>  that name (URI), then I don't need to be taken to the wall at all. Or if I
>>  don't know what resource the agent refers to by that URI, then it won't help
>>  to be put in front of a stream of representations, because the
>>  representations are not the resource, and unless you know the nature of the
>>  resource, you can't know whether the received representations reveal that
>>  nature or not. And this is true both in the case of information or
>>  non-information resources, because the essence of neither can be transmitted
>>  over a network, as I argue above. In either case, information or
>>  non-information, what I really want to hear is a definite description of the
>>  resource and to be told that other agents do associate that description with
>>  the name (URI) used.
>
>Here, when we get close to what's actually happening, and can just talk
>about Semantic Web / Linked Data use cases, I disagree with you.
>
>Specifically (repeating) :
>>  Either I know what that agent is referring to by that URI or I don't.
>
>That's human-talk, not machine talk.  Machines don't know what things
>are; they just know logical statements about them.  [I happen to believe
>that's true for humans as well, but we don't need to go there.]  So the
>question isn't whether I know what an agent is referring to but what
>logical statements I have (and believe) about the thing being referred
>to.
>
>The ability to go to the wall is the ability to (maybe) find out more
>information (statements), probably about that thing, and things it's
>related to.
>
>>  In either case, information or
>>  non-information, what I really want to hear is a definite description of the
>>  resource and to be told that other agents do associate that description with
>>  the name (URI) used.
>
>I'm not sure what a "definitive" description is.  All you get is some
>statements.  They may be "definitive" in the sense that they were chosen
>by the person who allocated the name.  I'm not sure that's very
>definitive....
>
>As for the association between URIs and retrieved content, you find that
>out by doing a web retrieval....
>
>>  By the way, in the current scheme, where am I supposed to go for a good
>>  description of, rather than the direct experience of, an information
>>  resource?
>
>There is no way to do that, with the current web.  All you can do is go
>there and hope it tells you about itself.  Lots of human-readable
>websites do.  Some RDF graphs do.
>
>    -- Sandro
>
>>  > Along this more sophisticated model, one of my preferred terms (instead
>>  > of Information Resource) was "Response Point".     But this is all
>>  > pretty darn fuzzy, and a hard subject on which to reach consensus.
>>  >
>>  >   *        *         *
>>  >
>>  > Really, I think we should probably just call them "web pages".   (I know
>>  > some people have some ideas about Information Resources which are not
>>  > Web Pages.  I'm not convinced.)
>>  >
>>  > So:
>>  >
>>  >        Information Resource == Web Page.
>>  >        Non-Information-Resource == Anything that's not a Web Page.
>>  >
>>  > (And while we're at it, call them "Web Addresses" not "URIs".)
>>  >
>>  > So, one of the funky Semantic Web ideas is to give Web Addresses (or
>>  > Pseudo-Web-Addresses) to things which are *not* Web Pages.  Huh?  This
>>  > sounds a little weird, especially if you try to call them real Web
>>  > Addresses, but via some tricks it kind of works.  It lets you talk about
>>  > things in a way where the listener can find out more information if they
>>  > want it.
>>  >
>>  > Humans are getting used to this with Google.  If I hear a term I don't
>>  > understand, I can often Google it faster than I can ask the speaker to
>>  > explain it.  Especially if it's in a written document.  (Of course,
>>  > Google just makes it faster and easier -- it's always been possible to
>>  > do research.)  Using URIs (pseudo-web-addresses) instead of search terms
>>  > has some advantages and some disadvantages; I think it's a good plan,
>>  > myself.
>>  >
>>  >    -- Sandro
>>  >
>>  1. http://lists.w3.org/Archives/Public/www-tag/2007Jul/0112.html
>>  2. http://lists.w3.org/Archives/Public/semantic-web/2007Jun/0265.html
>>
>>  John
>>  www.kashori.com
>>
>>


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 27 July 2007 19:05:12 UTC