Re: Terminology Question concerning Web Architecture and Linked Data from Harry Halpin on 2007-07-28 (www-tag@w3.org from July 2007)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Sat, 28 Jul 2007 11:43:05 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: Sandro Hawke <sandro@w3.org>, John Black <JohnBlack@kashori.com>, 'Linking Open Data' <linking-open-data@simile.mit.edu>, SW-forum <semantic-web@w3.org>, www-tag@w3.org
Message-ID: <46AB6409.4020906@ibiblio.org>
Pat Hayes wrote:
[snip]
>   But when you want to refer to something that cannot possibly be
> accessed (because it isn't the kind of thing that one can transmit
> HTTP protocols to: a book, say, or a galaxy, or a dead Roman emperor,
> or... well, just about anything, actually) then what is accessed, via
> a 303 redirect, is not the thing referred to (of course) but rather
> one of the kinds of thing that can be accessed, which should send you
> back some description of, or information about, the thing that the URI
> you started with is supposed to refer to. Got that?
The *whole* point of the Web is the access relationship. Its major
distinguishing characteristic over previous communication systems (like
natural language, printing and T.V.) is twofold: its ability to have a
(fairly) decentralized yet universal space of names (URIs), and the fact
that these names can *access* more information.

The Web could have two different types of names, one for reference
(URNs) and one for access (URLs), and that's been tried. It was more or
less a failure, precisely because the whole advantage of the Web is that
if one does not know what a name (URI) means, then one can access "more
data" to help disambiguate or discover what the referent is. Since the
experiment of having two different types of names (URNs and URLs)
failed, it makes some measure of sense to elide the distinction and have
just one type of name, URIs, that has the access relationship.

> When a URI refers to something inaccessible, then what it eventually
> accesses will send you back not a 'representation' of the referent,
> but a >>description<< of the referent. (We can't say 'representation
> of', which would seem to be the rational thing to say, because what
> its a 'representation' of is, by TAG definition, the thing the URI
> eventually accesses, which has to be an HTTP endpoint of some kind.)
Here's a problem: the 303 redirection trick basically uses the URI for
the "inaccessible" resource as a sort of URN, and then allows you to
follow your nose through the redirect to find out more information in
order to pin down the reference. But a 303 redirection is *not
necessarily* a sign that a URI is being used as a name to refer to
something outside the Web that can only be referenced. It could be, but
it could also be just a plain old redirection. Besides going back to
URNs, one could imagine a number of ways to state that a URI is being
used primarily to refer rather than to access. One could have a new type
of redirect, or even some sort of graphical "logo" on a web-page saying
that the URI is being used to refer to something rather than just to a
web-page.
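For reference, the TAG's httpRange-14 resolution (2005) ties the reading of an http URI to the status code of a GET on it: a 2xx means the URI identifies an information resource, a 303 means it "could be any resource", and a 4xx leaves the nature of the resource unknown. The minimal Python sketch below encodes only that rule; the function and enum names are hypothetical. It makes the ambiguity above visible: the 303 branch is the one case where the status code alone cannot separate a reference to something off the Web from a plain redirection.

```python
from enum import Enum


class Reading(Enum):
    """What the httpRange-14 rule lets a client conclude about an http URI."""
    INFORMATION_RESOURCE = "information resource"
    ANY_RESOURCE = "could be any resource"
    UNKNOWN = "nature of the resource unknown"


def read_status(status: int) -> Reading:
    """Map the status code of a GET on an http URI to a Reading.

    Hypothetical helper encoding only the three cases of the resolution.
    """
    if 200 <= status <= 299:
        # 2xx: the URI identifies an information resource.
        return Reading.INFORMATION_RESOURCE
    if status == 303:
        # 303 See Other: the URI could identify anything, including something
        # off the Web, but it could equally be an ordinary redirect.
        return Reading.ANY_RESOURCE
    # 4xx leaves the nature unknown; other codes are outside the resolution
    # and are treated the same way in this sketch.
    return Reading.UNKNOWN


for code in (200, 303, 404):
    print(code, read_status(code).value)
```

A client that lands in the ANY_RESOURCE branch still has to fetch and read the description it was pointed at before it knows what the original URI was meant to refer to.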
> Trying to distinguish these two cases is what has given rise to the
> distinction between 'information resource' and the other kind. The TAG
> documents try to do this in a theoretically satisfying way by talking
> about information that completely characterizes it, or some such. But
> there's a much simpler and more down-to-earth way to characterize the
> distinction. An information resource is anything that can act as an
> HTTP (or, if you want to be more general, some Web transfer protocol
> xxTP) endpoint, i.e. can respond appropriately to xxTP requests by
> emitting xxTP responses. A non-information resource is anything else.
> That's it: end of story.
Isn't 303 a response? :)

Regardless, I think that's one reading, and a pretty sensible one.
However, is the only distinguishing characteristic of an information
resource that it can respond to HTTP? A resource, I think, was
originally defined not as a single representation, but as the sum of
all possible representations emitted over time (probably with various
context, like cookies, taken into account, as Sandro pointed out). So an
information resource is something that exists only as a set of
accessible representations through an HTTP endpoint given by a URI.
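The "sum of representations" view can be sketched as a data structure. This is a minimal Python illustration, assuming (hypothetically) that a representation is a byte string and that a request context is a set of header or cookie name/value pairs; the class name and the example.org URI are illustrations only. It can only ever hold the representations actually retrieved, a finite sample of the "all possible representations" the definition refers to.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical shape of a request context: name/value pairs such as an
# Accept header or a cookie, which can change what an endpoint emits.
Context = FrozenSet[Tuple[str, str]]


@dataclass
class ObservedResource:
    """An information resource approximated by the representations seen at its URI.

    Each representation is keyed by retrieval time and request context, so one
    URI can legitimately yield many different bodies.
    """
    uri: str
    representations: Dict[Tuple[float, Context], bytes] = field(default_factory=dict)

    def record(self, timestamp: float, context: Context, body: bytes) -> None:
        """Remember the body emitted at this time under this context."""
        self.representations[(timestamp, context)] = body

    def distinct_bodies(self) -> Set[bytes]:
        """The different representations observed so far."""
        return set(self.representations.values())


page = ObservedResource("http://example.org/news")
as_html = frozenset({("Accept", "text/html")})
as_text = frozenset({("Accept", "text/plain")})
page.record(1.0, as_html, b"<p>Monday's news</p>")
page.record(2.0, as_html, b"<p>Tuesday's news</p>")
page.record(2.0, as_text, b"Tuesday's news")
```

Keying on both time and context is the design choice that matters here: the same URI returning different bodies on different days, or to different clients, is still one resource.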

Some people have removed the "HTTP endpoint" clause, and I think that's
what causes the confusion over the "writing on the wall" example.

Here's the problem: there's no standard way to know if a given resource
is the sum of the representations you can access (i.e. an information
resource), or if those representations are merely associated
descriptions of something that one can only refer to (a non-information
resource), which is being described by an HTTP endpoint but is *not the
HTTP endpoint itself*.

So one pragmatic solution is probably to take a holistic viewpoint and
just say that if a URI is used to refer to something inaccessible (a
non-information resource), its author should clearly attempt to say so,
and the onus is on the author to provide associated descriptions to pin
down what exactly is being referred to.
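As one illustration of an author taking on that onus, here is a minimal Python sketch (standard library only) of a server that answers a URI for an inaccessible thing with a 303 See Other pointing at a separate description, and answers the description's own URI with 200. The /id/ and /doc/ path layout, the handler and helper names, and the description text are hypothetical choices; note that nothing in the 303 itself tells a client which reading is intended, so the description has to say it explicitly.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical layout: /id/<name> names a thing that cannot be sent over
# HTTP; /doc/<name> names the document describing it.
DESCRIPTIONS = {
    "/doc/caesar": (
        b"The URI /id/caesar refers to Julius Caesar, a dead Roman emperor,\n"
        b"not to this document or to any web server.\n"
    ),
}


class DescribeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/id/"):
            # The referent cannot be transmitted, so send the client on to a
            # description with 303 See Other (absolute URI in Location).
            host = self.headers.get("Host", "localhost")
            name = self.path[len("/id/"):]
            self.send_response(303)
            self.send_header("Location", "http://%s/doc/%s" % (host, name))
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path in DESCRIPTIONS:
            # The description is itself an information resource: answer 200.
            body = DESCRIPTIONS[self.path]
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server() -> HTTPServer:
    """Run the sketch on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), DescribeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get(port: int, path: str):
    """Issue one GET against the sketch; return (status, Location, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()
```

With the server started via start_server(), a GET for /id/caesar on the returned port should come back as a 303 whose Location ends in /doc/caesar, and a GET on /doc/caesar should return the plain-text description with a 200.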

Another pragmatic solution is to say that the distinction really
doesn't matter, and that, to steal a phrase from Ted Nelson, the Web
and reality are increasingly "intertwingled" such that it's hard to say
what's inaccessible on the Web and what is not. One would normally
think that a person's web-page is distinct from them, and that a
web-page is accessible through the Web in a way a person isn't. Yet,
looking at someone's Myspace account, it's amazing how much of the
person themselves is embodied in these representations, and that very
real friendships can exist primarily through these representations. So,
while the person is somewhat inaccessible through the Web, they may not
be entirely inaccessible.

> Note that this is an architectural kind of criterion, not a
> semantic/information-theoretic kind. I doubt if it is really possible
> to make the distinction in other than architectural terms. But in any
> case, this is a hell of a lot simpler (and I suggest, more accurate)
> than the way the TAG currently tries to do it. And it makes it clear
> why writing on a wall isn't an information resource but the same
> writing on a Web server (not a Web page) is: because the wall can't
> respond to HTTP GET and the server can.
Agreed.
> Pat


-- 
		-harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426
Received on Saturday, 28 July 2007 15:43:49 UTC