Re: Terminology Question concerning Web Architecture and Linked Data from Pat Hayes on 2007-07-30 (semantic-web@w3.org from July 2007)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 30 Jul 2007 13:50:11 -0500
To: Harry Halpin <hhalpin@ibiblio.org>
Cc: Sandro Hawke <sandro@w3.org>, John Black <JohnBlack@kashori.com>, 'Linking Open Data' <linking-open-data@simile.mit.edu>, SW-forum <semantic-web@w3.org>, www-tag@w3.org
Message-Id: <p0623090cc2d3d8d89b5e@[10.100.0.67]>
>Pat Hayes wrote:
>[snip]
>>    But when you want to refer to something that cannot possibly be
>>  accessed (because it isn't the kind of thing that one can transmit
>>  HTTP protocols to: a book, say, or a galaxy, or a dead Roman emperor,
>>  or... well, just about anything, actually) then what is accessed, via
>>  a 303 redirect, is not the thing referred to (of course) but rather
>>  one of the kinds of thing that can be accessed, which should send you
>>  back some description of, or information about, the thing that the URI
>>  you started with is supposed to refer to. Got that?
>The *whole* point of the Web is the access relationship

Glad to hear you say that :-) Now, however, I'm going to hold you to 
it. See below.

>, and its
>major distinguishing characteristic over previous communication systems
>(like natural language, printing and T.V.) is twofold: its ability to
>have a (fairly) decentralized yet universal space of names (URIs), and
>the fact that these names can *access* more information.
>
>The Web could have two different types of names, one for reference (URN)
>and one for access (URLs), and that's been tried. And it was more or
>less not a failure

more or less a failure? (I presume you mean)
Was it a failure? But isn't that exactly how the Web in fact operates 
right now? We call them URIs or IRIs, but if it starts with http: or 
ftp: we all know, and the Web knows, it's a URL. And if it starts with urn: 
then it's a URN. That IS the way the Web actually works, right?

>, precisely the whole advantage of the Web is that if
>one does not know what a name (URI) means

What a name >>means<<... hmmm. Let's put some flesh on that 'means' 
word. Are you talking about access or reference? You say above that 
the Web is all about access. But we DO know what it accesses, right? 
That's what we get back, or an error message telling us (usually) 
that, sorry, it's broken and doesn't access anything. Either way, we 
get a clear answer, not one that needs any further information to 
disambiguate it.

So, maybe you are now slipping from access talk to reference talk. We 
may not know what the name REFERS to, indeed. That is, assuming that 
access and reference are distinct, since we do know what it accesses 
(see above). But where does the official TAG line make this 
distinction between access and reference, so that they can be 
distinguished? Aren't they both referred to ambiguously as 
'identifies'? One would have to say it doesn't identify what it 
identifies...

>, then one can access "more
>data" to help one disambiguate or discover what the referent is.

Is that really how the Web works? Even the Semantic Web? Can you cite 
an example? Most of the Web pages I see don't seem to be trying to 
help me disambiguate a name. Why don't we just say that it gives more 
information? Then this covers even the normal case: a Web page 
returned for an http GET does indeed in a sense give me more 
information about the information resource that sent it, though 
that's not usually what I'm most interested in.

>  Since
>the experiment of having two different types of names (URNs and URLs)
>failed

It didn't fail.

>, it makes some measure of sense to elide the distinction and have
>just one type of name - URIs - that has the access relationship.

Then why are you talking about reference and ambiguity, above?

BTW, I'm happy to have one kind of name (though in fact I don't think 
it would work, and it doesn't work that way now). My point is that we 
should acknowledge that there are two kinds of relationship.

>
>>  When a URI refers to something inaccessible, then what it eventually
>>  accesses will send you back not a 'representation' of the referent,
>>  but a >>description<< of the referent. (We can't say 'representation
>>  of', which would seem to be the rational thing to say, because what
>>  it's a 'representation' of is, by TAG definition, the thing the URI
>>  eventually accesses, which has to be an HTTP endpoint of some kind.)
>Here's a problem - the 303 redirection trick basically uses the URI for
>the "inaccessible" resource as some sort of URN, and then allows you to
>follow-your-nose through the redirect to find out more information in
>order to pin down the reference.

That's the idea, but I think it's a fantasy.

>But then, a 303 redirection is *not
>necessarily* a sign that something is being used as a name to refer to
>something outside the Web that can only be referenced.  It could be, but
>it could be just a plain old redirection.

True. And I agree this is one of the main problems with this idea. 
It's why I call it a hack.
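To make the ambiguity concrete, here is a minimal sketch that models the exchange in memory rather than over the network. Every URI and the `SERVER` table are hypothetical; one 303 is meant in the Linked Data sense (the emperor cannot be accessed, so the server points at a description of him), the other is an ordinary "see other" redirect to a current news item.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Response:
    status: int
    location: Optional[str] = None  # redirect target, if any
    body: Optional[str] = None      # what the endpoint emits (a string stands in for bytes)

# Hypothetical in-memory "server"; all URIs are illustrative assumptions.
SERVER = {
    # 303 used to signal reference: the referent itself cannot be accessed.
    "http://example.org/id/hadrian": Response(303, location="http://example.org/doc/hadrian"),
    "http://example.org/doc/hadrian": Response(200, body="Hadrian was a Roman emperor."),
    # A plain 303 "see other": nothing outside the Web is being referred to.
    "http://example.org/news/latest": Response(303, location="http://example.org/news/2007-07-30"),
    "http://example.org/news/2007-07-30": Response(200, body="Today's news."),
}

REDIRECTS = {301, 302, 303, 307}

def get(uri: str) -> Response:
    """Simulated GET: look the URI up, or answer 404."""
    return SERVER.get(uri, Response(404))

def follow(uri: str, max_hops: int = 5) -> Tuple[str, Response, bool]:
    """Follow redirects; return (final URI, final response, whether a 303 was seen)."""
    saw_303 = False
    for _ in range(max_hops):
        resp = get(uri)
        if resp.status in REDIRECTS and resp.location:
            saw_303 = saw_303 or resp.status == 303
            uri = resp.location
            continue
        return uri, resp, saw_303
    raise RuntimeError("too many redirects")

if __name__ == "__main__":
    for start in ("http://example.org/id/hadrian", "http://example.org/news/latest"):
        final, resp, saw_303 = follow(start)
        print(start, "->", final, resp.status, saw_303)
```

Both starting URIs end in a 200 after a 303, and in neither case is the thing finally accessed the thing the first URI was meant to name; nothing in the exchange itself tells the client which 303 was intended as a signal of reference.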

>  One could imagine a number of
>ways besides going back to URNs to state that a URI is being used
>primarily to refer rather than to access. One could have a new type of
>redirect, or even some sort of graphical "logo" on a web-page to say
>that the URI is being used to refer to something rather than just a web-page.

Or a new http code, meaning "here's some info that may be relevant, 
but this isn't the referent, because the actual referent isn't 
encodable in a byte stream". And this should be a 2xx, not a 3xx, 
because acts of reference always 'succeed' in any sense useful 
architecturally. In fact, looking at the existing codes, why not use 
203? Then the difference between receiving 200 plus <thingie> and 203 
plus <thingie> is that in the first case we can conclude that the URI 
refers to the source of thingie, while in the second case it does 
not, but is simply some information that should be relevant to 
helping figure out what the URI does refer to. But maybe 203 cannot 
reasonably be used in this way: then there needs to be a 207.
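A minimal sketch of how a client might read responses under this proposal. It is an assumption for illustration, not how HTTP clients behave today: in HTTP/1.1 (RFC 2616), 203 already means "Non-Authoritative Information", i.e. metainformation gathered from a local or third-party copy. The names `Reading`, `interpret` and `PROPOSED_DESCRIPTION_CODE` are hypothetical.

```python
from enum import Enum

class Reading(Enum):
    URI_DENOTES_SOURCE = "the URI refers to the source of the body"
    BODY_IS_ABOUT_REFERENT = "the body is relevant information, not the referent"
    NO_CONCLUSION = "nothing follows about reference"

# Proposed (non-standard) use of 203; a new 2xx code could be substituted.
PROPOSED_DESCRIPTION_CODE = 203

def interpret(status: int) -> Reading:
    """Sketch of the proposed reading of a response status."""
    if status == 200:
        return Reading.URI_DENOTES_SOURCE
    if status == PROPOSED_DESCRIPTION_CODE:
        return Reading.BODY_IS_ABOUT_REFERENT
    return Reading.NO_CONCLUSION

def is_success(status: int) -> bool:
    # Both readings are 2xx: the act of reference counts as having succeeded.
    return 200 <= status < 300
```

Keeping both cases in the 2xx class is the design point: the client gets a body either way, and only the status tells it whether that body's source is the referent.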

>>  Trying to distinguish these two cases is what has given rise to the
>>  distinction between 'information resource' and the other kind. The TAG
>>  documents try to do this in a theoretically satisfying way by talking
>>  about information that completely characterizes it, or some such. But
>>  there's a much simpler and more down-to-earth way to characterize the
>>  distinction. An information resource is anything that can act as an
>>  HTTP (or, if you want to be more general, some Web transfer protocol
>>  xxTP) endpoint, i.e. can respond appropriately to xxTP requests by
>>  emitting xxTP responses. A non-information resource is anything else.
>>  That's it: end of story.
>Isn't 303 a response? :)

Yes indeed. And what you actually access in this case (even pre-303) 
indeed ISN'T the referent, right?

>Regardless, I think that's one reading, and a pretty sensible one.
>However, is the only distinguishing characteristic of an information
>resource that it can respond to HTTP?

Or FTP or....
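The "endpoint" definition of an information resource is purely structural, which a minimal sketch can show using Python's `typing.Protocol`. `XxTPEndpoint`, `WebPage` and `Galaxy` are hypothetical names, and a single `respond` method stands in for "can respond appropriately to xxTP requests".

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class XxTPEndpoint(Protocol):
    """Anything that can answer a transfer-protocol request with a response."""
    def respond(self, request: str) -> str: ...

class WebPage:
    """Hypothetical information resource: it emits a response when prodded."""
    def __init__(self, text: str) -> None:
        self.text = text

    def respond(self, request: str) -> str:
        return "200 OK\n\n" + self.text if request == "GET" else "405 Method Not Allowed"

class Galaxy:
    """Hypothetical non-information resource: it can be talked about, not accessed."""
    name = "Andromeda"

def is_information_resource(thing: object) -> bool:
    # isinstance on a runtime_checkable Protocol only checks that a `respond`
    # attribute exists; that structural test is the whole definition here.
    return isinstance(thing, XxTPEndpoint)
```

Under this sketch `is_information_resource(WebPage("hello"))` is true and `is_information_resource(Galaxy())` is false; whatever the galaxy is, it has no way to answer a request.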

>A resource, I think, was
>originally defined not as just a single representation, but as the sum of all
>possible representations emitted over time

Right, which is why I said the thing accessed, not the response from it.

>  (with, probably, various kinds of
>context, like cookies, taken into account, as Sandro pointed out).

True, I'm sure a proper account would have to be more complicated than this.

>  So, an
>information resource is something that exists only as a set of accessible
>representations through an HTTP endpoint given by a URI.

I don't think we want to identify the IR with the set of 
representations it can emit. As I follow the REST model, the resource 
is, abstractly, a function from times or access events to 
representations. Concretely, it's a computational device that emits 
representations when prodded and is suitably connected to the 
Internet.
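A minimal sketch of the REST reading, assuming access events can be modelled as integer timestamps; `weather_page` and `sunny_page` are hypothetical. It illustrates why the set of representations emitted so far does not pin down the resource: two different functions can agree on every access observed so far.

```python
from typing import Callable, List

# A resource, abstractly: a function from access times to representations.
Resource = Callable[[int], str]

def weather_page(t: int) -> str:
    """Hypothetical resource whose representation changes from time 3 on."""
    return "sunny" if t < 3 else "rain"

def sunny_page(t: int) -> str:
    """Hypothetical resource that always emits the same representation."""
    return "sunny"

def emitted(resource: Resource, times: range) -> List[str]:
    """The representations actually observed at the given access times."""
    return [resource(t) for t in times]

observed = range(3)
# Same representations so far...
assert emitted(weather_page, observed) == emitted(sunny_page, observed)
# ...yet they are different resources, as a later access shows.
assert weather_page(5) != sunny_page(5)
```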

>Some people have removed the "HTTP endpoint" clause, and I think that's
>what causes the confusion over the "writing on the wall" example.
>
>Here's the problem - there's no standard way to know if a given resource
>is the sum of its representations you can access (i.e. an information
>resource), or if those representations are merely associated
>descriptions of something that one can only refer to (a non-information
>resource), which is being described by an HTTP endpoint but is *not the
>HTTP endpoint itself*.
>
>So one pragmatic solution is probably to take a holistic viewpoint and
>just say that if a URI is used to refer to something inaccessible (a
>non-information resource), it should clearly attempt to say so, and the
>onus is on the author to provide associated descriptions to pin down
>what exactly is being referred to.

But here's the central dilemma, which is that it's now become habitual 
to use a URI to actually denote what it accesses, even when that is a 
description that says that the URI does not denote what it accesses. 
But when that happens, what URI does one use for the accessed entity? 
It's like something that says "This isn't me" but has no way to tell 
you what 'this' is.

>
>Another pragmatic solution is to say that the distinction really
>doesn't matter, and that - to steal a phrase from Ted Nelson - the Web
>and reality are increasingly "intertwingled" such that it's hard to say
>what's inaccessible on the Web versus what is not. One would normally
>think that a person's web-page is distinct from them, and that a
>web-page is accessible through the Web in the way a person isn't.

Yes, and one would be right :-)

>  Yet,
>looking at someone's Myspace account, it's amazing how much of the
>person themselves is embodied in these representations

I really disagree.

>- and that very
>real friendships can exist primarily through these representations.

Or just by email. Yes. But the conclusion to draw is that friendship 
needs only a communication channel to flourish, not that websites are 
like people :-)

Pat

>
>Harry Halpin,  University of Edinburgh
>http://www.ibiblio.org/hhalpin 6B522426


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 30 July 2007 18:50:47 UTC