Re: New draft TAG Finding on The Self-Describing Web from Dan Brickley on 2007-05-30 (www-tag@w3.org from May 2007)

From: Dan Brickley <danbri@danbri.org>
Date: Wed, 30 May 2007 15:42:37 +0100
To: Jonathan Rees <jonathan.rees@gmail.com>
CC: www-tag@w3.org
Message-ID: <465D8D5D.3070200@danbri.org>
Jonathan Rees wrote:
> 
> I'm sorry, I'm new to the list and haven't had time to review the
> archives. I know these topics have been pounded to death, so please
> just point me to previous threads if they address my points. I'm also
> writing hastily since I know y'all are talking about this soon.
> 
> - You (and others) say: "Information resources are resources,
> identified by URIs and whose essential characteristics can be conveyed
> in a message [AWWW]." This is not an operational definition; I have no
> idea how to consider some resource, apply this criterion, and
> determine whether or not it is an information resource. For example,
> what message, if any, conveys the essential characteristics of the
> resource denoted by http://news.google.com/ ? Surely today's news has
> little bearing on the essence of this resource. I, at least, would
> have said that the resource is something whose essence is to give the
> moment's news at every moment. The message you get from an HTTP GET is
> just a sampling of a complicated variable, not the variable itself.
>
> Well, you haven't stated any relationship between the postulated
> message conveying the resource's essence and the messages we get when
> we dereference its URI; that might allow a loophole of some kind. But
> I don't think you intend to separate those two.

I guess the idea is of document-like-things that are (in principle at 
least) losslessly (or 'essentially losslessly', whatever that means) 
representable as a stream of 0s and 1s.

I have long been extremely uncomfortable with the idea of "Information 
Resources" being baked into webarch at any deep level, since there are 
so many ways of carving this cake. Is Shakespeare's Hamlet an 
InfoResource? The Bible, King James edition? My homepage? Your 
audio-blog exposed by http/atom podcast, by Skype answerphone? by analog 
telephone answerphone? by booming loudspeaker? A zero-byte file? Another 
zero-byte file created by someone else? This book on my desk? A human 
gene identified by LSID URI? This piece of blank paper on my desk? The 
song "Happy Birthday"? These kinds of distinction are made differently 
in a variety of ontologies and traditions, eg. FRBR in the Library 
world, CIDOC-CRM in the museum informatics scene. The word "essential" 
is so easy to type, but has been tying philosophers in knots for 
centuries. I hope we can find some way to sidestep its use.

In the case of http://news.google.com/ the "essential characteristics" 
line of thought could tackle it in two ways: either the info 
resource is the larger thing, the Google News database, of which today's 
HTTP-delivered HTML rendering is but a partial view (but which could in 
theory be entirely serialized in 0s and 1s). Or we could say that 
http://news.google.com/ is an info resource that is a smallish document 
whose primary topic is today's news (and which might have concrete 
renderings in HTML, PDF, WML, RSS etc data formats), and which is 
intimately related to a bigger database and Web service which generate 
that document, but which are identified by other URIs. Same thing with 
"today's temperature" services and the Web pages they generate. The 
problem with having the info-resource be the larger thing is 
drawing the line and saying "here is where the core info resource stops 
... and here are the bits that are its 'sensors' for interacting with 
the world (eg. scraping news, taking room temperature etc)". The idea of 
info-resource makes it tricky 'cos you have to avoid calling physical 
things (books, thermometers etc) info resources. Whereas if you make the 
smaller thing the info resource, ie. the document -about- today's news 
or today's temperature ... you can be accused of confusing the message 
body with the deeper thing it is merely a view of.

> Anyhow, Google is the URI owner and gets to decide what the URI
> denotes; so who are we to be talking about the essences of Google's
> resources? 

..ooOO("When I use a word," Humpty Dumpty said, in a rather scornful 
tone, "it means just what I choose it to mean - neither more nor less." :)

> If we know independently what a URI denotes, and have an
> objective definition of "information resource", then we can take
> stands on the information-resourceness of the denoted resource.
> Otherwise it's an exercise in futility, and instead we should just be
> talking empirically about URIs and HTTP experiences.

Yes please.


> - I think httpRange-14 doesn't really mean to say that the fact of a
> 200 response implies that the resource is an information resource;
> after all, assuming that "information resource" has some ontological
> legitimacy, servers can be wrong, inconsistent, or deceptive (consider
> the HTTP response you get by dereferencing
> http://www.ihmc.us/users/phayes/PatHayes.html, which directly
> contradicts httpRange-14). I think the intent is that a 200
> constitutes an *assertion* that the resource is an information
> resource. The shift from implication to assertion allows that Pat can
> be right while his server is wrong.

Yep, this assertion-based characterisation sounds a lot more robust and 
worldly.
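
A minimal sketch of the assertion-based reading, in case it helps make the 
distinction concrete. It assumes the usual httpRange-14 reading of status 
codes (2xx: information resource; 303: could be any resource; 4xx: nature 
unknown). The names ServerClaim, read_response and server_is_wrong are 
hypothetical, the 200 status is hard-coded rather than fetched, and the 
"independent knowledge" that the URI denotes a person is an assumption 
supplied by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServerClaim:
    """What a response is taken to assert about a URI (hypothetical type)."""
    uri: str
    status: int
    # True: the server asserts "information resource"; None: it asserts nothing.
    asserts_info_resource: Optional[bool]


def read_response(uri: str, status: int) -> ServerClaim:
    """Map a GET status code to the claim it makes under httpRange-14."""
    if 200 <= status < 300:
        return ServerClaim(uri, status, True)
    # 303 (could be any resource), 4xx (nature unknown), anything else: no claim.
    return ServerClaim(uri, status, None)


def server_is_wrong(claim: ServerClaim, actually_info_resource: bool) -> bool:
    """A claim, unlike an implication, can be false: check it against
    what we know independently about the denoted resource."""
    return claim.asserts_info_resource is True and not actually_info_resource


# Hypothetical inputs: the status is hard-coded, and we assume we know
# independently that this URI denotes a person, not a document.
claim = read_response("http://www.ihmc.us/users/phayes/PatHayes.html", 200)
print(server_is_wrong(claim, actually_info_resource=False))
```

Under the implication reading there would be nothing to compare: the 200 
would simply settle the matter. Keeping the server's claim separate from 
the independent knowledge is what lets Pat be right while his server is 
wrong.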

> Jonathan

Dan
Received on Wednesday, 30 May 2007 14:43:04 UTC