RE: "information resource" from Harry Halpin on 2004-09-25 (www-tag@w3.org from September 2004)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Sat, 25 Sep 2004 16:32:37 -0400 (EDT)
To: www-tag@w3.org
Message-ID: <Pine.LNX.4.44.0409251533550.20584-100000@tribal.metalab.unc.edu>
Len et al.,
	First, let me preface this by saying that the change from
"information resource" to "Web resource" (i.e. Web accessible) is
grand and I support it. But for those with interest in what 
we mean when we say "information", keep reading, although be forewarned 
it's a bit long. "Information" is not a meaningless word, and can
have tests if you have sufficiently clear conditions. 

	So, one test of information space is that it actually conveys
*something* to someone. If the Web was a bunch of machines randomly
sending bits around with no reference to anything, then it would *not*
be an information space. Because the Web, both in RDF  statements and 
plain old web-pages, conveys lots of information about things to people 
(and machines, depending on your take on them), it is an information 
space. However, that is so obvious it may not need to be said in TAG :) I
explain below for those truly interested.
	While the term "information" per se does not have a universal
definition, there are definitely "theories of information" that have 
tests. For example, the famous Shannon communication theory points
out that something that carries no information (fully entropic, as you put it)
does not allow you to discriminate among possibilities. Thus, if
8 employees of the W3C were to put their names in a hat to pick a new
editor for TAG (call this situation s, for sender), draw one, and have Norm
deliver it to TimBL at his office on a piece of paper (situation r, for
receiver), this would convey 3 bits of information about situation s to
situation r, since it reduces 8 possibilities to 1 (log_2(8)=3). However,
if Norm lost the name, and then wrote DanBri on it off the top of his 
head, this would contain 0 bits of information about situation s to 
situation r, since *nothing* about the state of affairs at situation s 
is conveyed. So, as regards communicating from s to r (not any other
points, such as Norm's imagination!), the note fails the test of having
any quantitative information about s.
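
For anyone who wants to check the arithmetic, here is a minimal Python
sketch. It assumes all 8 draws are equally likely and models each scenario
as a list of (name drawn at s, name on the note at r) pairs; the employee
names and the function name are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """I(S;R) in bits, treating each (s, r) pair in the list as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    count_s = Counter(s for s, _ in pairs)
    count_r = Counter(r for _, r in pairs)
    total = 0.0
    for (s, r), c in joint.items():
        p_sr = c / n
        total += p_sr * math.log2(p_sr / ((count_s[s] / n) * (count_r[r] / n)))
    return total

# Hypothetical stand-ins for the 8 names in the hat.
names = [f"employee{i}" for i in range(8)]

# Faithful delivery: the note at r always shows the name drawn at s.
faithful = [(name, name) for name in names]

# Forged note: Norm writes "DanBri" no matter what was drawn at s.
forged = [(name, "DanBri") for name in names]

faithful_bits = mutual_information_bits(faithful)
forged_bits = mutual_information_bits(forged)
```

Under these assumptions the faithful note carries log_2(8) = 3 bits about
s, and the forged note carries 0 bits, because what is written on it does
not depend on the draw.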
   However, you may notice this theory of information tells you *nothing*
about whether Norm, DanBri, DanC, or DaveRaggett were actually on the
note. Any one of their names successfully communicated from s to r has
the same 3 bits of information. If you want to know *which* one (the 
content of the note) you need not a theory of information communication, 
but a semantic (as in philosophy, not semantic web) theory of information, of
which there have been several versions, and here I follow the version of
Fred Dretske in his book "Knowledge and the Flow of Information". Read
it for a much better explanation and justification.  
	The note contains information if it passes this test
(given that you have something, call it F, that you want to convey):
A signal r carries the information that s is F if the conditional probability
of s being F, given r and k (k representing the internal knowledge of the
recipient at r about s), is 1 (but, given k alone, is less than 1).
	Sounds complicated and perhaps foolish, but it's rather simple in 
practice. The note Norm has (with DanBri's name on it, from s) tells
TimBL that the employee selected at s was DanBri with a conditional
probability of 1. It conveys that the note is *about* DanBri, not DanC
or anyone else. So, if Norm had written another name after losing the 
first note, even if that name was DanBri, it would convey *no* information
about s to r since while DanBri's name is written, that name on the newly 
forged note is not actually from s. This avoids a problem that Shannon has 
- you know that if I tell you I live on Forest Rd. in Edinburgh, Scotland  
I convey *more* information than if I just tell you I live in Edinburgh, 
Scotland. It's impossible to quantify how many more bits the first 
statement conveys than the second statement, but the first statement clearly conveys
more information. It conveys more information because it conveys both
propositions F' (living on Forest Rd) and F'' (living in Edinburgh, Scotland)
with a probability of 1 - allowing no other alternatives. 
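
The test above can be sketched in Python. This is only an illustration
under loud assumptions: possible states of affairs are modelled as a
finite, non-empty set of equally likely "worlds", the recipient's
knowledge k is the set of worlds it cannot rule out, and a signal narrows
that set. The street and city names other than Forest Rd and Edinburgh,
and all function and variable names, are hypothetical.

```python
def carries_information(worlds_given_k, worlds_given_signal_and_k, f):
    """Dretske-style test: P(F | signal, k) == 1 while P(F | k) < 1.

    Both world collections are assumed non-empty and uniformly weighted.
    """
    def prob(worlds):
        return sum(1 for w in worlds if f(w)) / len(worlds)
    return prob(worlds_given_signal_and_k) == 1.0 and prob(worlds_given_k) < 1.0

# The note example: k leaves all 8 names open.
names = ["DanBri", "DanC", "Norm", "DaveRaggett", "e5", "e6", "e7", "e8"]
is_danbri = lambda w: w == "DanBri"
genuine_note = carries_information(names, ["DanBri"], is_danbri)
# A forged note does not depend on the draw, so it rules nothing out.
forged_note = carries_information(names, names, is_danbri)

# The address example: worlds are (city, street) pairs.
k_worlds = [("Edinburgh", "Forest Rd"), ("Edinburgh", "Other St"),
            ("Glasgow", "Other St")]
after_street = [("Edinburgh", "Forest Rd")]
after_city = [("Edinburgh", "Forest Rd"), ("Edinburgh", "Other St")]
on_forest_rd = lambda w: w[1] == "Forest Rd"    # F'
in_edinburgh = lambda w: w[0] == "Edinburgh"    # F''

street_gives_f1 = carries_information(k_worlds, after_street, on_forest_rd)
street_gives_f2 = carries_information(k_worlds, after_street, in_edinburgh)
city_gives_f1 = carries_information(k_worlds, after_city, on_forest_rd)
city_gives_f2 = carries_information(k_worlds, after_city, in_edinburgh)
```

In this toy model the fuller address passes the test for both F' and F'',
while "Edinburgh" alone passes it only for F'', which is one way to read
"conveys more information" without counting bits.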
	However, you'll note this definition of information is
highly dependent on things such as there being a clear situation s and r,
as well as statements F and knowledge k (and some notion of information
being true and corpuscular). These things are all very tricky in the real 
world, and perhaps even more so in the Web world. But the relevance is 
that a web-page about the Eiffel Tower that gives me a few paragraphs
*about* the Eiffel tower has more information for me than one that just 
lists the Eiffel Tower in a list.  So, a resource as denoted 
by a URI string, via the oft-cited principle of URI opacity, 
should convey no information about the representations retrieved from the
resource, while something that has a representation usually does convey
information. In fact, if it doesn't (such as a string of random bits) then 
it probably isn't a representation in the common (not TAG!) usage of the 
word. I think, while the word "information" is up in the air a bit, it can 
be used sensibly. I'm not sure if "information resource" really is a 
better word than "Web resource", and I find "Web resource" more clear. If 
any information resource wasn't accessible via the Web, it couldn't convey 
information over the Web, and so probably shouldn't be part of the TAG.
The question then is: are there things on the Web that do not convey
information to anyone about anything? Hmmm...requires more thought, but 
probably not? 

	Any comments should probably be e-mailed to me directly unless
you really think they are relevant for the whole www-tag group. However,
I did think a few people here might be interested before we delete
the word information from all appearances in all W3C documents.

					-harry

On Fri, 24 Sep 2004, Bullard, Claude L (Len) wrote:

> 
> From: Chris Lilley [mailto:chris@w3.org]
> 
> BCLL> Try to conceive of a test for 'information space'.
> 
> >Well, one can, with the drawback that everything tested passes. So its
> >not usefully testable.
> 
> Correct.  And when a term does not discriminate, it conveys no 
> information (fully entropic).  The term is a placeholder, the 
> outermost brace, or the set of sets.  You could say it is 
> the *resource set*, a zero-based origin.  Note I did not 
> say it is the *web resource set*.   
> 
> So "information space" is colloquially useful, but not computable 
> as an ontological member (if I use it as a theory, it does not 
> identify a resource itself).
> 
> len
> 
>
Received on Saturday, 25 September 2004 20:32:38 UTC