- From: Harry Halpin <hhalpin@ibiblio.org>
- Date: Sat, 25 Sep 2004 16:32:37 -0400 (EDT)
- To: www-tag@w3.org
Len et al.,

First, let me preface this by saying that the change from "information resource" to "Web resource" (i.e. Web accessible) is grand and I support it. But for those interested in what we mean when we say "information", keep reading, although be forewarned: it's a bit long.

"Information" is not a meaningless word, and it can be tested if the conditions are made sufficiently clear. One test of an information space is that it actually conveys *something* to someone. If the Web were a bunch of machines randomly sending bits around with no reference to anything, then it would *not* be an information space. Because the Web, both in RDF statements and in plain old web pages, conveys lots of information about things to people (and to machines, depending on your take on them), it is an information space. However, that is so obvious it may not need to be said in the TAG :) I explain below for those truly interested.

While the term "information" per se does not have a universal definition, there are definitely "theories of information" that have tests. For example, Shannon's famous theory of communication points out that a signal that does not allow you to discriminate among possibilities carries no information ("fully entropic", as you put it). Thus, suppose 8 W3C employees put their names in a hat to pick a new editor for the TAG (call this situation s, for sender), one name is drawn, and Norm delivers it on a piece of paper to TimBL at his office (situation r, for receiver). This conveys 3 bits of information about situation s to situation r, since it reduces 8 possibilities to 1 (log_2(8) = 3). However, if Norm lost the note and then wrote DanBri on a new one off the top of his head, it would carry 0 bits of information about situation s to situation r, since *nothing* about the state of affairs at situation s is conveyed.
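The arithmetic can be made concrete with a minimal Python sketch that computes the Shannon mutual information, in bits, between the name drawn at s and the name read at r. It assumes all 8 names are equally likely; the last four names and the helper `mutual_information` are hypothetical, introduced only for this illustration.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Shannon mutual information I(S;R), in bits.

    `joint` maps (s, r) pairs to probabilities summing to 1.
    """
    p_s = defaultdict(float)
    p_r = defaultdict(float)
    for (s, r), p in joint.items():
        p_s[s] += p  # marginal probability of each draw at s
        p_r[r] += p  # marginal probability of each note read at r
    return sum(
        p * math.log2(p / (p_s[s] * p_r[r]))
        for (s, r), p in joint.items()
        if p > 0
    )


# 8 equally likely names in the hat (the last four are hypothetical placeholders).
names = ["Norm", "DanBri", "DanC", "DaveRaggett", "e5", "e6", "e7", "e8"]

# Honest delivery: the note read at r is exactly the name drawn at s.
honest = {(n, n): 1 / 8 for n in names}
# Forged note: whatever was drawn at s, Norm writes "DanBri".
forged = {(n, "DanBri"): 1 / 8 for n in names}

print(math.log2(len(names)))       # 3.0
print(mutual_information(honest))  # 3.0
print(mutual_information(forged))  # 0.0
```

The honest note yields 3 bits whichever name happens to be drawn, which is why this measure alone cannot say *which* name the note carries.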
So, as regards communicating from s to r (not any other points, such as Norm's imagination!), the forged note fails the test of carrying any quantitative information about s. However, you may notice that this theory of information tells you *nothing* about whether Norm, DanBri, DanC, or DaveRaggett was actually named on the note: any one of their names, successfully communicated from s to r, carries the same 3 bits of information. If you want to know *which* one (the content of the note), you need not a theory of information communication but a semantic (as in philosophy, not the Semantic Web) theory of information. There have been several versions of such theories; here I follow Fred Dretske's, from his book "Knowledge and the Flow of Information". Read it for a much better explanation and justification.

The note contains information if it passes this test (where F is the property of s you want to convey): a signal r carries the information that s is F if the conditional probability of s being F, given r and k (k representing the recipient's internal knowledge about s), is 1 (but, given k alone, is less than 1). This sounds complicated and perhaps foolish, but it's rather simple in practice. The note Norm carries (with DanBri's name on it, drawn at s) makes the conditional probability, for TimBL, that the employee selected at s was DanBri equal to 1. It conveys that the note is *about* DanBri, not DanC or anyone else. So, if Norm had written a name himself after losing the first note, even if that name was DanBri, it would convey *no* information about s to r: while DanBri's name is written on it, the name on the newly forged note does not actually come from s. This avoids a problem with Shannon's theory: you know that if I tell you I live on Forest Rd. in Edinburgh, Scotland, I convey *more* information than if I just tell you I live in Edinburgh, Scotland.
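Dretske's test can likewise be sketched over a finite set of weighted possibilities. This is a minimal illustration under simplifying assumptions: k is modelled as a predicate that rules possibilities in or out, the helpers `prob` and `carries_information` are hypothetical, and the extra names and the addresses other than Forest Rd. are placeholders. Exact `Fraction` weights keep the "equals 1" comparison exact.

```python
from fractions import Fraction


def prob(worlds, event, given):
    """P(event | given) over a finite list of (weight, world) pairs."""
    total = sum(wt for wt, world in worlds if given(world))
    hits = sum(wt for wt, world in worlds if given(world) and event(world))
    return Fraction(hits) / total


def carries_information(worlds, signal, prop, k=lambda world: True):
    """Dretske-style test: the signal carries the information that s is F
    iff P(F | r, k) == 1 and P(F | k) < 1."""
    with_signal = prob(worlds, prop, lambda world: signal(world) and k(world))
    k_alone = prob(worlds, prop, k)
    return with_signal == 1 and k_alone < 1


# The hat: 8 equally likely names (the last four are hypothetical placeholders).
names = ["Norm", "DanBri", "DanC", "DaveRaggett", "e5", "e6", "e7", "e8"]
eighth = Fraction(1, 8)
honest = [(eighth, {"drawn": n, "note": n}) for n in names]         # note copies the draw at s
forged = [(eighth, {"drawn": n, "note": "DanBri"}) for n in names]  # Norm writes DanBri regardless

reads_danbri = lambda world: world["note"] == "DanBri"
drew_danbri = lambda world: world["drawn"] == "DanBri"

print(carries_information(honest, reads_danbri, drew_danbri))  # True
print(carries_information(forged, reads_danbri, drew_danbri))  # False

# Addresses (hypothetical alternatives), equally likely, with an honest sender,
# so each message is only sent from a world where it is true.
third = Fraction(1, 3)
homes = [(third, {"street": st, "city": c}) for st, c in
         [("Forest Rd", "Edinburgh"), ("High St", "Edinburgh"), ("High St", "Glasgow")]]
says_forest_rd = lambda h: h["street"] == "Forest Rd" and h["city"] == "Edinburgh"
says_edinburgh = lambda h: h["city"] == "Edinburgh"
in_edinburgh = lambda h: h["city"] == "Edinburgh"
on_forest_rd = lambda h: h["street"] == "Forest Rd"

print(carries_information(homes, says_forest_rd, in_edinburgh))  # True
print(carries_information(homes, says_forest_rd, on_forest_rd))  # True
print(carries_information(homes, says_edinburgh, on_forest_rd))  # False
```

The Forest Rd. message passes the test for both propositions while the Edinburgh-only message passes it for just one, even though the test itself never counts bits.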
It's impossible to quantify how many more bits the first statement conveys than the second, but the first clearly conveys more information. It conveys more information because it conveys both proposition F' (living on Forest Rd.) and proposition F'' (living in Edinburgh, Scotland) with a probability of 1, allowing no alternatives.

However, you'll note that this definition of information depends heavily on things such as there being clear situations s and r, as well as a property F and background knowledge k (and some notion of information being true and corpuscular). These things are all very tricky in the real world, and perhaps even more so in the Web world. But the relevance is that a web page that gives me a few paragraphs *about* the Eiffel Tower has more information for me than one that just mentions the Eiffel Tower in a list. So, via the oft-cited principle of URI opacity, the URI string that denotes a resource should convey no information about the representations retrieved from that resource, while something that has a representation usually does convey information. In fact, if it doesn't (such as a string of random bits), then it probably isn't a representation in the common (not TAG!) usage of the word.

I think that, while the word "information" is up in the air a bit, it can be used sensibly. I'm not sure "information resource" really is a better term than "Web resource", and I find "Web resource" clearer. If an information resource weren't accessible via the Web, it couldn't convey information over the Web, and so it probably shouldn't be the TAG's concern. The question, then, is: are there things on the Web that do not convey information to anyone about anything? Hmmm... that requires more thought, but probably not?

Any comments should probably be e-mailed to me directly unless you really think they are relevant to the whole www-tag group.
However, I did think a few people here might be interested before we delete the word "information" from all appearances in all W3C documents.

-harry

On Fri, 24 Sep 2004, Bullard, Claude L (Len) wrote:

> From: Chris Lilley [mailto:chris@w3.org]
>
> BCLL> Try to conceive of a test for 'information space'.
>
> >Well, one can, with the drawback that everything tested passes. So its
> >not usefully testable.
>
> Correct. And when a term does not discriminate, it conveys no
> information (fully entropic). The term is a placeholder, the
> outermost brace, or the set of sets. You could say it is
> the *resource set*, a zero-based origin. Note I did not
> say it is the *web resource set*.
>
> So "information space" is colloquially useful, but not computable
> as an ontological member (if I use it as a theory, it does not
> identify a resource itself).
>
> len
Received on Saturday, 25 September 2004 20:32:38 UTC