Re: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Pat Hayes on 2006-04-24 (public-swbp-wg@w3.org from April 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 24 Apr 2006 14:14:34 -0500
To: Frank Manola <fmanola@acm.org>
Message-Id: <p06230901c06f39f8e77a@[192.168.2.3]>
>>>
>>>>  From:  Pat Hayes
>>>>
>>>>  It might be best to start with a definition of what you consider an
>>>>  information resource to be. Since the TAG do not define this critical
>>>>  term, yet base important engineering decisions on it, any
>>>>  authoritative exposition would be of immense value. My current
>>>>  understanding is that an information resource is some thing that can
>>>>  be transmitted over a network by a transfer protocol. On this
>>>>  understanding, one could argue that a word was an information
>>>>  resource.
>>>
>>>Definitely not.  That would be a "representation", not an "information
>>>resource".  The information resource is the *source* of
>>>"representations" that can be transmitted over a network.
>
>Sorry to butt in, but a couple of minor comments:

Not minor at all, Frank. This might get to one of the hearts of the issue.

>"Definitely not" may be technically correct, but 
>I think a bit more context is needed here.  The 
>TAG Architecture document says:
>
>"It is conventional on the hypertext Web to 
>describe Web pages, images, product catalogs, 
>etc. as "resources". The distinguishing 
>characteristic of these resources is that all of 
>their essential characteristics can be conveyed 
>in a message. We identify this set as 
>"information resources."
>
>This document is an example of an information 
>resource. It consists of words and punctuation 
>symbols and graphics and other artifacts that 
>can be encoded, with varying degrees of 
>fidelity, into a sequence of bits. There is 
>nothing about the essential information content 
>of this document that cannot in principle be 
>transferred in a message. In the case of this 
>document, the message payload is the 
>representation of this document."

OK, reading the above carefully, in the light of 
David's comment, I seem to discern an implicit 
distinction between several things. Let me be 
excruciatingly pedantic here for a second, and 
make very, very careful distinctions between 
several things involved in a hypothetical HTTP 
GET, which to keep things as simple as possible I 
will assume is the successful getting of an XHTML 
web page from a server, with a 2xx code, no 
problems. There seem to be several entities 
involved in this.

1.  An "HTTP endpoint", which is a computational 
process running on hardware, which processes the 
GET request and emits http codes and bit-strings.

2. The sequence of bits or bytes whose 
transmission from (1) constitutes the successful 
completion of the GET request.

3. The Web page itself: a document, consisting of 
characters, which conform to XHTML syntactic 
rules.

4. The encoding of the Web page (3) which is used 
by the process (1) to produce the bit sequence (2).

5. The encoding of the Web page (3) which is 
produced from the bit sequence (2) in the browser 
which issued the GET request and used by it to 
render a visual form of the Web page (3) on the 
user's screen.

and we could of course go further, distinguishing 
the image on the screen from its binary 
representation, the state of the process from the 
process itself, and so on (and on.)
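
A minimal sketch of the five entities listed above, in Python, 
may help keep them apart. Everything in it is an assumption made 
for illustration: the Endpoint class, the browser_receive 
function and the sample page are hypothetical names, not taken 
from the TAG document or from any HTTP library, and no real 
network traffic is involved.

```python
from dataclasses import dataclass
from typing import Tuple

# (3) The Web page itself: a sequence of characters obeying XHTML syntax.
XHTML_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Program</title></head>"
    "<body><p>Concert program</p></body></html>"
)


@dataclass
class Endpoint:
    """(1) Hypothetical stand-in for the process that answers GET requests."""

    document: str          # the page (3) that this endpoint serves
    charset: str = "utf-8"

    def server_encoding(self) -> bytes:
        # (4) The encoding of the page that the endpoint uses to build its reply.
        return self.document.encode(self.charset)

    def get(self) -> Tuple[int, bytes]:
        # (2) A status code plus the byte sequence that is actually transmitted.
        return 200, self.server_encoding()


def browser_receive(wire_bytes: bytes, charset: str = "utf-8") -> str:
    # (5) The client-side encoding rebuilt from the transmitted bytes (2),
    # which a browser would go on to render.
    return wire_bytes.decode(charset)


endpoint = Endpoint(XHTML_PAGE)
status, wire = endpoint.get()
client_copy = browser_receive(wire)
```

In this sketch the endpoint object, the bytes on the wire and the 
character sequence are three different kinds of Python value, even 
though client_copy ends up equal to XHTML_PAGE. Which of them 
deserves the name "information resource" is exactly what the code 
leaves open.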

Now, I tend to blur some of these distinctions, 
myself. For example, I tend to think of 2 through 
5 as simply being 'the Web page'; or if I am 
being more careful, to identify 2, 4 and 5 as 
'renderings' or 'encodings' or 'tokens' of the 
single, abstract, Web page (3). And I often don't 
bother to distinguish between 1 and 4. This gives 
a simplified picture, which is adequate for many 
purposes, in which we happily ignore the 
type/token distinction (as we normally do in 
English) and where issuing a GET is a bit like 
asking an usher for A concert program, at which 
she then hands you a copy from her pile of 
identical copies, and you take it away and read 
it without bothering her further, and if anyone 
asks you what you are reading you say, THE 
program. (You could say that each copy is a 
'representation' of the great concert program in 
the sky, or of all the other copies, or of the 
state of the printing platen at the moment the 
ink hit the paper, but there's not usually much 
point in being that picky about these 
distinctions.)

It seems (?) that David is concerned to maintain 
a clear distinction between 1 and 2, and wants to 
be clear that the information resource is the 
former. I am however not sure what the status of 
3 is, on this account. It hardly seems reasonable 
to say that 2 is a representation of 1 in the way 
that it might be to say that it is a 
'representation' (token) of 3.  Now, I guess 
there is a coherent position which considers 4 to 
be a part or an aspect of (a state of) 1, so 
views 2 as a representation of (a state of) 1, 
and considers 1, now considered to be embodying or 
including 4, to be the actual information 
resource; that seems to be closest to what the 
REST model says, and it is what David seemed to 
be saying. But it does not seem to be what the 
TAG says when it declares that an information 
resource is a document, i.e. 3. If anything, this 
would have all of 2, 4 and 5 being 
'representations' of 3. Whatever 1 is, it 
certainly is not a document in the sense of (3). 
And, as you point out, other W3C sources speak of 
information resources as being transmitted over a 
network, which makes sense only for 2, speaking 
strictly. So, as so often in trying to understand 
the TAG, I am left in a state of muddled 
confusion as to what everyone is talking about.

Just to clarify another source of muddle, I would 
not call any of these things "representations" of 
any of the others. In my usage of the word 
"representation", there is no representation of 
anything involved in the entire architectural 
story of how an http GET is processed. Nothing 
represents anything here, because there are no 
semantic relationships involved. The various 
bitstrings are simply copies of one another, and 
the relationship of a document to its bitstring 
encoding is that of a rendering or encoding, 
rather than a representation: a token/type 
relationship. (The bitstring does not *describe* 
the document it encodes. If it did, it would have 
to describe it using a syntax, but bitstrings, 
pretty much by their very nature, do not have any 
syntax.)
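
The token/type relationship described here can be illustrated with 
a small Python sketch. The sample text and the choice of UTF-8 and 
UTF-16 are assumptions made purely for illustration.

```python
# One abstract document (the type): a sequence of characters.
document = "Concert program: Symphony No. 5"

# Two different bitstring encodings (tokens) of that same document.
utf8_token = document.encode("utf-8")
utf16_token = document.encode("utf-16")

# The tokens differ as byte sequences...
tokens_differ = utf8_token != utf16_token

# ...yet each one decodes back to exactly the same character sequence.
same_document = (
    utf8_token.decode("utf-8") == document
    and utf16_token.decode("utf-16") == document
)
```

Neither byte string says anything *about* the document. Each is 
simply a way of carrying it, and recovering the document needs a 
decoding convention agreed outside the bytes themselves.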

This usage of "representation", which I have 
slowly come to understand is common in the TAG 
documents, is entirely alien to the way that word 
is used in logic, linguistics and semantics (and 
AI/KR), and throughout the RDF and OWL 
specification documents. This is not the sense of 
"representation" in which, for example, an RDF 
ontology of weather might be said to represent 
the weather conditions in Oaxacala. On this TAG 
sense of "representation", one would presumably 
say that any written token of a word was a 
'representation' of the abstract word itself. 
This usage might be glossed as 
'represents-as-token', or maybe 
'represents-as-brass-rubbing' rather than 
'represents-by-description'. I also note in 
passing that with this notion of representation, 
it is (literally) impossible for any bit-string 
to 'represent' anything other than a document. In 
particular, it is impossible to 'represent' the 
weather over Oaxacala in this sense. Of course 
one can 'represent' a weather *report*; and that 
report might represent, in an entirely different 
sense, the real weather; but 
being-a-representation-of is not transitive.

>So, referring to the next sentence, it would 
>seem that an RDF ontology and an HTML web page 
>*are* information resources.  What gets 
>transmitted over the wire, however, would be 
>representations of those information resources. 
>Right?

An RDF ontology, at any rate, is either an RDF 
graph or an RDF/XML XML document. Either way, it 
is not an HTTP endpoint or an abstraction of an 
HTTP endpoint. So it cannot be an information 
resource in David's sense, seems to me.

Pat


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 24 April 2006 19:15:01 UTC