Re: [httpRange-14] What is an Information Resource?

>Pat Hayes writes:
>>  No, an HTML *page* is a series of characters.
>Crucially, when transmitted as a representation using HTTP, it is not just
>a series of characters.  It is a series of octets >and< a Content-type.

OK, sorry. I was aware when typing this that I was probably 
mis-speaking. But it is in the same category as a character sequence. 
It can be defined as an equivalence class of character sequences.

>That content type licenses the receiver to interpret the octets as much
>more than characters, but as representing a tree of elements with some

Right, but note, REPRESENTING the tree. Not actually BEING the tree.

>  So, if I ask, "Do HTML pages with the following bodies have
>the same number of paragraphs?":
>Example 1:
>Example 2:
>The answer is "yes", even though the two documents are not comprised of
>the same characters (note the uppercase "P"s in the 2nd).

Fine. (Though are they the *same* html document?) But I am making a 
slightly different, more elementary, point. Both of your examples are 
in fact sequences of characters. (There they are, in this very email 
message, which is itself a sequence of characters on the page of your 
screen.) They are of the type of things that can be lexically 
transmitted and displayed. They are NOT mathematical structures or 
abstractions. The primary example, which is worth cleaving to for its 
simplicity, is the distinction between a numeral and a number. One 
cannot display or transmit a number, only a numeral which... 
represents, denotes, encodes, choose your favorite word... the 
number. Numbers themselves are not physical or informational 
entities, and they don't have a grammar.
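To make the numeral/number point concrete, here is a rough Python sketch. The two bodies are hypothetical stand-ins for the two examples (assumed to differ only in the case of the paragraph tags), and `ParagraphCounter` and `count_paragraphs` are names invented for this illustration. It shows only that two different character sequences can be read, under an agreed convention, as describing the same thing; what the program holds and compares is still character sequences.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Counts <p> start tags; HTMLParser lowercases tag names before the callback."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.count += 1


def count_paragraphs(body):
    """Return the number of paragraph start tags found in an HTML body string."""
    parser = ParagraphCounter()
    parser.feed(body)
    parser.close()
    return parser.count


# Hypothetical bodies, assumed for illustration only.
body_upper = "<P>first</P><P>second</P>"
body_lower = "<p>first</p><p>second</p>"

# Different character sequences...
different_text = body_upper != body_lower
# ...that the text/html conventions read as having the same paragraph count.
same_count = count_paragraphs(body_upper) == count_paragraphs(body_lower)

# The same holds for numerals: "2" and binary "10" are different strings
# that denote the same number under their respective conventions.
same_number = int("2") == int("10", 2)
```

Everything that appears in the listing is a string; the paragraph count and the number are what those strings are taken to denote.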

>  The pertinent
>specifications, including the specification for the text/html media type,
>allow us to determine that the bodies of each of these documents both
>consist of two "paragraphs", as defined in HTML.  The same paragraph
>abstraction can be conveyed in at least two encodings: <P> and <p>.
>Shannon's theory starts with the assumption that a receiver knows that a
>sender is in one of N possible states.   Having received a correctly
>encoded message, and agreeing with the sender on the encoding, the
>receiver can determine which of the states that sender is in.  That does
>not mean the "state" has been transmitted in any other sense.

OK. I know, a priori, that an actual RDF graph is not a possible 
state of a sender. Now apply Shannon's theory.
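A toy sketch of the Shannon setup being discussed, in Python. The state labels, the `STATES` codebook, and the `encode`/`decode` names are all assumptions made up for this illustration. The receiver already holds the list of N possible states; the message only selects one of them.

```python
import math

# Hypothetical codebook agreed in advance by sender and receiver.
STATES = ["idle", "busy", "offline", "error"]


def encode(state):
    """Sender side: emit a fixed-width binary numeral selecting the state."""
    width = max(1, math.ceil(math.log2(len(STATES))))
    return format(STATES.index(state), f"0{width}b")


def decode(message):
    """Receiver side: use the shared codebook to pick out the indicated state."""
    return STATES[int(message, 2)]


wire = encode("offline")   # what actually crosses the channel: two characters
recovered = decode(wire)   # a label the receiver already had in its codebook
```

Note that the only thing sent is the short string in `wire`; the receiver determines the state by lookup, and the lookup can only ever return something that was in the agreed list of possible states to begin with.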

>In the case of RDF and media type application/rdf+xml, we can assume that
>the state of the sender is some RDF graph.

No, we cannot. A graph is a mathematical set. It is not the kind of 
thing that can send anything. Sets don't do things like that, in fact 
they don't do anything. The sender may be in some sense isomorphic to 
the graph, though this is unlikely, or it may contain in its state a 
datastructure or text which is a representation (not webarch:rep) of 
the graph. But it is not, itself, an RDF graph.

>   The sender transmits a series
>of characters, along with the Content-type indicator application/rdf+xml.
>With this, a receiver can determine which RDF graph the sender had

a representation of, a webarch:representation of which representation 
gets transmitted.

>.  In
>that particular sense, the graph has been transmitted.

No, it hasn't. Graphs aren't the kind of thing that can possibly be 

>The only difference is that, in the case where the source resource happens
>itself to be a document that is known to be a sequence of characters, some
>of the possible representation encodings are very straightforward.  As far
>as I'm aware, though, there is even in those cases no requirement that the
>encoding be straightforward.  If for some reason I invented a bizarre media
>type that sent all the even numbered characters first, and then followed
>with all the odd, I don't think that would break Web architecture at all.

I agree. That was not my point.
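For what it is worth, the "bizarre media type" described above is easy to sketch. The following Python is only an illustration: the function names are invented, and it assumes "even numbered" means positions 0, 2, 4, ... counting from zero.

```python
def encode_even_odd(text):
    """Send the characters at even positions first, then those at odd positions."""
    return text[0::2] + text[1::2]


def decode_even_odd(wire):
    """Invert encode_even_odd by re-interleaving the two halves."""
    half = (len(wire) + 1) // 2          # the even-position half is the longer one
    evens, odds = wire[:half], wire[half:]
    out = []
    for i, ch in enumerate(evens):
        out.append(ch)
        if i < len(odds):
            out.append(odds[i])
    return "".join(out)


original = "abcdef"
wire = encode_even_odd(original)         # "acebdf"
round_trip = decode_even_odd(wire)       # "abcdef"
```

A receiver that knows the convention recovers the original character sequence exactly, and both ends of the exchange are still character sequences.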

>So, I think that RDF graphs, numbers, and HTML document trees are all
>information resources.

Well, maybe they are. I really don't care, frankly. But if they are 
information resources, then the concept of 'information resource' has 
become useless.


>  The particular characters, such as whitespace,
>comprising the representation of a resource on the wire are only
>significant if the media type says they are.
>Noah Mendelsohn
>IBM Corporation
>One Rogers Street
>Cambridge, MA 02142

IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell

Received on Tuesday, 18 December 2007 19:05:21 UTC