Re: [httpRange-14] What is an Information Resource? from noah_mendelsohn@us.ibm.com on 2007-12-18 (www-tag@w3.org from December 2007)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 18 Dec 2007 12:50:34 -0500
To: Pat Hayes <phayes@ihmc.us>
Cc: Mikael Nilsson <mikael@nilsson.name>, "Sean B. Palmer" <sean@miscoranda.com>, www-tag@w3.org
Message-ID: <OF1F5783BD.69209AAE-ON852573B5.0060BF36-852573B5.0061E1E3@lotus.com>

Pat Hayes writes:

> No, an HTML *page* is a series of characters. 

Crucially, when transmitted as a representation using HTTP, it is not just 
a series of characters.  It is a series of octets >and< a Content-type. 
That content type licenses the receiver to interpret the octets not merely
as characters, but as representing a tree of elements with some
semantics.  So, if I ask, "Do HTML pages with the following bodies have 
the same number of paragraphs?":

Example 1:
<html><body><p>para</p><p>para</p></body></html>

Example 2:
<html><body><P>para</P><P>para</P></body></html>

The answer is "yes", even though the two documents are not composed of
the same characters (note the uppercase "P"s in the 2nd).  The pertinent
specifications, including the specification for the text/html media type,
allow us to determine that the bodies of both documents consist of two
"paragraphs", as defined in HTML.  The same paragraph abstraction can be
conveyed in at least two encodings: <P> and <p>.
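
As a rough illustration of the point about Examples 1 and 2, here is a
minimal Python sketch that counts paragraphs by parsing each body as HTML
rather than comparing characters.  The names ParagraphCounter and
count_paragraphs are hypothetical, and the standard library's html.parser
(which reports tag names in lower case) is only a stand-in for a receiver
that honors text/html; it is not a full HTML implementation.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Counts <p> start tags; html.parser lower-cases tag names for us."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # "P" and "p" both arrive here as "p".
        if tag == "p":
            self.count += 1


def count_paragraphs(body: str) -> int:
    """Return the number of paragraph elements found in an HTML body."""
    parser = ParagraphCounter()
    parser.feed(body)
    parser.close()
    return parser.count


example_1 = "<html><body><p>para</p><p>para</p></body></html>"
example_2 = "<html><body><P>para</P><P>para</P></body></html>"

# Different character sequences, same number of paragraphs.
print(example_1 == example_2)
print(count_paragraphs(example_1), count_paragraphs(example_2))
```

The string comparison sees two different documents; the media-type-aware
reading sees two paragraphs in each.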

Shannon's theory starts with the assumption that a receiver knows that a 
sender is in one of N possible states.   Having received a correctly 
encoded message, and agreeing with the sender on the encoding, the 
receiver can determine which of the states the sender is in.  That does 
not mean the "state" has been transmitted in any other sense.
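
A toy sketch of that setup in Python, under stated assumptions: the four
states and the fixed-width binary code below are invented for
illustration, and send/receive are hypothetical names.  Both sides share
the list of possible states and the encoding; only an index crosses the
wire.

```python
# The N possible sender states, known in advance to both parties
# (hypothetical values chosen for the example).
STATES = ["red", "amber", "green", "flashing"]

# Enough bits to distinguish N states: ceil(log2(N)), at least 1.
BITS = max(1, (len(STATES) - 1).bit_length())


def send(state: str) -> str:
    """Encode the sender's current state as a fixed-width bit string."""
    return format(STATES.index(state), f"0{BITS}b")


def receive(message: str) -> str:
    """Recover which state the sender is in from the agreed encoding."""
    return STATES[int(message, 2)]


wire = send("green")
print(wire, receive(wire))
```

Nothing resembling "green" travels in the message; the receiver learns the
state only because it already agrees with the sender on the encoding.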

In the case of RDF and media type application/rdf+xml, we can assume that 
the state of the sender is some RDF graph.  The sender transmits a series 
of characters, along with the Content-type indicator application/rdf+xml. 
With this, a receiver can determine which RDF graph the sender had.  In 
that particular sense, the graph has been transmitted.
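
To make that concrete, here is a small Python sketch in which two
different character sequences let a receiver recover the same set of
triples.  It is emphatically not a conforming RDF/XML parser: the
hypothetical triples() function handles only top-level rdf:Description
elements with rdf:about and simple property elements, does not distinguish
literals from URI references, and the example.org data is made up.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def triples(rdf_xml: str) -> frozenset:
    """Extract (subject, predicate, object) triples from a tiny RDF/XML subset."""
    root = ET.fromstring(rdf_xml)
    found = set()
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes qualified names as "{namespace}local";
            # the predicate URI is the namespace followed by the local name.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = prop.text or ""  # simplification: literal kept as a plain string
            found.add((subject, predicate, obj))
    return frozenset(found)


doc_a = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Example</dc:title>
    <dc:creator>Example Author</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

# Same graph: different prefixes, no whitespace, properties in another order.
doc_b = (
    '<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:d="http://purl.org/dc/elements/1.1/">'
    '<r:Description r:about="http://example.org/doc">'
    "<d:creator>Example Author</d:creator><d:title>Example</d:title>"
    "</r:Description></r:RDF>"
)

print(doc_a == doc_b)
print(triples(doc_a) == triples(doc_b))
```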

The only difference in the case where the source resource happens itself
to be a document that is known to be a sequence of characters is that some
of the possible representation encodings are very straightforward.  As far
as I'm aware, though, there is even in those cases no requirement that the
encoding be straightforward.  If for some reason I invented a bizarre media
type that sent all the even numbered characters first, and then followed
with all the odd, I don't think that would break Web architecture at all.
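
A sketch of such a bizarre encoding in Python, assuming characters are
numbered from zero and that encode/decode are hypothetical names for the
two halves of the invented media type.  A receiver that knows the rule
recovers the original document exactly, even though the characters on the
wire are in a different order.

```python
def encode(text: str) -> str:
    """Send the even-numbered characters first, then the odd-numbered ones."""
    return text[0::2] + text[1::2]


def decode(wire: str) -> str:
    """Undo encode(): interleave the two halves back into the original order."""
    half = (len(wire) + 1) // 2  # the even-numbered half is the longer one
    evens, odds = wire[:half], wire[half:]
    out = []
    for i, ch in enumerate(evens):
        out.append(ch)
        if i < len(odds):
            out.append(odds[i])
    return "".join(out)


page = "<p>para</p>"
wire = encode(page)
print(wire != page, decode(wire) == page)
```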

So, I think that RDF graphs, numbers, and HTML document trees are all 
information resources.  The particular characters, such as whitespace, 
comprising the representation of a resource on the wire are only 
significant if the media type says they are.

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Tuesday, 18 December 2007 17:50:19 UTC