- From: <noah_mendelsohn@us.ibm.com>
- Date: Tue, 18 Dec 2007 12:50:34 -0500
- To: Pat Hayes <phayes@ihmc.us>
- Cc: Mikael Nilsson <mikael@nilsson.name>, "Sean B. Palmer" <sean@miscoranda.com>, www-tag@w3.org
Pat Hayes writes:

> No, an HTML *page* is a series of characters.

Crucially, when transmitted as a representation using HTTP, it is not just a series of characters. It is a series of octets >and< a Content-type. That content type licenses the receiver to interpret the octets as much more than characters: as representing a tree of elements with some semantics.

So, if I ask, "Do HTML pages with the following bodies have the same number of paragraphs?":

Example 1: <html><body><p>para</p><p>para</p></body></html>
Example 2: <html><body><P>para</P><P>para</P></body></html>

The answer is "yes", even though the two documents do not consist of the same characters (note the uppercase "P"s in the second). The pertinent specifications, including the specification for the text/html media type, allow us to determine that the bodies of both documents consist of two "paragraphs", as defined in HTML. The same paragraph abstraction can be conveyed in at least two encodings: <P> and <p>.

Shannon's theory starts with the assumption that a receiver knows that a sender is in one of N possible states. Having received a correctly encoded message, and agreeing with the sender on the encoding, the receiver can determine which of those states the sender is in. That does not mean the "state" has been transmitted in any other sense.

In the case of RDF and the media type application/rdf+xml, we can assume that the state of the sender is some RDF graph. The sender transmits a series of characters, along with the Content-type indicator application/rdf+xml. With this, a receiver can determine which RDF graph the sender had. In that particular sense, the graph has been transmitted. The only difference when the source resource happens itself to be a document known to be a sequence of characters is that some of the possible representation encodings are very straightforward.
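To make the paragraph-counting point concrete, here is a small sketch using Python's standard html.parser module. It is not a full HTML tree builder, but it does report tag names in lowercase, so an interpreter that follows the text/html convention of case-insensitive tag names sees the same two paragraphs in both examples even though the character sequences differ. The names ParagraphCounter and count_paragraphs are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Hypothetical helper: counts <p> start tags in a document."""

    def __init__(self):
        super().__init__()
        self.paragraphs = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased `tag`, so <P> and <p> both match.
        if tag == "p":
            self.paragraphs += 1


def count_paragraphs(markup: str) -> int:
    """Feed the whole document to a fresh parser and return its <p> count."""
    parser = ParagraphCounter()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs


example1 = "<html><body><p>para</p><p>para</p></body></html>"
example2 = "<html><body><P>para</P><P>para</P></body></html>"

print(example1 == example2)        # False: the characters differ
print(count_paragraphs(example1))  # 2
print(count_paragraphs(example2))  # 2
```

The comparison of the raw strings fails while the paragraph counts agree: the abstraction the receiver recovers depends on the media type's rules, not on the exact characters sent.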
As far as I'm aware, though, even in those cases there is no requirement that the encoding be straightforward. If for some reason I invented a bizarre media type that sent all the even-numbered characters first, followed by all the odd-numbered ones, I don't think that would break Web architecture at all.

So, I think that RDF graphs, numbers, and HTML document trees are all information resources. The particular characters, such as whitespace, comprising the representation of a resource on the wire are only significant if the media type says they are.

Noah

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Tuesday, 18 December 2007 17:50:19 UTC