- From: John Black <JohnBlack@kashori.com>
- Date: Mon, 7 Jan 2008 20:43:40 -0500
- To: "Pat Hayes" <phayes@ihmc.us>, <noah_mendelsohn@us.ibm.com>
- Cc: "Mikael Nilsson" <mikael@nilsson.name>, "Sean B. Palmer" <sean@miscoranda.com>, <www-tag@w3.org>
On 2007-12-18, Noah Mendelsohn writes:
>
> Pat Hayes writes:
>
>> No, an HTML *page* is a series of characters.
>
> Crucially, when transmitted as a representation using HTTP, it is not just
> a series of characters. It is a series of octets >and< a Content-type.
> That content type licenses the receiver to interpret the octets as much
> more than characters, but as representing a tree of elements with some
> semantics. So, if I ask, "Do HTML pages with the following bodies have
> the same number of paragraphs?":
>
> Example 1:
> <html><body><p>para</p><p>para</p></body></html>
>
> Example 2:
> <html><body><P>para</P><P>para</P></body></html>
>
> The answer is "yes", even though the two documents are not comprised of
> the same characters (note the uppercase "P"s in the 2nd). The pertinent
> specifications, including the specification for the text/html media type,
> allow us to determine that the bodies of each of these documents both
> consist of two "paragraphs", as defined in HTML. The same paragraph
> abstraction can be conveyed in at least two encodings: <P> and <p>.
>
> Shannon's theory starts with the assumption that a receiver knows that a
> sender is in one of N possible states.

Isn't this assumption false with respect to the web? In general, I have no idea what states the resources I visit may be in. Any given resource may be a novel, a song, a picture, an email message, a movie, a blog post, etc. How could I possibly know the N possible states that 8 billion resources can be in? I think we must strike this assumption. If you agree, can you restate your argument without it? Does Shannon's theory apply when a receiver *does not* know what possible states the sender may be in?

John Black
www.kashori.com

> Having received a correctly
> encoded message, and agreeing with the sender on the encoding, the
> receiver can determine which of the states that sender is in. That does
> not mean the "state" has been transmitted in any other sense.
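The paragraph-counting comparison in the quoted message can be checked mechanically. The sketch below is only an illustration, assuming Python's standard html.parser module as a stand-in for an HTML processor (it is not a full HTML tree builder). It counts <p> start tags, which the parser reports with lowercased names, so both example bodies yield the same count even though their characters differ. ParagraphCounter and count_paragraphs are hypothetical names.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Counts <p> start tags; HTMLParser passes tag names in lowercase."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs = 0

    def handle_starttag(self, tag, attrs):
        # tag is already lowercased, so <P> and <p> both match here
        if tag == "p":
            self.paragraphs += 1


def count_paragraphs(markup: str) -> int:
    parser = ParagraphCounter()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs


example1 = "<html><body><p>para</p><p>para</p></body></html>"
example2 = "<html><body><P>para</P><P>para</P></body></html>"

print(count_paragraphs(example1), count_paragraphs(example2))  # 2 2
print(example1 == example2)  # False: the character sequences differ
```

Counting start tags is a simplification; a full HTML parser would build the element tree and count paragraph elements directly, but the case-insensitivity of element names is the same.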
>
> In the case of RDF and media type application/rdf+xml, we can assume that
> the state of the sender is some RDF graph. The sender transmits a series
> of characters, along with the Content-type indicator application/rdf+xml.
> With this, a receiver can determine which RDF graph the sender had. In
> that particular sense, the graph has been transmitted.
>
> The only difference, in the case where the source resource happens itself
> to be a document that is known to be a sequence of characters, is that some
> of the possible representation encodings are very straightforward. As far
> as I'm aware, though, there is even in those cases no requirement that the
> encoding be straightforward. If for some reason I invented a bizarre media
> type that sent all the even numbered characters first, and then followed
> with all the odd, I don't think that would break Web architecture at all.
>
> So, I think that RDF graphs, numbers, and HTML document trees are all
> information resources. The particular characters, such as whitespace,
> comprising the representation of a resource on the wire are only
> significant if the media type says they are.
>
> Noah
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
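The hypothetical even/odd media type in the quoted message can be sketched as a pair of functions, assuming 0-based character positions (the message does not say which positions count as "even"). What it illustrates is that such an encoding is invertible: a receiver that knows the rule recovers the original characters exactly. encode_even_odd and decode_even_odd are illustrative names, not any registered media type.

```python
def encode_even_odd(text: str) -> str:
    # Hypothetical encoding: characters at even (0-based) positions first,
    # followed by the characters at odd positions.
    return text[0::2] + text[1::2]


def decode_even_odd(encoded: str) -> str:
    # The even-position half always holds ceil(n / 2) characters.
    split = (len(encoded) + 1) // 2
    evens, odds = encoded[:split], encoded[split:]
    out = []
    for i in range(len(encoded)):
        # Position i came from evens[i // 2] if i is even, else odds[i // 2].
        out.append(evens[i // 2] if i % 2 == 0 else odds[i // 2])
    return "".join(out)


original = "Hello, Web!"
wire = encode_even_odd(original)
print(wire)                     # Hlo e!el,Wb
print(decode_even_odd(wire))    # Hello, Web!
```

The octets on the wire look nothing like the document, yet the document is fully recoverable once the receiver knows the media type's rule.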
Received on Tuesday, 8 January 2008 01:48:07 UTC