- From: Pat Hayes <phayes@ihmc.us>
- Date: Tue, 18 Dec 2007 11:05:03 -0800
- To: noah_mendelsohn@us.ibm.com
- Cc: Mikael Nilsson <mikael@nilsson.name>, "Sean B. Palmer" <sean@miscoranda.com>, www-tag@w3.org
>Pat Hayes writes:
>
>> No, an HTML *page* is a series of characters.
>
>Crucially, when transmitted as a representation using HTTP, it is not just
>a series of characters. It is a series of octets >and< a Content-type.

OK, sorry. I was aware when typing this that I was probably mis-speaking. But it is in the same category as a character sequence. It can be defined as an equivalence class of character sequences.

>That content type licenses the receiver to interpret the octets as much
>more than characters, but as representing a tree of elements with some
>semantics.

Right, but note, REPRESENTING the tree. Not actually BEING the tree.

> So, if I ask, "Do HTML pages with the following bodies have
>the same number of paragraphs?":
>
>Example 1:
><html><body><p>para</p><p>para</p></body></html>
>
>Example 2:
><html><body><P>para</P><P>para</P></body></html>
>
>The answer is "yes", even though the two documents are not comprised of
>the same characters (note the uppercase "P"s in the 2nd).

Fine. (Though are they the *same* html document?) But I am making a slightly different, more elementary, point. Both of your examples are in fact sequences of characters. (There they are, in this very email message, which is itself a sequence of characters on the page of your screen.) They are of the type of things that can be lexically transmitted and displayed. They are NOT mathematical structures or abstractions.

The primary example, which is worth cleaving to for its simplicity, is the distinction between a numeral and a number. One cannot display or transmit a number, only a numeral which... represents, denotes, encodes, choose your favorite word... the number. Numbers themselves are not physical or informational entities, and they don't have a grammar.

> The pertinent
>specifications, including the specification for the text/html media types,
>allow us to determine that the bodies of each of these documents both
>consist of two "paragraphs", as defined in HTML.
>The same paragraph
>abstraction can be conveyed in at least two encodings: <P> and <p>.
>
>Shannon's theory starts with the assumption that a receiver knows that a
>sender is in one of N possible states. Having received a correctly
>encoded message, and agreeing with the sender on the encoding, the
>receiver can determine which of the states that sender is in. That does
>not mean the "state" has been transmitted in any other sense.

OK. I know, a priori, that an actual RDF graph is not a possible state of a sender. Now apply Shannon's theory.

>In the case of RDF and media type application/rdf+xml, we can assume that
>the state of the sender is some RDF graph.

No, we cannot. A graph is a mathematical set. It is not the kind of thing that can send anything. Sets don't do things like that, in fact they don't do anything. The sender may be in some sense isomorphic to the graph, though this is unlikely, or it may contain in its state a datastructure or text which is a representation (not webarch:rep) of the graph. But it is not, itself, an RDF graph.

> The sender transmits a series
>of characters, along with the Content-type indicator application/rdf+xml.
>With this, a receiver can determine which RDF graph the sender had a representation of, a webarch:representation of which representation gets transmitted.
>. In
>that particular sense, the graph has been transmitted.

No, it hasn't. Graphs aren't the kind of thing that can possibly be transmitted.

>
>The only difference in the case where the source resource happens
>itself to be a document that is known to be a sequence of characters is that some
>of the possible representation encodings are very straightforward. As far
>as I'm aware, though, there is even in those cases no requirement that the
>encoding be straightforward. If for some reason I invented a bizarre media
>type that sent all the even numbered characters first, and then followed
>with all the odd, I don't think that would break Web architecture at all.

I agree.
That was not my point.

>So, I think that RDF graphs, numbers, and HTML document trees are all
>information resources.

Well, maybe they are. I really don't care, frankly. But if they are information resources, then the concept of 'information resource' has become useless.

Pat

> The particular characters, such as whitespace,
>comprising the representation of a resource on the wire are only
>significant if the media type says they are.
>
>Noah
>
>--------------------------------------
>Noah Mendelsohn
>IBM Corporation
>One Rogers Street
>Cambridge, MA 02142
>1-617-693-4036
>--------------------------------------

-- 
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32502                (850)291 0667   cell
phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
Received on Tuesday, 18 December 2007 19:05:21 UTC