- From: Pat Hayes <phayes@ihmc.us>
- Date: Sun, 13 Apr 2008 15:54:14 -0500
- To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
- Cc: "wangxiao@musc.edu" <wangxiao@musc.edu>, Tim Berners-Lee <timbl@w3.org>, Michaeljohn Clement <mj@mjclement.com>, "www-tag@w3.org WG" <www-tag@w3.org>
- Message-Id: <p06230907c4281fdabe53@[192.168.1.2]>
At 12:52 PM +0000 4/13/08, Booth, David (HP Software - Boston) wrote:
> > From: Xiaoshu Wang
>>
>> Tim Berners-Lee wrote:
>> [ . . . ]
>> > The point is that when conneg is used to return two different
>> > representations of a document, with different content types, the use
>> > is ONLY to allow negotiation of different formats for the SAME
>> > information. [ . . . ]
>>
>> Tim, it sure can work if you tell us (or me) exactly the
>> meaning of the SAMEness.
>
>Okay, I'll take a crack at this. :)
>
>Imagine that you have some information, I, that you wish to provide
>through your Web server.

Either this is a category error, or I have no idea what you are
talking about. What do you mean by 'information' here? Subsequently
your message talks about coding functions which map information to
byte sequences, presumably implemented by processes which actually
take 'information' as input. In my understanding of what
'information' means, I have never seen any kind of architecture which
can simply take information as input. Any input to a computational
process has to be encoded in bits somehow, because bits are what
computations work on. So these encodings are all functions from byte
streams (documents, files, whatever) to other byte streams.

So, now, to return to Xiaoshu's question to Tim: how do we decide
when - what does it mean to say that - two byte streams contain the
SAME information?

> To simplify the discussion, let's assume that I can be
>characterized as a set of individual pieces of information, so that
>we can easily compare information content by comparing sets.
>Further suppose there are some pairs of well known encoding
>functions, E1...En and their corresponding decoding functions
>D1...Dn. These encoding/decoding functions are generic -- they are
>NOT specific to I -- and they correspond to the various combinations
>of well known languages and media types.
>If we call the type of I Information, then each Ei is a function
>from Information to a ByteSequence:
>
>   Ei: Information -> ByteSequence
>
>and each corresponding Di is a function the other way around:
>
>   Di: ByteSequence -> Information
>
>Content negotiation is conceptually the process of selecting the
>desired pair of encoding/decoding functions, identified by i. The
>server chooses i (based on the client's language and media type
>preferences) and sends ByteSequence Ei(I) to the client. The client
>then interprets the received ByteSequence according to the
>corresponding decoding function, Di, to obtain Information, R:
>
>   R = Di(Ei(I)).
>
>Set R can be further partitioned into two subsets: RI, which is a
>subset of I; and RA, which is any Information that is NOT a subset
>of I. So:
>
>   R = RI + RA
>
>In essence, RI is (a subset of) the information that the client
>wanted. The more lossy the encoding/decoding, the smaller RI is a
>subset of I. For an entirely lossless encoding/decoding, RI = I.
>But what is RA? RA is information that is an artifact (or
>by-product) of the encoding/decoding process itself. (For example,
>if the information is encoded in HTML, it might include the number
>of bytes in the HTML.)
>
>In no case did the encoding/decoding *add* any information *except*
>information that was a by-product of the generic encoding/decoding
>process itself. So for example, if I is a photographic image of a
>cat, Ei(I) might be a JPEG encoding and RI would be the lossy subset
>of I that is received after decoding. However, content negotiation is
>*only* for sending information that is in I. Content negotiation is
>*not* for sending arbitrary information (or metadata) that is not
>already in I, such as the fact that the photograph was taken by
>"David Booth" and the cat depicted in the photograph is named
>"Cheshire".

But if this metadata were part of I, then this would be OK, right?
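
For readers who want something concrete to poke at, the quoted model can be sketched as a toy in Python. Everything here is an assumption made for illustration: "Information" is stood in for by a frozenset of strings, and the names encode_json, decode_json, encode_short, decode_short and partition are hypothetical, not anything proposed in the thread. The sketch does not settle whether 'information' can really be an input to a computation; here it is, unavoidably, already an encoded data structure.

```python
import json

# Toy stand-in for "Information": a set of individual facts (strings).
# This is an illustrative assumption, not a definition from the thread.

def encode_json(info):
    """E1 (hypothetical): lossless encoding of every fact as a JSON list."""
    return json.dumps(sorted(info)).encode("utf-8")

def decode_json(data):
    """D1 (hypothetical): recover the facts and add one artifact of the
    encoding itself, namely the byte length of the encoded form."""
    facts = set(json.loads(data.decode("utf-8")))
    facts.add(f"encoded-length={len(data)}")
    return frozenset(facts)

def encode_short(info):
    """E2 (hypothetical): lossy encoding that keeps only facts of at most
    12 characters, one per line."""
    kept = sorted(f for f in info if len(f) <= 12)
    return "\n".join(kept).encode("utf-8")

def decode_short(data):
    """D2 (hypothetical): split the lines back into facts."""
    text = data.decode("utf-8")
    return frozenset(text.split("\n")) if text else frozenset()

def partition(original, received):
    """Split R into RI (facts already in I) and RA (everything else)."""
    return received & original, received - original

I = frozenset({"is a cat", "has four legs", "is sitting on a mat"})

# Lossless pair: RI equals I, RA holds only the encoding artifact.
RI_1, RA_1 = partition(I, decode_json(encode_json(I)))

# Lossy pair: RI is a proper subset of I, RA is empty.
RI_2, RA_2 = partition(I, decode_short(encode_short(I)))
```

Note that partition takes I itself as an argument. In this toy only the sender holds I, so a client that has only R cannot compute the split into RI and RA.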
So, here one is on the receiving end, and one gets handed a lossy JPEG
image and some RDF metadata about it. What should one do? Protest to
the Gods of the Internet about this misuse of conneg, or simply
assume that I - to which one has no other access - did after all
contain the metadata? The fact that one has not seen it before can be
chalked up to the other kinds of lossiness inherent in the other
encodings one had previously chosen.

The point I'm getting at is that this strict doctrine which we keep
hearing about, regarding what conneg MUST be used for and MUST NOT be
used for, seems to be nothing but doctrine. There is no way to
determine that it is being obeyed. If we all just go on assuming that
it is, then some resources will seem to have more information in them
than they previously seemed to have. As long as this extra
information is useful, why should anyone complain about this outcome?
To hell with doctrine when it ceases to be useful and becomes simply
an encumbrance.

Pat

>That information is not in I and the encoding/decoding functions are
>generic -- they are *not* specific to I. (They cannot add
>information that they do not have.)
>
>So, the exact meaning of SAMEness is that the information sent
>consists *only* of information that is either a subset of I, or an
>artifact of the generic encoding/decoding process itself.
>
>
>
>
>David Booth, Ph.D.
>HP Software
>+1 617 629 8881 office | dbooth@hp.com
>http://www.hp.com/go/software
>
>Opinions expressed herein are those of the author and do not
>represent the official views of HP unless explicitly stated
>otherwise.

-- 
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32502                (850)291 0667   cell
http://www.ihmc.us/users/phayes      phayesAT-SIGNihmc.us
http://www.flickr.com/pathayes/collections
Received on Sunday, 13 April 2008 20:55:02 UTC