RE: Uniform access to descriptions from Pat Hayes on 2008-04-13 (www-tag@w3.org from April 2008)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 13 Apr 2008 15:54:14 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: "wangxiao@musc.edu" <wangxiao@musc.edu>, Tim Berners-Lee <timbl@w3.org>, Michaeljohn Clement <mj@mjclement.com>, "www-tag@w3.org WG" <www-tag@w3.org>
Message-Id: <p06230907c4281fdabe53@[192.168.1.2]>
At 12:52 PM +0000 4/13/08, Booth, David (HP Software - Boston) wrote:
>  > From: Xiaoshu Wang
>>
>>  Tim Berners-Lee wrote:
>>  [ . . . ]
>>  > The point is that when conneg is used to return two different
>>  > representations of a document, with different content types, the use
>>  > is ONLY to allow negotiation of different formats for the SAME
>>  > information. [ . . . ]
>>
>>  Tim, it sure can work if you tell us (or me) exactly the
>>  meaning of the SAMEness.
>
>Okay, I'll take a crack at this.  :)
>
>Imagine that you have some information, I, that you wish to provide 
>through your Web server.

Either this is a category error, or I have no idea what you are 
talking about. What do you mean by 'information' here? Subsequently 
your message talks about coding functions which map information to 
byte sequences, presumably implemented by processes which actually 
take 'information' as input. In my understanding of what 
'information' means, I have never seen any kind of architecture which 
can simply take information as input. Any input to a computational 
process has to be encoded in bits somehow, because bits are what 
computations work on. So these encodings are all functions from byte 
streams (documents, files, whatever) to other byte streams. So, now, 
to return to Xiaoshu's question to Tim: how do we decide when - what 
does it mean to say that - two byte streams contain the SAME 
information?

>  To simplify the discussion, let's assume that I can be 
>characterized as a set of individual pieces of information, so that 
>we can easily compare information content by comparing sets. 
>Further suppose there are some pairs of well known encoding 
>functions, E1...En and their corresponding decoding functions 
>D1...Dn.  These encoding/decoding functions are generic -- they are 
>NOT specific to I -- and they correspond to the various combinations 
>of well known languages and media types.  If we call the type of I 
>Information, then each Ei is a function from Information to a 
>ByteSequence:
>
>   Ei: Information -> ByteSequence
>
>and each corresponding Di is a function the other way around:
>
>   Di: ByteSequence -> Information
>
>Content negotiation is conceptually the process of selecting the 
>desired pair of encoding/decoding functions, identified by i.  The 
>server chooses i (based on the client's language and media type 
>preferences) and sends ByteSequence Ei(I) to the client.  The client 
>then interprets the received ByteSequence according to the 
>corresponding decoding function, Di, to obtain Information, R:
>
>   R = Di(Ei(I)).
>
>Set R can be further partitioned into two subsets: RI, which is a 
>subset of I; and RA, which is any Information in R that is NOT in 
>I.  So:
>
>   R = RI + RA
>
>In essence, RI is (a subset of) the information that the client 
>wanted.  The lossier the encoding/decoding, the smaller the subset 
>RI of I.  For an entirely lossless encoding/decoding, RI = I. 
>But what is RA?  RA is information that is an artifact (or 
>by-product) of the encoding/decoding process itself.  (For example, 
>if the information is encoded in HTML, it might include the number 
>of bytes in the HTML.)
>
>In no case did the encoding/decoding *add* any information *except* 
>information that was a by-product of the generic encoding/decoding 
>process itself.  So for example, if I is a photographic image of a 
>cat, Ei(I) might be a JPEG encoding and RI would be the lossy subset 
>of I that is received after decoding.  However, content negotiation is 
>*only* for sending information that is in I.  Content negotiation is 
>*not* for sending arbitrary information (or metadata) that is not 
>already in I, such as the fact that the photograph was taken by 
>"David Booth" and the cat depicted in the photograph is named 
>"Cheshire".

But if this metadata were part of I, then this would be OK, right? 
So, suppose one is on the receiving end, and one gets handed a lossy 
JPEG image and some RDF metadata about it. What should one do? Protest 
to the Gods of the Internet about this misuse of conneg, or simply 
assume that I - to which one has no other access - did after all 
contain the metadata? The fact that one has not seen it before can be 
chalked up to the other kinds of lossiness inherent in the other 
encodings one had previously chosen.

The point I'm getting at is that this strict doctrine which we keep 
hearing about, regarding what conneg MUST be used for and MUST NOT be 
used for, seems to be nothing but doctrine. There is no way to 
determine that it is being obeyed. If we all just go on assuming that 
it is, then some resources will seem to have more information in them 
than they previously seemed to have. As long as this extra 
information is useful, why should anyone complain about this outcome? 
To hell with doctrine when it ceases to be useful and becomes simply 
an encumbrance.

Pat

>That information is not in I and the encoding/decoding functions are 
>generic -- they are *not* specific to I.  (They cannot add 
>information that they do not have.)
>
>So, the exact meaning of SAMEness is that the information sent 
>consists *only* of information that is either a subset of I, or an 
>artifact of the generic encoding/decoding process itself.
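
[Editorial illustration, not part of either message.] The set model 
quoted above can be sketched in a few lines of Python. This is only a 
toy under stated assumptions: Information is modelled as a frozenset of 
strings, and the names lossy_encode, decode, is_artifact and 
same_information are hypothetical, invented here for illustration. 
Note that the check needs I as an argument, which is exactly the thing 
a receiving client does not have.

```python
from typing import FrozenSet, Tuple

# Assumption: "Information" is a set of individual facts, each a string
# that contains no newline character.
Information = FrozenSet[str]


def lossy_encode(i: Information) -> bytes:
    """Hypothetical generic encoder Ei: drops facts tagged 'fine-detail:'."""
    kept = sorted(f for f in i if not f.startswith("fine-detail:"))
    return "\n".join(kept).encode("utf-8")


def decode(b: bytes) -> Information:
    """Hypothetical generic decoder Di: recovers facts, plus one artifact."""
    facts = set(b.decode("utf-8").split("\n")) if b else set()
    facts.add(f"byte-length:{len(b)}")  # by-product of the encoding itself
    return frozenset(facts)


def partition(i: Information, r: Information) -> Tuple[Information, Information]:
    """Split R into RI (the part that is in I) and RA (the part that is not)."""
    return r & i, r - i


def is_artifact(fact: str) -> bool:
    """Assumption: the only artifact this toy codec produces is byte-length."""
    return fact.startswith("byte-length:")


def same_information(i: Information, r: Information) -> bool:
    """SAMEness as defined above: everything in R is in I or is an artifact."""
    _, ra = partition(i, r)
    return all(is_artifact(f) for f in ra)


I = frozenset({"subject:cat", "colour:grey", "fine-detail:whisker-17"})
R = decode(lossy_encode(I))
RI, RA = partition(I, R)

# The same received set R2 passes or fails depending on which I is assumed.
metadata = "photographer:David Booth"
R2 = R | {metadata}
I_with_metadata = I | {metadata}
```

Here same_information(I, R2) is False while 
same_information(I_with_metadata, R2) is True, so the verdict turns 
entirely on what I is taken to contain.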
>
>
>
>
>David Booth, Ph.D.
>HP Software
>+1 617 629 8881 office  |  dbooth@hp.com
>http://www.hp.com/go/software
>
>Opinions expressed herein are those of the author and do not 
>represent the official views of HP unless explicitly stated 
>otherwise.


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
http://www.ihmc.us/users/phayes      phayesAT-SIGNihmc.us
http://www.flickr.com/pathayes/collections
Received on Sunday, 13 April 2008 20:55:02 UTC