RE: Uniform access to descriptions

> From: Xiaoshu Wang
> Tim Berners-Lee wrote:
> [ . . . ]
> > The point is that when conneg is used to return two different
> > representations of a document, with different content types, the use
> > is ONLY to allow negotiation of different formats for the SAME
> > information. [ . . . ]
> Tim, it sure can work if you tell us (or me) exactly the
> meaning of the SAMEness.

Okay, I'll take a crack at this.  :)

Imagine that you have some information, I, that you wish to provide through your Web server.  To simplify the discussion, let's assume that I can be characterized as a set of individual pieces of information, so that we can compare information content simply by comparing sets.  Further suppose there are n pairs of well-known encoding functions, E1...En, and their corresponding decoding functions, D1...Dn.  These encoding/decoding functions are generic -- they are NOT specific to I -- and they correspond to the various combinations of well-known languages and media types.  If we call the type of I Information, then each Ei is a function from Information to ByteSequence:

  Ei: Information -> ByteSequence

and each corresponding Di is a function the other way around:

  Di: ByteSequence -> Information

Content negotiation is conceptually the process of selecting the desired pair of encoding/decoding functions, identified by i.  The server chooses i (based on the client's language and media type preferences) and sends ByteSequence Ei(I) to the client.  The client then interprets the received ByteSequence according to the corresponding decoding function, Di, to obtain Information, R:

  R = Di(Ei(I)).
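
To make the round trip concrete, here is a minimal Python sketch.  It assumes Information is modeled as a frozenset of fact strings (with no embedded newlines), and the two codecs, the CODECS table and the negotiate() helper are all hypothetical names invented for illustration; they are not taken from any HTTP library.

```python
import json
from typing import Callable, Dict, FrozenSet, List, Tuple

# Assumption: Information is a set of individual facts, each a plain string.
Information = FrozenSet[str]
Codec = Tuple[Callable[[Information], bytes], Callable[[bytes], Information]]


def encode_json(info: Information) -> bytes:
    return json.dumps(sorted(info)).encode("utf-8")


def decode_json(data: bytes) -> Information:
    return frozenset(json.loads(data.decode("utf-8")))


def encode_text(info: Information) -> bytes:
    # One fact per line; assumes facts contain no newlines.
    return "\n".join(sorted(info)).encode("utf-8")


def decode_text(data: bytes) -> Information:
    text = data.decode("utf-8")
    return frozenset(text.split("\n")) if text else frozenset()


# Generic (Ei, Di) pairs, indexed by media type.  None of them knows about I.
CODECS: Dict[str, Codec] = {
    "application/json": (encode_json, decode_json),
    "text/plain": (encode_text, decode_text),
}


def negotiate(accept: List[str]) -> str:
    """Pick i: the first media type the client prefers that the server offers."""
    for media_type in accept:
        if media_type in CODECS:
            return media_type
    raise LookupError("no acceptable media type")


I = frozenset({"the cat is grey", "the cat sits on a mat"})

i = negotiate(["text/html", "text/plain"])  # client preferences, in order
E_i, D_i = CODECS[i]
R = D_i(E_i(I))  # R = Di(Ei(I))

print(i)       # text/plain
print(R == I)  # True: this particular codec happens to be lossless
```

Whichever i is chosen, the client ends up with the same set of facts; only the bytes on the wire differ.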

Set R can be further partitioned into two subsets: RI, which is the part of R that is also in I; and RA, which is any Information in R that is NOT in I.  So, writing + for set union:

  R = RI + RA

In essence, RI is (a subset of) the information that the client wanted.  The lossier the encoding/decoding, the smaller RI is relative to I.  For an entirely lossless encoding/decoding, RI = I.  But what is RA?  RA is information that is an artifact (or by-product) of the encoding/decoding process itself.  (For example, if the information is encoded in HTML, it might include the number of bytes in the HTML.)
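
The partition can be sketched with ordinary set operations.  The lossy encoder and the artifact-producing decoder below are hypothetical, chosen only so that both RI and RA are visible; as before, Information is assumed to be a frozenset of fact strings.

```python
from typing import FrozenSet

Information = FrozenSet[str]


def encode_lossy(info: Information, keep: int = 2) -> bytes:
    # Hypothetical lossy encoder: keeps only the first `keep` facts (sorted).
    return "\n".join(sorted(info)[:keep]).encode("utf-8")


def decode_with_artifact(data: bytes) -> Information:
    # Hypothetical decoder: recovers the facts and also reports a by-product
    # of the encoding itself (the byte length), which was never part of I.
    facts = set(data.decode("utf-8").split("\n")) if data else set()
    facts.add(f"encoded length: {len(data)} bytes")
    return frozenset(facts)


I = frozenset({"a", "b", "c"})
R = decode_with_artifact(encode_lossy(I))

RI = R & I  # the part of R that came from I
RA = R - I  # artifacts of the encoding/decoding process

print(sorted(RI))  # ['a', 'b']
print(sorted(RA))  # ['encoded length: 3 bytes']
```

Here RI is a proper subset of I because the fact "c" was lost, and RA holds only the by-product.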

In no case does the encoding/decoding *add* any information *except* information that is a by-product of the generic encoding/decoding process itself.  So for example, if I is a photographic image of a cat, Ei(I) might be a JPEG encoding and RI would be the lossy subset of I that is received after decoding.  However, content negotiation is *only* for sending information that is in I.  Content negotiation is *not* for sending arbitrary information (or metadata) that is not already in I, such as the fact that the photograph was taken by "David Booth" and the cat depicted in the photograph is named "Cheshire".  That information is not in I and the encoding/decoding functions are generic -- they are *not* specific to I.  (They cannot add information that they do not have.)

So, the exact meaning of SAMEness is that the information sent consists *only* of information that is either a subset of I, or an artifact of the generic encoding/decoding process itself.
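
Stated as a check, and again only as a sketch: the function is_same_information() and the is_length_artifact() predicate are hypothetical names, and deciding what counts as an artifact of a given codec is assumed to be supplied from outside.

```python
from typing import Callable, FrozenSet

Information = FrozenSet[str]


def is_same_information(
    I: Information,
    R: Information,
    is_artifact: Callable[[str], bool],
) -> bool:
    """True if everything in R is either in I or a codec artifact."""
    return all(is_artifact(fact) for fact in R - I)


def is_length_artifact(fact: str) -> bool:
    # Assumption: the only artifact this codec produces is a byte count.
    return fact.startswith("encoded length:")


I = frozenset({"the cat is grey", "the cat sits on a mat"})

# A lossy response plus a codec by-product: acceptable.
negotiated = frozenset({"the cat is grey", "encoded length: 15 bytes"})

# The same response with metadata that was never in I: not acceptable.
with_metadata = negotiated | {"photographer: David Booth"}

print(is_same_information(I, negotiated, is_length_artifact))     # True
print(is_same_information(I, with_metadata, is_length_artifact))  # False
```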

David Booth, Ph.D.
HP Software
+1 617 629 8881 office

Opinions expressed herein are those of the author and do not represent the official views of HP unless explicitly stated otherwise.

Received on Sunday, 13 April 2008 12:54:07 UTC