- From: Tim Greenwood <greenwd@openmarket.com>
- Date: Wed, 24 Jan 1996 18:53:45 -0500
- To: Nickolay Saukh <nms@nns.ru>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
At 01:16 AM 1/25/96 +0300, you wrote:
>> >Suppose I would like to get description of chess game in russian.
>> >I know about special chess characters in Unicode. But if Unicode
>> >is not available, then iso8859-5 would be sufficient.
>>
>> Section 12 of HTTP 1.1 (Nov 22) says "If multiple representations
>> exist that only vary by Content-Encoding, then the smallest representation
>> (lowest bs) is preferred."
>
>Well, they vary by special chess characters in Unicode and their
>rough approximation in iso-8859-5.

Content equivalence is both a hard philosophical issue and an easy protocol issue. For simplicity, consider text-only content. We have:

  Abstract idea
    transformed to Language
    transformed to Text
    transformed to Character set encoding
    possibly transformed to Content coding
    possibly transformed to Transfer coding

At any point in this list there may be multiple representations of the higher entity. Transformations may be performed by the server on the fly in response to a request, or multiple transformed representations may be stored.

My understanding of the protocol is that identical URLs denote identical abstract ideas. Multiple representations may be stored, differentiated in the request by request headers and in the response by entity headers. It is the content provider who is making the claim that the Entity-Body content is identical at the abstract level.

For your chess example, if the content provider has decided that the "rough approximation in iso-8859-5" and the representation in Unicode are multiple representations of the same abstract idea, then we have content equivalence, and the lowest-bs rule for deciding which character set to provide holds. If the content provider decides that these are two separate abstract ideas, then the two representations have different URLs and none of this applies.

Language variants, distinguished by differing Content-Language entity headers, are a more interesting example of multiple representations of an abstract idea.
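The lowest-bs rule quoted from the draft can be sketched as follows. This is a minimal Python illustration under stated assumptions, not anything defined by the HTTP draft: the `Representation` class, `variant_key`, and `pick_smallest` are hypothetical names, and the sketch assumes a stored variant is identified by its Content-Type (including the charset parameter) and Content-Language, so that two variants sharing those values differ only by Content-Encoding.

```python
import gzip
from dataclasses import dataclass


@dataclass
class Representation:
    """One stored variant of a resource (hypothetical model, not from the spec)."""
    content_type: str      # includes the charset parameter
    content_language: str
    content_encoding: str  # e.g. "identity" or "gzip"
    body: bytes

    @property
    def bs(self) -> int:
        # "bs" here is simply the size of the entity body in bytes.
        return len(self.body)


def variant_key(r: Representation) -> tuple:
    # Everything except Content-Encoding; variants that share this key
    # are assumed to differ only by their content coding.
    return (r.content_type, r.content_language)


def pick_smallest(variants: list) -> Representation:
    """Among variants differing only by Content-Encoding, prefer the lowest bs."""
    if len({variant_key(r) for r in variants}) != 1:
        raise ValueError("variants differ by more than Content-Encoding")
    return min(variants, key=lambda r: r.bs)


# Usage: the same Russian text in ISO-8859-5, stored unencoded and gzip-coded.
text = ("Ход белых. " * 200).encode("iso8859_5")
plain = Representation("text/plain; charset=ISO-8859-5", "ru", "identity", text)
zipped = Representation("text/plain; charset=ISO-8859-5", "ru", "gzip",
                        gzip.compress(text))
best = pick_smallest([plain, zipped])
```

In this sketch a Unicode variant and an ISO-8859-5 variant of the chess text would be rejected, because their charsets differ; treating them as equivalent is exactly the content provider's decision described above.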
See "Gödel, Escher, Bach" for a discussion of the equivalence of multiple language representations of abstract ideas.

If the transformation is performed on the fly, then equivalence is presumed from the transformation algorithms used. It is interesting that the algorithms to be used are not necessarily specified by the standard, for example ISO-8859-5 to Unicode. The Unicode Consortium publishes a set of tables which I would recommend we consider to be 'the standard', but other, disputed, conversion tables have been seen, especially for ideographic writing systems.

-------------------------------------
Tim Greenwood
Open Market Inc
617 679 0320
greenwd@openmarket.com
Received on Wednesday, 24 January 1996 15:58:25 UTC