Re: FW: question on "charset"s from Jim Davis on 1998-05-20 (w3c-dist-auth@w3.org from April to June 1998)

From: Jim Davis <jdavis@parc.xerox.com>
Date: Wed, 20 May 1998 08:46:01 PDT
To: WEBDAV WG <w3c-dist-auth@w3.org>
Cc: spreitzer@parc.xerox.com
Message-Id: <3.0.3.32.19980520084601.00816420@mailback.parc.xerox.com>

At 09:27 PM 5/19/98 PDT, Mike_Spreitzer.PARC@xerox.com wrote:
>In IETF jargon, a "charset" encompasses: (1) an abstract set of characters,
>(2) an association between numeric codes and abstract characters, and (3) a
>way to encode those numeric codes in byte sequences.  "Unicode" is about
>(1) and (2), and XML recognizes the need for (3) as well.  My question is
>this: does a WebDAV server preserve (1), (2), and (3), or just (1) and (2)?
>That is, if I write entity bodies or properties or other metadata in one
>particular encoding, are read operations required to return that content in
>the same encoding?

It seems to me that WebDAV is silent on this question - it just does
whatever HTTP 1.1 does.

HTTP mandates support for UTF-8, which defines the encoding (your point
number 3).  I don't see (but may have missed) any language that prohibits
other encodings.

By "same encoding" did you mean same algorithm (e.g. always UTF-8) or
byte-for-byte identical when using the same algorithm?  
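
To make the distinction concrete, here is a small Python sketch (my own
illustration, not anything taken from the HTTP or WebDAV drafts) of one
character viewed at the three layers described in the quoted message.  The
choice of character and of the three encodings is arbitrary.

```python
# Illustration only: one abstract character seen at the three "charset" layers.
ch = "\u00e9"  # (1) abstract character: LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)  # (2) the numeric code associated with that character

# (3) byte sequences produced by three different encoding algorithms
as_utf8 = ch.encode("utf-8")
as_latin1 = ch.encode("iso-8859-1")
as_utf16be = ch.encode("utf-16-be")

# Each byte sequence decodes back to the same character, although the
# octets themselves differ from one encoding to the next.
round_trips = {
    as_utf8.decode("utf-8"),
    as_latin1.decode("iso-8859-1"),
    as_utf16be.decode("utf-16-be"),
}
```

A server that preserved only (1) and (2) could hand back any of these byte
sequences, provided it labeled the charset correctly.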

If the latter, section 3.4 of RFC 2068 says "a character set may provide
more than one sequence of octets to represent a particular character",
which to me implies that HTTP servers in general, and WebDAV servers in
particular, need not guarantee that the octet sequence returned by a GET is
byte-for-byte identical with the one created with a PUT.  I don't even see
a requirement there that the sequence returned be the 'canonical' one, if
there is a choice.
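
One case I believe fits that sentence is the "utf-16" label, where the same
character can arrive in either byte order behind a byte order mark.  The
sketch below is Python; same_characters is a hypothetical helper of my own,
not something HTTP or WebDAV defines, and the two bodies are made-up examples
of what a PUT and a later GET might carry.

```python
# Hypothetical helper, not part of HTTP or WebDAV: compares two entity
# bodies at the character level rather than octet-for-octet.
def same_characters(put_body: bytes, get_body: bytes, charset: str) -> bool:
    return put_body.decode(charset) == get_body.decode(charset)


# The character U+00E9 under the "utf-16" label: once little-endian and
# once big-endian, each prefixed with its byte order mark.
put_body = b"\xff\xfe\xe9\x00"
get_body = b"\xfe\xff\x00\xe9"

octets_identical = put_body == get_body
characters_identical = same_characters(put_body, get_body, "utf-16")
```

Here the octets differ while the decoded characters match, which is the
situation where a byte-for-byte guarantee and a same-algorithm guarantee
would give different answers.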

That's my opinion, but I don't consider myself expert.

Jim

Received on Wednesday, 20 May 1998 12:58:29 UTC