- From: Chris Lilley <Chris.Lilley@sophia.inria.fr>
- Date: Fri, 6 Dec 1996 11:18:48 +0100 (MET)
- To: Larry Masinter <masinter@parc.xerox.com>, christw@microsoft.com, www-international@w3.org
- Cc: garym@softshore.com.au, Alan_Barrett/DUB/Lotus.LOTUSINT@crd.lotus.com
On Dec 5, 9:59pm, Larry Masinter wrote:

> HTTP/1.0 gave a list:
>
>     charset = "US-ASCII"
>             | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
>             | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
>             | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
>             | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
>             | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
>             | token

Well, token covers all the rest ;-)

Incidentally

bash$ grep UNICODE-1-1-UTF-8 character-sets.txt
bash$

UNICODE-1-1-UTF-8 does not appear to be registered; although RFC 1641
postulates it as a theoretical entity, RFC 2044 (not yet diffused to
all mirrors) specified UTF-8.

> and the appendix of HTTP/1.1 includes a list of 'preferred names':
>
>     "US-ASCII"
>     | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
>     | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
>     | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
>     | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
>     | "SHIFT_JIS" | "EUC-KR" | "GB2312" | "BIG5" | "KOI8-R"
>
>     "EUC-JP" for "EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE"

Did anyone verify that all the 8859 charsets are used or useful?

I notice that Unicode-1-1 (ie UCS-2) and UTF-8 are missing from the
HTTP/1.1 list; is there a reason for this?

Is there a registration request being processed for the EUC-JP alias,
or is that yet to be done?

> and I'm guessing the right place to fix this up for good is in the
> final edition of:
>
> ftp://ftp.isi.edu/internet-drafts/draft-freed-charset-reg-01.txt

Thanks for the reference. I see that it only allows character sets
owned by national bodies to be registered from now on. This may be a
good idea or it may not (I recall that the early drafts of 10646 were
essentially all the national standard character sets catted together
with scant reference to actual practice).

This is interesting:

| A character set should therefore be registered ONLY if it adds
| significant functionality that is valuable to a large
| community, OR if it documents existing practice in a large
| community. Note that character sets registered for the second
| reason should be explicitly marked as being of limited or
| specialized use and should only be used in Internet messages
| with prior bilateral agreement.

I suppose content negotiation counts as bilateral agreement, so this
could be taken to imply that level 3 charsets should only be sent if
explicitly requested in the Accept-Charset header (otherwise it would
be a unilateral agreement).

--
Chris Lilley, W3C                          [ http://www.w3.org/ ]
Graphics and Fonts Guy            The World Wide Web Consortium
http://www.w3.org/people/chris/              INRIA, Projet W3C
chris@w3.org                       2004 Rt des Lucioles / BP 93
+33 (0)4 93 65 79 87       06902 Sophia Antipolis Cedex, France
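
A rough sketch of the "bilateral agreement" reading of Accept-Charset
discussed in the message, in Python. Everything here is an assumption
made for illustration: the function names, the LIMITED_USE set and the
"x-example-limited" charset name are hypothetical and come from neither
the HTTP drafts nor the registration draft. The idea is simply that a
charset marked as limited-use is sent only when the client names it
explicitly (a "*" wildcard is not treated as explicit); anything else
falls back to a default.

```python
# Hypothetical set of charsets registered as "limited or specialized use".
LIMITED_USE = {"x-example-limited"}


def explicitly_accepted(accept_charset, charset):
    """Return True only if `charset` is named in the header with q > 0."""
    wanted = charset.strip().lower()
    for item in accept_charset.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != wanted:
            continue
        q = 1.0  # a missing q parameter is taken to mean full preference
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        return q > 0
    return False


def choose_charset(accept_charset, preferred, fallback="ISO-8859-1"):
    """Send a limited-use charset only when the client asked for it by name."""
    header = accept_charset or ""
    if preferred.lower() in LIMITED_USE and not explicitly_accepted(header, preferred):
        return fallback
    return preferred
```

Under these assumptions, choose_charset(None, "x-example-limited")
falls back to ISO-8859-1, while a request carrying
"Accept-Charset: x-example-limited;q=0.5" gets the limited-use charset.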
Received on Friday, 6 December 1996 05:18:20 UTC