Re: Accept-Charset support

On Dec 5,  9:59pm, Larry Masinter wrote:

> HTTP/1.0 gave a list:
>      charset = "US-ASCII"
>              | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
>              | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
>              | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
>              | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
>              | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
>              | token

well, token covers all the rest ;-)
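
To make that concrete, here is a minimal sketch of a check for whether a charset name is a syntactically valid token under the HTTP/1.0 grammar (one or more US-ASCII characters, excluding control characters and tspecials). The names `TSPECIALS` and `is_token` are illustrative only and are not taken from any spec or library.

```python
# tspecials from the HTTP/1.0 grammar: ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def is_token(name):
    """Return True if `name` is a non-empty run of printable US-ASCII
    characters containing no control characters and no tspecials."""
    if not name:
        return False
    # ord 0-32 covers the control characters and space; 127 is DEL.
    return all(32 < ord(ch) < 127 and ch not in TSPECIALS for ch in name)


for candidate in ("KOI8-R", "SHIFT_JIS", "UNICODE-1-1-UTF-8", "not a token"):
    print(candidate, is_token(candidate))
```

So almost any sensible label passes the grammar; passing says nothing about whether the name is registered.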


bash$ grep UNICODE-1-1-UTF-8 character-sets.txt

UNICODE-1-1-UTF-8 does not appear to be registered. RFC 1641
postulates it as a theoretical entity, but it is RFC 2044 (not yet
propagated to all mirrors) that actually specifies UTF-8.

> and the appendix of HTTP/1.1 includes a list of 'preferred names':
>        "US-ASCII"
>        | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
>        | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
>        | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
>        | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
>        | "SHIFT_JIS" | "EUC-KR" | "GB2312" | "BIG5" | "KOI8-R"

Did anyone verify that all the 8859 charsets are used or useful?

I notice that Unicode-1-1 (i.e. UCS-2) and UTF-8 are missing from the
HTTP/1.1 list; is there a reason for this?

Is there a registration request being processed for the EUC-JP alias
or is that yet to be done?

> and I'm guessing the right place to fix this up for good is in the
> final edition of:
> ftp://ftp.isi.edu/internet-drafts/draft-freed-charset-reg-01.txt

Thanks for the reference. I see that it only allows character sets
owned by national bodies to be registered from now on. This may be a
good idea or it may not (I recall that the early drafts of 10646 were
essentially all the national standard character sets catted together
with scant reference to actual practice).

This is interesting:

 | A character set should therefore be registered ONLY if it adds
 | significant functionality that is valuable to a large
 | community, OR if it documents existing practice in a large
 | community. Note that character sets registered for the second
 | reason should be explicitly marked as being of limited or
 | specialized use and should only be used in Internet messages
 | with prior bilateral agreement.

I suppose content negotiation counts as bilateral agreement, so this
could be taken to imply that level 3 charsets should only be sent if
explicitly requested in the Accept-Charset header (otherwise it would
be a unilateral agreement).
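
A rough sketch of that reading, assuming the server already knows which of its charsets are registered as "limited or specialized use" (the `limited_use` flag below is my own invention, as are the function names). A limited-use charset is sent only when the client names it explicitly; a `*` wildcard does not count as a request. The HTTP/1.1 special case for ISO-8859-1 is deliberately ignored here.

```python
def parse_accept_charset(header):
    """Parse an Accept-Charset value into {lowercased charset: q-value}."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a charset listed without a q parameter defaults to 1
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def may_send(charset, accept_header, limited_use):
    """Decide whether `charset` may be sent for this request.

    limited_use: hypothetical flag meaning the charset was registered only
    to document existing practice, so prior agreement is required.
    """
    prefs = parse_accept_charset(accept_header or "")
    name = charset.lower()
    if name in prefs:
        return prefs[name] > 0      # explicitly requested (or refused with q=0)
    if limited_use:
        return False                # never sent unilaterally
    if not prefs:
        return True                 # no header: any general-use charset is fine
    return prefs.get("*", 0.0) > 0  # otherwise fall back to the wildcard


print(may_send("KOI8-R", "koi8-r, iso-8859-5;q=0.5", limited_use=True))
print(may_send("KOI8-R", "*", limited_use=True))
```

Under these assumptions the first call allows the charset and the second does not, since a wildcard is not an explicit request.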

Chris Lilley, W3C                          [ http://www.w3.org/ ]
Graphics and Fonts Guy            The World Wide Web Consortium
http://www.w3.org/people/chris/              INRIA,  Projet W3C
chris@w3.org                       2004 Rt des Lucioles / BP 93
+33 (0)4 93 65 79 87       06902 Sophia Antipolis Cedex, France
