Re: Accept-Charset support

On Thu, 5 Dec 1996, Larry Masinter wrote:

> Oh, certainly not. I suppose we should not have avoided the political
> difficulties in the HTTP/1.1 spec, but clearly most of charset names
> are completely inappropriate. For a while, there was a list of
> 'charset' tokens in HTTP, but it seemed like it was a more general
> IANA/Charset issue than a HTTP one.

I am a little late with this answer, but here it is. Cleaning up
the hundreds of "charset"s and aliases that have been registered
is mainly an IANA issue. But there are HTTP and HTML issues, too.
Somebody mentioned earlier that for mail, iso-8859-2 was popular,
whereas for the web, CP1250 was most popular. This would have
to be reflected in the HTTP list. A similar situation exists for
Japan, where ISO-2022-JP is used for mail, but other encodings are
also used on the web.
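
To make the alias problem concrete, here is a minimal sketch of how a
client or server might fold registered aliases onto one preferred
label. The table and the function name are hypothetical; only the
EUC-JP pair is taken from the HTTP/1.1 appendix quoted below, and
"latin2" is included as one example of a registered alias.

```python
# Hypothetical alias table (illustration only, not a complete registry).
# Keys are lowercased because charset labels compare case-insensitively.
PREFERRED = {
    "extended_unix_code_packed_format_for_japanese": "EUC-JP",
    "euc-jp": "EUC-JP",
    "iso-8859-2": "ISO-8859-2",
    "latin2": "ISO-8859-2",
    "iso-2022-jp": "ISO-2022-JP",
}


def preferred_charset(label: str) -> str:
    """Return the preferred label for a known alias.

    Unknown labels are returned unchanged (apart from trimming), since
    the HTTP grammar allows any token as a charset value.
    """
    cleaned = label.strip().strip('"')
    return PREFERRED.get(cleaned.lower(), cleaned)


print(preferred_charset("Extended_UNIX_Code_Packed_Format_for_Japanese"))
print(preferred_charset("x-unknown"))
```

A per-protocol list (one for mail, one for HTTP) could sit on top of
such a table, since the popular label differs between the two.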


> HTTP/1.0 gave a list:
> 
>      charset = "US-ASCII"
>              | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
>              | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
>              | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
>              | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
>              | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
>              | token
> 
> and the appendix of HTTP/1.1 includes a list of 'preferred names':
> 
>        "US-ASCII"
>        | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
>        | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
>        | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
>        | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
>        | "SHIFT_JIS" | "EUC-KR" | "GB2312" | "BIG5" | "KOI8-R"
> 
>        "EUC-JP" for "EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE"
> 
> and I'm guessing the right place to fix this up for good is in the
> final edition of:
> 
> ftp://ftp.isi.edu/internet-drafts/draft-freed-charset-reg-01.txt

In addition, for ISO-8859-7 and -8, there is the problem of
bidirectionality. Used as they stand, these labels imply visual
ordering, which may be the best thing for line-oriented mail systems.
However, for HTML, a variant "charset" parameter (I don't know
its value) has to be used, because HTML does line layout on
its own and has to get logical ordering and BIDI markup as input.
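
A toy sketch of why line layout needs logical order: it treats words
as atomic units and assumes a purely right-to-left text, which is a
strong simplification (real text mixes directions and needs a full
bidirectional algorithm). All names here are made up for illustration.

```python
def wrap(words, per_line):
    """Break a word list into lines of at most per_line words."""
    return [words[i:i + per_line] for i in range(0, len(words), per_line)]


def to_visual(line):
    """Display order of a purely right-to-left line: the first word
    read sits at the right edge, so the stored sequence is reversed."""
    return line[::-1]


def read_rtl(lines):
    """What a reader sees: each line right to left, lines top to bottom."""
    return [word for line in lines for word in reversed(line)]


logical = ["w1", "w2", "w3", "w4"]  # reading order

# Visual input: the sender arranged one 4-word line; the receiver
# re-wraps it to 2 words per line without knowing the reading order.
rewrapped_visual = wrap(to_visual(logical), 2)

# Logical input: the receiver wraps first, then reorders each line.
laid_out_logical = [to_visual(line) for line in wrap(logical, 2)]

print(read_rtl(rewrapped_visual))   # ['w3', 'w4', 'w1', 'w2']
print(read_rtl(laid_out_logical))   # ['w1', 'w2', 'w3', 'w4']
```

With visual input, re-wrapping scrambles the reading order; with
logical input, the layout engine can break lines first and reorder
afterwards.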

So this is not only an IANA issue. It is probably not a good idea
to cast it "in stone" in HTTP 1.1, but if we can come up with
a reasonable set on this list, and can make it publicly available
somewhere (W3? the babel site?), it might help a lot.

Regards,	Martin.

Received on Wednesday, 11 December 1996 11:17:38 UTC