Re: Accept-Charset support
On Mon, 9 Dec 1996, Erik van der Poel wrote:
> > The choice between "UNICODE-1-1-UTF-8" and "UTF-8" has been debated at
> > length on the ISO10646 and Unicode lists, with the result that we have now:
> > "UTF-8". The wise implementer, however, would be well advised to support
> > the longer tag as an ad hoc alias.
> I'm not sure what you mean by "ad hoc alias", but the term "alias" is
> used in this context (Internet "charsets") to mean a synonym. Are
> "unicode-1-1-utf-8" and "utf-8" synonymous? If so, what is the name of
> UTF-8-encoded Unicode 2.0?
> Unicode 1.1 and 2.0 are not the same. In particular, there was a big
> change in the Korean block. The Korean characters in the U+3400 to
> U+3D2D range were removed, and they were added again with some others in
> the U+AC00 to U+D7A3 range. A future version of the Unicode standard may
> re-use the U+3400 to U+3D2D range. If/when that happens, what does
> "utf-8" mean?
In my opinion, the fact that RFC 2044 refers to Unicode 1.1 is
an inconvenient historical coincidence. The RFC was submitted shortly
before Unicode 2.0 came out (which had been expected for a long time).
I guess the general consensus is that UTF-8 should denote
Unicode 2.0 rather than Unicode 1.1 in cases where it really matters.
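To illustrate why the version only matters in a few places: the UTF-8 byte sequence for a code point is computed from the number alone, so the bytes are the same whichever Unicode version assigns a character to it. What changes between 1.1 and 2.0 is the meaning of the old Korean range. The sketch below uses the two ranges quoted above; the names OLD_KOREAN, NEW_KOREAN, utf8_hex and range_size are made up for this illustration and come from no standard.

```python
# Korean ranges as quoted in the message above.
OLD_KOREAN = (0x3400, 0x3D2D)  # Unicode 1.1 positions, removed in 2.0
NEW_KOREAN = (0xAC00, 0xD7A3)  # Unicode 2.0 positions


def utf8_hex(code_point):
    """Return the UTF-8 bytes of a code point as a hex string.

    The result depends only on the numeric value, not on which
    character (if any) a given Unicode version assigns to it.
    """
    return chr(code_point).encode("utf-8").hex()


def range_size(bounds):
    """Number of code points in an inclusive (first, last) range."""
    first, last = bounds
    return last - first + 1


if __name__ == "__main__":
    for name, bounds in (("1.1", OLD_KOREAN), ("2.0", NEW_KOREAN)):
        first, last = bounds
        print(name, utf8_hex(first), utf8_hex(last), range_size(bounds))
```

A decoder that sees the bytes for U+3400 under the label "utf-8" cannot tell from the label alone whether a Unicode 1.1 Korean character was meant.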
> Without rehashing the whole debate that you say already took place on
> those other mailing lists (which I didn't follow), could you briefly
> explain the future plans for the charset name "utf-8"? I glanced at RFC
> 2044 but didn't immediately see anything about this.
Here is what I remember from that discussion:
- Shortness to show it is important.
- No versioning to reduce the number of "charset" parameter values.
- No versioning because for most things (except Korean), it does not matter.
- No versioning to show that there is (basically) only one
character set (UCS) that is encoded.
- No versioning to create pressure to avoid further shuffling
of codepoints (a real stupidity).
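As for supporting the longer tag as an ad hoc alias, a minimal sketch of what an implementation might do is to fold the old label onto "utf-8" before looking up a decoder. The table and the function name normalize_charset are assumptions for illustration only; they are not taken from RFC 2044 or from any registry, and a real implementation would carry a much larger table.

```python
# Hypothetical alias table: maps legacy labels to the preferred one.
# Only the single alias discussed in this thread is listed.
CHARSET_ALIASES = {
    "unicode-1-1-utf-8": "utf-8",
}


def normalize_charset(label):
    """Return the preferred charset name for a label.

    Labels are compared case-insensitively and surrounding
    whitespace is ignored. Unknown labels are returned in
    lowercase, unchanged otherwise.
    """
    key = label.strip().lower()
    return CHARSET_ALIASES.get(key, key)


if __name__ == "__main__":
    for label in ("UNICODE-1-1-UTF-8", "UTF-8", "iso-8859-1"):
        print(label, "->", normalize_charset(label))
```

Treating the two labels as synonyms in this way deliberately discards the version information, which is consistent with the "no versioning" points above but loses the distinction for the old Korean range.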