
Re: Accept-Charset support



> >Are "unicode-1-1-utf-8" and "utf-8" synonymous?
> 
> For practical purposes, yes, unless one has to deal with data containing
> Korean Hangul coded according to Unicode 1.1.  There doesn't seem to be any
> such data extant on the Internet, which is why I did not bother to register
> "UNICODE-1-1-UTF-8".

RFC 2044 explicitly refers to Unicode 1.1. Martin seems to think that
the "consensus" is that "utf-8" refers to Unicode 2.0. However, any
perceived consensus is useless unless it's documented. The current
situation is confusing, and invites mistakes.

How did we end up compounding the mistakes? First, Unicode and ISO made
the mistake of making an incompatible change to Unicode. Now we have made
matters worse by confusing 1.1 and 2.0! How unfortunate.

Also, there may not be much extant Unicode 1.1 data on the net, but
there are installed copies of software that assume 1.1, e.g. Netscape
Navigator 3.0, which uses Unicode 1.1 conversions for KS C 5601 in Java.


> Consider what happens soon after Unicode 3.0 release, assuming that the UTC
> and ISO/IEC JTC1/SC2/WG3 stick to their pledges of no further incompatible
> changes.

Did they also pledge to refrain from re-using the codepoints U+3400 to
U+3D2D in the future?
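
To make the concern concrete, here is a minimal sketch of how a client
might flag characters in that range, whose meaning depends on which
Unicode version the sender assumed. The range bounds come from the
question above; the function and constant names are hypothetical, and the
input is assumed to be well-formed UTF-8.

```python
# Range named above: U+3400 to U+3D2D. Per this thread, Unicode 1.1 used
# it for Korean Hangul, so its meaning is version-dependent.
OLD_HANGUL_FIRST = 0x3400
OLD_HANGUL_LAST = 0x3D2D


def find_version_sensitive(data: bytes):
    """Return (index, code point) pairs that fall in U+3400..U+3D2D.

    Hypothetical helper: assumes `data` is well-formed UTF-8 and raises
    UnicodeDecodeError otherwise.
    """
    text = data.decode("utf-8")
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if OLD_HANGUL_FIRST <= ord(ch) <= OLD_HANGUL_LAST
    ]
```

An empty result means the text decodes the same way whichever version is
assumed, at least as far as this one range is concerned.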


> If the new server labels the content "UNICODE-3-0-UTF-8", the old client
> fails to recognize that and refuses to process/display: total loss of
> functionality.

David Goldsmith's spec allowed for a very regular naming convention,
which could be recognized/parsed. The client implementation could
recognize UNICODE-3-0-UTF-8 as being a new version of Unicode.
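
As a rough sketch of what such recognition could look like, the code
below assumes labels take the form UNICODE-<major>-<minor>-<encoding>,
as in "UNICODE-1-1-UTF-8". The pattern and the function name are
illustrative assumptions, not taken from the spec itself.

```python
import re

# Assumed label shape: UNICODE-<major>-<minor>-<encoding>.
# This pattern is a guess at the convention, not a quote from any spec.
_VERSIONED_LABEL = re.compile(r"^UNICODE-(\d+)-(\d+)-(.+)$", re.IGNORECASE)


def parse_charset_label(label: str):
    """Split a charset label into (major, minor, encoding).

    Labels without a version prefix, such as "UTF-8", come back with
    major and minor set to None.
    """
    label = label.strip()
    match = _VERSIONED_LABEL.match(label)
    if match:
        major, minor, encoding = match.groups()
        return int(major), int(minor), encoding.upper()
    return None, None, label.upper()
```

A client that already knows UTF-8 could then treat
parse_charset_label("UNICODE-3-0-UTF-8"), which yields (3, 0, "UTF-8"),
as a newer Unicode version of an encoding it can decode, rather than
rejecting the label outright.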


> Hence you have a better transition and better interoperability with a
> non-version-specific label, assuming no incompatible changes.  The
> registration of "UTF-8" is a bet that the relevant committees will stick to
> their word.

Hey, you're betting using *my* money! :-) Just kidding.


Erik

