[Bug 23646] "us-ascii" should not be an alias for "windows-1252"

https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646

--- Comment #45 from Henri Sivonen <hsivonen@hsivonen.fi> ---
(In reply to Jirka Kosek from comment #41)
> (In reply to Henri Sivonen from comment #40)
> > As for the TextEncoding API, it doesn't support non-UTF-* encodings anyway,
> > so the issue of "us-ascii" is moot. 
> 
> Fortunately, it doesn't. But there is still generic definition of encoder
> which allows any encoding. And there is nothing in the Encoding Standard
> which prevents creation of another APIs which will allow more encodings and
> sooner and later will cause interop problems with encoding libraries that
> strictly follow IANA definitions of characters available in each encoding.

An encoding API that exposes all the Encoding Standard encodings should
strongly encourage users to 
 1) Always use UTF-8 anyway
and
 2) When rule #1 is violated, label the data using the canonical name of the
encoding from the spec.

For example, the API should refuse to give you an encoder for "us-ascii" to
force you to resolve the label "us-ascii" to windows-1252 first and then
request an encoder for that. Once you've had to resolve the label to the
encoding anyway, you should then use the canonical name of the encoding when
labeling the outgoing data.
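
A minimal sketch of the API shape described above, assuming a hand-written
excerpt of the label table. The names LABEL_TO_NAME, resolve_label and
get_encoder are hypothetical and are not taken from the Encoding Standard or
from any existing library; Python's codecs module is used only to supply the
actual byte conversion.

```python
import codecs

# Hypothetical excerpt of a label table in the spirit of the Encoding
# Standard: several labels map to one canonical encoding name.
LABEL_TO_NAME = {
    "us-ascii": "windows-1252",
    "ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "utf8": "utf-8",
    "utf-8": "utf-8",
}
CANONICAL_NAMES = frozenset(LABEL_TO_NAME.values())


def resolve_label(label):
    """Resolve a label to the canonical name of its encoding."""
    # Labels are matched after trimming ASCII whitespace, ignoring case.
    key = label.strip("\t\n\f\r ").lower()
    try:
        return LABEL_TO_NAME[key]
    except KeyError:
        raise LookupError("unknown label: %r" % label)


def get_encoder(name):
    """Return an encoder, but only for a canonical encoding name."""
    if name not in CANONICAL_NAMES:
        # Refuse bare labels such as "us-ascii" so that the caller has to
        # go through resolve_label() and sees the canonical name.
        raise LookupError("%r is not a canonical name; resolve it first" % name)
    return codecs.getencoder(name)


name = resolve_label("us-ascii")      # "windows-1252"
encoded, _ = get_encoder(name)("\u20ac")
outgoing_label = name                 # label outgoing data with the canonical name
```

Because the caller already holds the canonical name after resolve_label(),
using it as the outgoing label costs nothing extra.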

> > I think we should focus the spec on the Web Platform--i.e. browsers. As
> > other systems find the need to consume Web content, they'll eventually grow
> > Encoding Standard-compliant encoding subsystems.
> > 
> > It's clear that there exist encoding libraries whose label handling is
> > IANA-oriented. Those will probably stick around for a long time for
> > compatibility with their old selves. It's unfortunate that the Web behavior
> > and e.g. the IANA-oriented JDK behavior differ, but we should just admit the
> > existence of two different legacies and not try to mix e.g. the JDK legacy
> > into Web specs.
> 
> OK, so what about if the scope of the Encoding Standard will incorporate
> what is in two paragraphs above and also it would state that encoding and
> decoding algorithms are defined in a way that it's compatible with the
> existing usage on the web and that any APIs build on the top of the Encoding
> Standard should support only utf-* encodings, otherwise interop with other
> encoding libraries is not guaranteed?

Well, if you follow my advice above, labeling the data as windows-1252 rather
than us-ascii increases interop with non-Encoding Standard receivers quite a
bit. This pattern applies to all the single-byte encodings in the Encoding
Standard as well as UTF-*, GB18030 and, AFAICT, EUC-JP: Use the canonical name
for labeling and you get interop. 
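
As an illustration of the windows-1252 case, here is a small sketch that uses
Python's codecs as a stand-in for a receiver whose label handling is
IANA-oriented (Python treats "us-ascii" as 7-bit ASCII). The receive() helper
and the sample payload are made up for this example.

```python
# Bytes produced by a sender whose "us-ascii" label really means windows-1252.
payload = "price: 10 \u20ac".encode("windows-1252")


def receive(data, label):
    """Hypothetical receiver that trusts the label and decodes with it."""
    return data.decode(label)


# Labeled with the canonical name, the data decodes as the sender intended.
text = receive(payload, "windows-1252")

# Labeled "us-ascii", the same bytes are rejected, because byte 0x80 is
# outside the 7-bit range that this receiver associates with that label.
try:
    receive(payload, "us-ascii")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```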

Unfortunately, it might not quite work for Shift_JIS, EUC-KR and Big5. I'm not
sure.

Received on Friday, 4 July 2014 10:31:39 UTC