
[Bug 23646] "us-ascii" should not be an alias for "windows-1252"

From: <bugzilla@jessica.w3.org>
Date: Sat, 28 Jun 2014 16:05:50 +0000
To: www-international@w3.org
Message-ID: <bug-23646-4285-M0Ui0aemhv@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646

--- Comment #20 from Paul Eggert <eggert@cs.ucla.edu> ---
As the original reporter, I'd like to mention the use case that prompted the
bug report. I maintain the web page for the IANA time zone database, which gets
patches from correspondents all over the world, many of whom are not experts in
encodings. For decades the database has had a policy of using only ASCII to
avoid interoperability problems, and this includes its web pages. We don't want
these web pages to contain any non-ASCII characters; if they do, it's an error,
and the browser should display a visible botch.
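A project that treats any non-ASCII byte as an error can also enforce that before publishing, rather than relying on how a browser renders the page. The following is a minimal sketch in Python under that assumption; the function names `find_non_ascii` and `check_file` are hypothetical and are not part of the tz project's actual tooling.

```python
from typing import List, Tuple


def find_non_ascii(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (line, column, byte) for every byte outside 0x00-0x7F.

    Lines and columns are 1-based; columns count bytes, not characters.
    """
    problems = []
    line, col = 1, 0
    for b in data:  # iterating over bytes yields ints 0-255
        col += 1
        if b > 0x7F:
            problems.append((line, col, b))
        if b == 0x0A:  # LF ends the current line
            line, col = line + 1, 0
    return problems


def check_file(path: str) -> int:
    """Print one diagnostic per non-ASCII byte in `path`; return the count."""
    with open(path, "rb") as f:
        problems = find_non_ascii(f.read())
    for line, col, b in problems:
        print(f"{path}:{line}:{col}: non-ASCII byte 0x{b:02X}")
    return len(problems)
```

Running `check_file` on each page and rejecting any nonzero count gives a hard failure no matter which charset the page declares.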

We are now slowly migrating to UTF-8, having (thankfully) bypassed the Latin-1
disasters entirely. At some point I expect we'll even allow UTF-8 in our web
pages (but not yet). In the meantime, we don't want to give anybody the
mistaken impression that the database will use windows-1252 or Latin-1 or any
other unibyte encoding, because for us these encodings would be a disaster. And
yet that's what our users' browsers tell them.

I can understand the use case that prompted some web developers to say "hey,
just treat US-ASCII as Windows-1252". That may have made sense back in 1995,
when unibyte encodings were still the norm on the Web. But it doesn't make
sense any more, and this discussion is a symptom of that.
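The aliasing this bug is about can be illustrated with a small label lookup. The sketch below assumes the WHATWG Encoding Standard's label-matching rule (strip leading and trailing ASCII whitespace, then compare ASCII case-insensitively). The `LABELS` table is a deliberately partial, hypothetical excerpt, not the full list from the standard, though each entry shown does map as listed there.

```python
import string
from typing import Optional

# Hypothetical, deliberately partial excerpt of the Encoding Standard's
# label table: label -> encoding name.
LABELS = {
    "ascii": "windows-1252",
    "us-ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
}

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def resolve_label(label: str) -> Optional[str]:
    """Map a charset label to an encoding name, or None if unknown."""
    # Strip leading/trailing ASCII whitespace (TAB, LF, FF, CR, SPACE).
    key = label.strip("\t\n\f\r ")
    # Fold only ASCII letters to lowercase, leaving other characters alone.
    key = key.translate(_ASCII_LOWER)
    return LABELS.get(key)
```

With this table, `resolve_label("US-ASCII")` yields "windows-1252", which is why a page labelled US-ASCII ends up reported to users as windows-1252.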

Because of this problem, I have given up on charset="US-ASCII" and have
switched our web pages to charset="UTF-8", even though they are strictly ASCII
and any non-ASCII characters in them are an error. I suggest adding commentary
to the standard that tells developers what to do in my situation, since
evidently charset="US-ASCII" is not the right thing to do, and
charset="windows-1252" is certainly not right either. If developers can do
nothing and the standard does not support this use case, the commentary should
say so.
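The practical difference between the two declarations shows up when a stray non-ASCII byte slips into a page. The sketch below uses Python's built-in codecs as stand-ins for a browser's decoder, which is an assumption: Python's `cp1252` leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined where the Encoding Standard's windows-1252 maps them to C1 controls, but the byte used here decodes the same way in both. The helper `decode_as` is hypothetical.

```python
def decode_as(data: bytes, codec: str) -> str:
    """Decode `data` with a Python codec, replacing bad bytes with U+FFFD."""
    # errors="replace" approximates a browser substituting U+FFFD
    # for byte sequences it cannot decode.
    return data.decode(codec, errors="replace")


# An otherwise-ASCII word containing one Latin-1 byte (0xE9, e-acute).
sample = b"caf\xe9"

for codec in ("ascii", "cp1252", "utf-8"):
    # ascii() escapes non-ASCII output so printing works on any terminal.
    print(f"{codec:7} -> {ascii(decode_as(sample, codec))}")
```

Under the windows-1252 interpretation the Latin-1 byte silently becomes a plausible "é", whereas the UTF-8 declaration turns it into U+FFFD, the kind of visible botch the comment asks for. A correctly encoded UTF-8 "é" would still display normally, so a UTF-8 label catches Latin-1 leakage but not valid UTF-8 text.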

With this suggestion in mind, I also suggest that we change the status of this
report back to REOPENED.
