Sources for Encoding specification from Norbert Lindenberg on 2012-04-10 (public-i18n-core@w3.org from April to June 2012)

From: Norbert Lindenberg <w3@norbertlindenberg.com>
Date: Tue, 10 Apr 2012 11:52:20 -0700
To: Anne van Kesteren <annevk@opera.com>
Cc: Norbert Lindenberg <w3@norbertlindenberg.com>, public-i18n-core@w3.org
Message-Id: <FE3D0C19-D6AF-443F-9203-E857117BA9D3@norbertlindenberg.com>

Hi Anne,

I'm writing to you on behalf of the W3C Internationalization Core WG [1], which is currently looking at your Encoding specification [2].

We're wondering what sources you used to obtain the information in the specification, such as the list of encodings, their aliases, and the mapping tables for them. Is this derived from looking at the source code of one or more browsers, or from testing their behavior, or from a web index that shows which encodings are most commonly used on the web and how?

Some issues I'm wondering about, which could be resolved by looking at good data: Is EUC-TW not used on the web, or so rarely that it's not worth specifying? Do browsers really encode only the characters they decode? Do they never try to map full-width ASCII characters to their plain ASCII equivalents, or use other fallbacks, when encoding? Do they really assume it's safe to encode all windows-1252 characters for a form in a page labeled iso-8859-1? Do UTF-8 decoders really still allow for 5-byte and 6-byte sequences?
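
To make three of these questions concrete, here is a minimal sketch using only Python's standard library. It shows what Python's own codecs do, which is an assumption-laden stand-in and says nothing about how any browser behaves; the variable names are purely illustrative.

```python
import unicodedata

# Full-width "ABC" (U+FF21..U+FF23). NFKC normalization is one possible
# fallback that folds these to plain ASCII; whether browsers do anything
# similar when encoding is exactly the open question.
fullwidth = "\uff21\uff22\uff23"
folded = unicodedata.normalize("NFKC", fullwidth)

# U+20AC EURO SIGN is in windows-1252 (byte 0x80) but not in ISO-8859-1.
euro = "\u20ac"
euro_cp1252 = euro.encode("cp1252")
try:
    euro.encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False

# A 5-byte sequence in the shape the original UTF-8 design permitted
# (lead byte 0xF8 followed by four continuation bytes).
five_byte = b"\xf8\x88\x80\x80\x80"
try:
    five_byte.decode("utf-8")
    five_byte_accepted = True
except UnicodeDecodeError:
    five_byte_accepted = False

print(folded, euro_cp1252, euro_in_latin1, five_byte_accepted)
```

In Python, the strict ISO-8859-1 encoder rejects the euro sign and the UTF-8 decoder rejects the 5-byte form, so the questions above amount to asking whether browsers are more lenient than that.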

Best regards,
Norbert

[1] http://www.w3.org/International/track/actions/111
[2] http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html

Received on Tuesday, 10 April 2012 18:52:53 UTC