- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 18 Apr 2012 09:09:30 +0200
- To: "Norbert Lindenberg" <w3@norbertlindenberg.com>
- Cc: public-i18n-core@w3.org
On Wed, 18 Apr 2012 08:15:17 +0200, Norbert Lindenberg <w3@norbertlindenberg.com> wrote:
> A spec on encoding handling for the web should probably focus on those
> encodings that are most commonly used on the web. Mark Davis sometimes
> publishes data in that area; he may be able to provide more detail.
> http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html
> What browsers currently support may be influenced by which libraries
> they use, and the libraries may have accumulated encodings that aren't
> relevant to the web.

Yeah, if we can do more research that would be great. I think most browsers indeed just use libraries, but Opera and Chrome are a bit more restrictive. I can't say much about Opera, but Chrome has a modified version of ICU with support for many encodings disabled, as well as a custom implementation for euc-jp and a few other tweaks. Gecko has similarly been making some changes to its encoding support over the years with respect to what extensions to implement of various encodings (and more recently disabled utf-7 and utf-32 support).

>> Is there any utf-8 specification that says otherwise? You get U+FFFD,
>> but the sequences are definitely supported.
>
> The UTF-8 specification (in the Unicode Standard, in ISO 10646, in RFC
> 3629) was updated years ago to only allow sequences up to four bytes.
> But I suppose it doesn't really matter whether a sequence of five or six
> bytes is allowed and maps to U+FFFD because it's above U+10FFFF, or it's
> treated as an error directly and replaced with U+FFFD...

My apologies, for some reason I thought both Unicode and http://tools.ietf.org/html/rfc3629 still defined handling them as five- and six-byte sequences (even though they are invalid). As far as I know implementations have not changed with respect to this.

-- 
Anne van Kesteren
http://annevankesteren.nl/
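
To illustrate the five- and six-byte point discussed above, here is a minimal Python sketch. It is not taken from any browser; it only shows how one decoder (CPython's built-in utf-8 codec) treats byte sequences that use the lead bytes 0xF8 and 0xFC from the older UTF-8 definition. The helper names `decode_lenient` and `is_strictly_valid` and the sample byte strings are made up for this example. How many U+FFFD characters a decoder emits for such input can differ between implementations, so the sketch does not rely on the count.

```python
# Hypothetical sample inputs: byte patterns shaped like the five- and
# six-byte forms that RFC 3629 no longer allows.
FIVE_BYTE = b"\xf8\x88\x80\x80\x80"      # lead byte 0xF8 + 4 continuation bytes
SIX_BYTE = b"\xfc\x84\x80\x80\x80\x80"   # lead byte 0xFC + 5 continuation bytes


def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for anything invalid."""
    return data.decode("utf-8", errors="replace")


def is_strictly_valid(data: bytes) -> bool:
    """Return True only if the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")  # default error handling is 'strict'
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for label, raw in (("five-byte", FIVE_BYTE), ("six-byte", SIX_BYTE)):
        text = decode_lenient(raw)
        # Print code points rather than the characters themselves.
        print(label, is_strictly_valid(raw), [f"U+{ord(c):04X}" for c in text])
```

Either way of looking at it (an out-of-range sequence or a plain error) ends with the reader seeing U+FFFD, which is the observation made in the quoted text.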
Received on Wednesday, 18 April 2012 07:10:12 UTC