- From: Anne van Kesteren <annevk@opera.com>
- Date: Thu, 19 Apr 2012 09:12:39 +0200
- To: "Norbert Lindenberg" <w3@norbertlindenberg.com>
- Cc: public-i18n-core@w3.org
On Wed, 18 Apr 2012 09:09:30 +0200, Anne van Kesteren <annevk@opera.com> wrote:
> On Wed, 18 Apr 2012 08:15:17 +0200, Norbert Lindenberg
> <w3@norbertlindenberg.com> wrote:
>> The UTF-8 specification (in the Unicode Standard, in ISO 10646, in RFC
>> 3629) was updated years ago to only allow sequences up to four bytes.
>> But I suppose it doesn't really matter whether a sequence of five or
>> six bytes is allowed and maps to U+FFFD because it's above U+10FFFF, or
>> it's treated as an error directly and replaced with U+FFFD...
>
> My apologies, for some reason I thought both Unicode and
> http://tools.ietf.org/html/rfc3629 still defined handling them as five-
> and six-byte sequences (even though they are invalid). As far as I know
> implementations have not changed with respect to this.

Turns out I was wrong. I went to double check after someone on
ietf-charsets mentioned it as well and it turns out IE/Safari/Chrome
handle this per Unicode.

I fixed the spec: http://dvcs.w3.org/hg/encoding/rev/f2f234e98474

I filed a bug on Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=746900

And I filed a bug on Opera too. CORE-45840 if you have access to our
system.

Thank you for pointing this out.

-- 
Anne van Kesteren
http://annevankesteren.nl/
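A minimal sketch of the behaviour discussed in this thread, assuming Python's built-in UTF-8 decoder as a stand-in for a browser decoder (it is not the Encoding spec's algorithm, and the constant and function names are made up for illustration). It shows that a five-byte sequence and a four-byte sequence above U+10FFFF are both rejected as errors, while the four-byte encoding of U+10FFFF itself is accepted. How many U+FFFD characters a lenient decoder emits for one bad sequence can differ between implementations, so the sketch only checks that nothing other than U+FFFD comes out.

```python
# Illustrative byte sequences; names are hypothetical, chosen for this sketch.
FIVE_BYTE = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])  # obsolete five-byte form
MAX_FOUR_BYTE = bytes([0xF4, 0x8F, 0xBF, 0xBF])    # U+10FFFF, highest code point
ABOVE_MAX = bytes([0xF4, 0x90, 0x80, 0x80])        # would encode U+110000


def is_valid_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_lenient(data: bytes) -> str:
    """Decode, replacing each error with U+FFFD instead of raising."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for label, data in [
        ("five-byte", FIVE_BYTE),
        ("U+10FFFF", MAX_FOUR_BYTE),
        ("above U+10FFFF", ABOVE_MAX),
    ]:
        print(label, is_valid_utf8(data), ascii(decode_lenient(data)))
```

Either way of describing the invalid input (as an over-long sequence or as an out-of-range code point) ends in U+FFFD for the reader, which is the point Norbert made in the quoted text.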
Received on Thursday, 19 April 2012 07:13:21 UTC