Re: Sources for Encoding specification

On Wed, 18 Apr 2012 09:09:30 +0200, Anne van Kesteren <annevk@opera.com>  
wrote:
> On Wed, 18 Apr 2012 08:15:17 +0200, Norbert Lindenberg  
> <w3@norbertlindenberg.com> wrote:
>> The UTF-8 specification (in the Unicode Standard, in ISO 10646, in RFC  
>> 3629) was updated years ago to only allow sequences up to four bytes.  
>> But I suppose it doesn't really matter whether a sequence of five or  
>> six bytes is allowed and maps to U+FFFD because it's above U+10FFFF, or  
>> it's treated as an error directly and replaced with U+FFFD...
>
> My apologies, for some reason I thought both Unicode and  
> http://tools.ietf.org/html/rfc3629 still defined handling them as five-  
> and six-byte sequences (even though they are invalid). As far as I know  
> implementations have not changed with respect to this.

Turns out I was wrong. I double-checked after someone on ietf-charsets
mentioned it as well, and IE, Safari, and Chrome all handle this per
Unicode: they do not treat these as five- and six-byte sequences.
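
For illustration, here is a minimal Python sketch of what rejecting these
sequences looks like at the lead-byte level. It assumes the lead-byte
ranges given in RFC 3629; the helper name utf8_lead_byte_length is
hypothetical and is not taken from the Encoding spec or from any browser.

```python
from typing import Optional


def utf8_lead_byte_length(byte: int) -> Optional[int]:
    """Return the sequence length a lead byte implies under RFC 3629.

    Hypothetical helper for illustration only. Returns None for bytes
    that can never start a well-formed UTF-8 sequence.
    """
    if byte <= 0x7F:
        return 1
    if 0xC2 <= byte <= 0xDF:
        return 2
    if 0xE0 <= byte <= 0xEF:
        return 3
    if 0xF0 <= byte <= 0xF4:
        return 4
    # 0x80-0xC1 and 0xF5-0xFF are never valid lead bytes. This includes
    # 0xF8-0xFB and 0xFC-0xFD, which older definitions used to start
    # five- and six-byte sequences.
    return None


# A byte string that an old-style decoder would read as one five-byte sequence.
five_byte = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])

# Python's built-in decoder treats it as an error, so with errors="replace"
# the result contains only U+FFFD replacement characters.
decoded = five_byte.decode("utf-8", errors="replace")
```

How many U+FFFD characters come out for one such sequence depends on the
decoder's error handling, so the sketch only relies on the input being
rejected.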

I fixed the spec: http://dvcs.w3.org/hg/encoding/rev/f2f234e98474

I filed a bug on Firefox:  
https://bugzilla.mozilla.org/show_bug.cgi?id=746900

And I filed a bug on Opera too. CORE-45840 if you have access to our  
system.

Thank you for pointing this out.


-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Thursday, 19 April 2012 07:13:21 UTC