Re: Sources for Encoding specification

On Wed, 18 Apr 2012 09:09:30 +0200, Anne van Kesteren <> wrote:
> On Wed, 18 Apr 2012 08:15:17 +0200, Norbert Lindenberg  
> <> wrote:
>> The UTF-8 specification (in the Unicode Standard, in ISO 10646, in RFC  
>> 3629) was updated years ago to only allow sequences up to four bytes.  
>> But I suppose it doesn't really matter whether a sequence of five or  
>> six bytes is allowed and maps to U+FFFD because it's above U+10FFFF, or  
>> it's treated as an error directly and replaced with U+FFFD...
> My apologies, for some reason I thought both Unicode and  
> still defined handling them as five-  
> and six-byte sequences (even though they are invalid). As far as I know  
> implementations have not changed with respect to this.

Turns out I was wrong. I went to double-check after someone on  
ietf-charsets mentioned it as well, and IE, Safari, and Chrome all  
handle this per Unicode: five- and six-byte sequences are treated as  
errors rather than decoded.
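
For anyone who wants to check a decoder against this behaviour, here is a rough sketch using Python's built-in UTF-8 codec. It is only an illustration, not the algorithm from the Encoding specification or from any browser. The helper names (decode_lenient, is_well_formed) and the sample byte strings are made up for this example.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for ill-formed input."""
    return data.decode("utf-8", errors="replace")


def is_well_formed(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical test inputs chosen for this sketch.
four_byte_max = b"\xf4\x8f\xbf\xbf"           # encodes U+10FFFF, the highest code point
five_byte_form = b"\xf8\x88\x80\x80\x80"      # obsolete five-byte pattern (lead byte 0xF8)
six_byte_form = b"\xfc\x84\x80\x80\x80\x80"   # obsolete six-byte pattern (lead byte 0xFC)

for sample in (four_byte_max, five_byte_form, six_byte_form):
    decoded = decode_lenient(sample)
    print(
        sample.hex(),
        "well-formed" if is_well_formed(sample) else "ill-formed",
        [f"U+{ord(ch):04X}" for ch in decoded],
    )
```

With a decoder that follows Unicode, only the four-byte sample is accepted. The five- and six-byte samples are rejected in strict mode and yield nothing but U+FFFD in replacement mode. How many U+FFFD characters come out per ill-formed sequence can differ between decoders, so the sketch does not assume a particular count.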

I fixed the spec:

I filed a bug on Firefox:

And I filed a bug on Opera too. CORE-45840 if you have access to our  

Thank you for pointing this out.

Anne van Kesteren

Received on Thursday, 19 April 2012 07:13:21 UTC