[whatwg] 9.2.2: replacement characters. How many? from Ian Hickson on 2007-06-15 (public-whatwg-archive@w3.org from June 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 15 Jun 2007 00:25:05 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.0706142355370.30490@dhalsim.dreamhost.com>

On Fri, 3 Nov 2006, Elliotte Harold wrote:
>
> Section 9.2.2 of the current Web Apps 1.0 draft states:
> 
> Bytes or sequences of bytes in the original byte stream that could not 
> be converted to Unicode characters must be converted to U+FFFD 
> REPLACEMENT CHARACTER code points.
> 
> I'm concerned about the "or". For example, suppose there are six upper 
> halves of a Unicode surrogate pair in a row and no lower halves. Does 
> that turn into six replacement characters or one? Both interpretations 
> seem possible.
> 
> I suppose I prefer six rather than one, but I don't care a great deal as 
> long as this is locked down one way or the other.

I don't really know how to define this. I'd like to say that it's up to 
the encoding specifications to define how many U+FFFD characters an 
invalid byte sequence produces. Any suggestions?
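
To make the two readings concrete, here is a rough sketch, not taken 
from any draft or encoding specification. It assumes the input has 
already been split into 16-bit UTF-16 code units. The function name 
decode_utf16_units and the collapse_runs flag are hypothetical, invented 
only for this illustration.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_high(unit):
    """True for a high (leading) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True for a low (trailing) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def decode_utf16_units(units, collapse_runs=False):
    """Decode a list of 16-bit code units (hypothetical helper).

    collapse_runs=False: one U+FFFD per unpaired surrogate.
    collapse_runs=True:  one U+FFFD per run of adjacent unpaired surrogates.
    """
    out = []
    i = 0
    prev_was_error = False
    while i < len(units):
        unit = units[i]
        if is_high(unit) and i + 1 < len(units) and is_low(units[i + 1]):
            # Well-formed pair: combine into one supplementary code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
            prev_was_error = False
        elif is_high(unit) or is_low(unit):
            # Unpaired surrogate: emit U+FFFD unless collapsing a run.
            if not (collapse_runs and prev_was_error):
                out.append(REPLACEMENT)
            prev_was_error = True
            i += 1
        else:
            # Ordinary BMP code unit.
            out.append(chr(unit))
            i += 1
            prev_was_error = False
    return "".join(out)


six_high = [0xD800] * 6 + [0x0041]  # six unpaired high surrogates, then "A"
per_unit = decode_utf16_units(six_high)
per_run = decode_utf16_units(six_high, collapse_runs=True)
print(per_unit.count(REPLACEMENT), per_run.count(REPLACEMENT))
```

For the input in Elliotte's example, the first policy yields six 
replacement characters and the second yields one. Both satisfy the 
sentence as currently written, which is the ambiguity in question.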

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 14 June 2007 17:25:05 UTC