[whatwg] 9.2.2: replacement characters. How many?

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 15 Jun 2007 00:25:05 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.0706142355370.30490@dhalsim.dreamhost.com>
On Fri, 3 Nov 2006, Elliotte Harold wrote:
>
> Section 9.2.2 of the current Web Apps 1.0 draft states:
> 
> Bytes or sequences of bytes in the original byte stream that could not 
> be converted to Unicode characters must be converted to U+FFFD 
> REPLACEMENT CHARACTER code points.
> 
> I'm concerned about the "or". For example, suppose there are six upper 
> halves of a Unicode surrogate pair in a row and no lower halves. Does 
> that turn into six replacement characters or one? Both interpretations 
> seem possible.
> 
> I suppose I prefer six rather than one, but I don't care a great deal as 
> long as this is locked down one way or the other.

I don't really know how to define this. I'd like to say that it's up to 
the encoding specifications to define it. Any suggestions?
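The two readings really do give different results. The Python sketch below decodes a byte stream under two policies. It assumes the stream is UTF-16LE, because the message does not name an encoding. The function `decode_utf16le` and its `collapse_runs` flag are hypothetical and exist only for this illustration. They are not defined by the draft or by Python's codec API.

```python
# Hypothetical sketch: decode UTF-16LE bytes, substituting U+FFFD for
# ill-formed input under one of two assumed policies. Neither policy is
# taken from the draft text quoted above.
import struct

REPLACEMENT = "\ufffd"


def decode_utf16le(data: bytes, collapse_runs: bool = False) -> str:
    """Decode UTF-16LE, replacing each unpaired surrogate code unit
    (and a trailing odd byte) with U+FFFD.

    collapse_runs=False: one U+FFFD per bad code unit (the "six" reading).
    collapse_runs=True:  one U+FFFD per run of bad input (the "one" reading).
    """
    n_units = len(data) // 2
    units = struct.unpack(f"<{n_units}H", data[: n_units * 2])
    out: list[str] = []
    prev_bad = False  # was the previous code unit replaced?
    i = 0
    while i < n_units:
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < n_units
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Well-formed surrogate pair: combine into one supplementary code point.
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)))
            prev_bad = False
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired high or low surrogate: this code unit is ill-formed.
            if not (collapse_runs and prev_bad):
                out.append(REPLACEMENT)
            prev_bad = True
            i += 1
        else:
            out.append(chr(u))
            prev_bad = False
            i += 1
    if len(data) % 2 and not (collapse_runs and prev_bad):
        # A trailing odd byte cannot form a code unit.
        out.append(REPLACEMENT)
    return "".join(out)


if __name__ == "__main__":
    six_highs = b"\x00\xd8" * 6  # six U+D800 code units and no low halves
    print(len(decode_utf16le(six_highs)))                      # 6
    print(len(decode_utf16le(six_highs, collapse_runs=True)))  # 1
```

The per-code-unit policy replaces each of the six unpaired high surrogates separately and produces six U+FFFD characters. The run-collapsing policy produces one. Both are internally consistent, so the text needs to choose one explicitly or defer to the encoding specifications.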

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 14 June 2007 17:25:05 UTC

This archive was generated by hypermail 2.4.0 : Wednesday, 22 January 2020 16:58:56 UTC