[whatwg] 9.2.2: replacement characters. How many? from Elliotte Harold on 2006-11-03 (public-whatwg-archive@w3.org from November 2006)

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Fri, 03 Nov 2006 06:52:17 -0500
Message-ID: <454B2D71.1000209@metalab.unc.edu>

Section 9.2.2 of the current Web Apps 1.0 draft states:

Bytes or sequences of bytes in the original byte stream that could not 
be converted to Unicode characters must be converted to U+FFFD 
REPLACEMENT CHARACTER code points.


I'm concerned about the "or". For example, suppose there are six high
surrogates (the upper halves of Unicode surrogate pairs) in a row and no
low surrogates. Does that turn into six replacement characters or one?
Both interpretations seem possible.

I suppose I prefer six rather than one, but I don't care a great deal as
long as this is locked down one way or the other.
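
To make the two readings concrete, here is a minimal Python sketch. It
assumes the input has already been split into 16-bit UTF-16 code units;
the function name decode_utf16_units and the collapse_runs flag are
hypothetical and come from neither the draft nor any library. It only
illustrates the ambiguity and is not a proposed algorithm.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units, collapse_runs=False):
    """Decode 16-bit code units, replacing unpaired surrogates.

    collapse_runs=False: one U+FFFD per unpaired surrogate.
    collapse_runs=True:  one U+FFFD per run of adjacent unpaired surrogates.
    (Hypothetical helper, written only for illustration.)
    """
    out = []
    i = 0
    prev_was_error = False
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Well-formed pair: combine high and low surrogate.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
            prev_was_error = False
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: cannot be converted to a character.
            if not (collapse_runs and prev_was_error):
                out.append(REPLACEMENT)
            prev_was_error = True
            i += 1
        else:
            # Ordinary BMP code unit.
            out.append(chr(u))
            i += 1
            prev_was_error = False
    return "".join(out)


six_high = [0xD800] * 6  # six high surrogates, no low surrogates
per_unit = decode_utf16_units(six_high)
collapsed = decode_utf16_units(six_high, collapse_runs=True)
print(len(per_unit), len(collapsed))
```

Under the first reading each unpaired surrogate becomes its own U+FFFD;
under the second the whole run becomes a single U+FFFD. The draft text
as quoted permits either.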

-- 
Elliotte Rusty Harold  elharo at metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Received on Friday, 3 November 2006 03:52:17 UTC