- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 15 Jun 2007 00:25:05 +0000 (UTC)
On Fri, 3 Nov 2006, Elliotte Harold wrote:
>
> Section 9.2.2 of the current Web Apps 1.0 draft states:
>
> Bytes or sequences of bytes in the original byte stream that could not
> be converted to Unicode characters must be converted to U+FFFD
> REPLACEMENT CHARACTER code points.
>
> I'm concerned about the "or". For example, suppose there are six upper
> halves of a Unicode surrogate pair in a row and no lower halves. Does
> that turn into six replacement characters or one? Both interpretations
> seem possible.
>
> I suppose I prefer six rather than one, but I don't care a great deal as
> long as this is locked down one way or the other.

I don't really know how to define this. I'd like to say that it's up to
the encoding specifications to define it. Any suggestions?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
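
To make the two readings of the quoted sentence concrete, here is a minimal sketch of a UTF-16 code-unit decoder that can follow either one. It is only an illustration: the function name `decode_utf16_units` and the `collapse_runs` flag are hypothetical, and neither policy is taken from the draft or from any encoding specification.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_high(unit):
    """True for an upper (high) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True for a lower (low) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def decode_utf16_units(units, collapse_runs=False):
    """Decode a list of 16-bit code units into a str.

    collapse_runs=False: every unpaired surrogate yields its own U+FFFD.
    collapse_runs=True:  a run of adjacent unpaired surrogates yields
                         a single U+FFFD.
    Both policies are assumptions made for illustration.
    """
    out = []
    i = 0
    prev_bad = False  # was the previous unit an unpaired surrogate?
    while i < len(units):
        unit = units[i]
        if is_high(unit) and i + 1 < len(units) and is_low(units[i + 1]):
            # Well-formed surrogate pair: combine into one code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
            prev_bad = False
        elif is_high(unit) or is_low(unit):
            # Unpaired surrogate: cannot be converted to a character.
            if not (collapse_runs and prev_bad):
                out.append(REPLACEMENT)
            prev_bad = True
            i += 1
        else:
            # Ordinary BMP code unit.
            out.append(chr(unit))
            i += 1
            prev_bad = False
    return "".join(out)


# Six upper halves in a row with no lower halves, followed by "A".
six_highs = [0xD800] * 6 + [0x41]
per_unit = decode_utf16_units(six_highs)
per_run = decode_utf16_units(six_highs, collapse_runs=True)
```

With this input, `per_unit` contains six U+FFFD characters before the "A" and `per_run` contains one, which is exactly the ambiguity raised above.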
Received on Thursday, 14 June 2007 17:25:05 UTC