[whatwg] Surrogate pairs and character references from Ian Hickson on 2009-09-24 (public-whatwg-archive@w3.org from September 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 24 Sep 2009 09:06:13 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0909240902450.15471@hixie.dreamhostps.com>

On Thu, 17 Sep 2009, Øistein E. Andersen wrote:
>
> It is much clearer now.  Thanks.  Just a few minor issues:
> 
> > "Bytes or sequences of bytes in the original byte stream that could not be
> > converted to Unicode characters must be converted to U+FFFD REPLACEMENT
> > CHARACTER code points."
> 
> With the new definition of Unicode characters as Unicode scalar values, this
> excludes surrogate code points, which are also handled separately (and cause a
> parse error) in the step quoted below.  You may want to say "Unicode code
> points" rather than "Unicode characters".

Fixed.
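A minimal Python sketch of the distinction above, for illustration only and not spec text. A Unicode scalar value is any code point except the surrogate range U+D800 to U+DFFF. `is_scalar_value` and `decode_with_replacement` are hypothetical helpers. Python's built-in UTF-8 decoder with `errors="replace"` is assumed here as a stand-in for the generic "could not be converted" step.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: a code point in
    U+0000..U+10FFFF that is not a surrogate (U+D800..U+DFFF)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def decode_with_replacement(data: bytes) -> str:
    """Decode UTF-8, turning bytes that cannot be converted into U+FFFD.

    Python's strict UTF-8 codec also rejects encoded surrogates
    (e.g. ED A0 80), so those bytes become U+FFFD as well."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(is_scalar_value(0x41))    # True
    print(is_scalar_value(0xD800))  # False: surrogate code point
    print(decode_with_replacement(b"a\xffb") == "a" + REPLACEMENT + "b")  # True
```

A surrogate is still a code point, just not a scalar value. That is why "Unicode code points" is the wider and more accurate wording for the conversion step.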


> "U+FFFD REPLACEMENT CHARACTERs" is sufficient, used elsewhere and probably
> reads better than "U+FFFD REPLACEMENT CHARACTER code points".

Fixed.


> > All U+0000 NULL characters and code points in the range U+D800 to U+DFFF in
> > the input must be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences
> > of such characters and code points are parse errors.
> 
> The phrase "characters and code points" (in the second sentence) is awkward
> given that all characters are in fact code points.

Yeah, but if I change it, it sounds even more awkward, because then it 
doesn't match the previous sentence. I'd rather have it be technically 
redundant than confuse people into thinking that I meant something more 
subtle than the spec actually says.
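A rough Python sketch of the quoted preprocessing rule, for illustration only. `preprocess_input` is a hypothetical helper. It assumes that collecting the index of each replaced code point is an acceptable stand-in for "parse error", since the quoted text does not say how errors are reported. Python `str` values can hold lone surrogates such as `"\ud800"`, so the surrogate check is meaningful here.

```python
from typing import List, Tuple

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def preprocess_input(text: str) -> Tuple[str, List[int]]:
    """Replace U+0000 NULL and surrogate code points (U+D800..U+DFFF)
    with U+FFFD.

    Returns the cleaned string and the indices where a parse error
    would be reported, one per replaced code point."""
    out: List[str] = []
    errors: List[int] = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0 or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)
            errors.append(i)  # each occurrence is a parse error
        else:
            out.append(ch)
    return "".join(out), errors


if __name__ == "__main__":
    cleaned, errs = preprocess_input("a\x00b\ud800c")
    print(errs)  # [1, 3]
```

The sketch records errors and keeps going instead of raising. This matches the quoted rule, which replaces the offending code points and continues rather than aborting.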

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 24 September 2009 02:06:13 UTC