- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 24 Sep 2009 09:06:13 +0000 (UTC)
On Thu, 17 Sep 2009, Øistein E. Andersen wrote:
>
> It is much clearer now. Thanks. Just a few minor issues:
>
> > "Bytes or sequences of bytes in the original byte stream that could not be
> > converted to Unicode characters must be converted to U+FFFD REPLACEMENT
> > CHARACTER code points."
>
> With the new definition of Unicode characters as Unicode scalar values, this
> excludes surrogate code points, which are also handled separately (and cause a
> parse error) in the step quoted below. You may want to say "Unicode code
> points" rather than "Unicode characters".

Fixed.

> "U+FFFD REPLACEMENT CHARACTERs" is sufficient, used elsewhere and probably
> reads better than "U+FFFD REPLACEMENT CHARACTER code points".

Fixed.

> > All U+0000 NULL characters and code points in the range U+D800 to U+DFFF in
> > the input must be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences
> > of such characters and code points are parse errors.
>
> The phrase "characters and code points" (in the second sentence) is awkward
> given that all characters are in fact code points.

Yeah, but if I change it, it sounds even more awkward because then it doesn't
match the previous sentence. I'd rather have it be technically redundant than
confuse people into thinking that I meant something more subtle than the spec
actually says.

Cheers,
--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
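The two spec rules quoted in this thread (undecodable bytes become U+FFFD REPLACEMENT CHARACTERs, and U+0000 NULL plus the surrogate range U+D800 to U+DFFF are replaced by U+FFFD and counted as parse errors) can be sketched roughly as below. This is a minimal illustration, not the spec's algorithm: the function names `decode_input` and `preprocess` are hypothetical, UTF-8 is assumed as the encoding, and Python's `errors="replace"` codec handler stands in for the spec's decoding step. Because a strict UTF-8 decoder never yields surrogates, the second function assumes text that may arrive as a string from some other source.

```python
from typing import Tuple

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def decode_input(data: bytes, encoding: str = "utf-8") -> str:
    # Hypothetical helper: byte sequences that cannot be decoded are
    # replaced with U+FFFD by Python's "replace" error handler.
    return data.decode(encoding, errors="replace")


def preprocess(text: str) -> Tuple[str, int]:
    """Hypothetical helper: replace U+0000 and surrogate code points
    (U+D800..U+DFFF) with U+FFFD, returning the result and the number
    of parse errors encountered."""
    out = []
    errors = 0
    for ch in text:
        cp = ord(ch)
        if cp == 0x0000 or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)  # replaced, and counted as a parse error
            errors += 1
        else:
            out.append(ch)
    return "".join(out), errors


if __name__ == "__main__":
    print(repr(decode_input(b"a\xffb")))
    print(preprocess("a\x00b\ud800c"))
```

For example, `preprocess("a\x00b\ud800c")` replaces both the NULL and the lone surrogate and reports two parse errors, which is the "characters and code points" pairing the thread discusses.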
Received on Thursday, 24 September 2009 02:06:13 UTC