- From: Øistein E. Andersen <liszt@coq.no>
- Date: Thu, 17 Sep 2009 01:38:05 +0100
It is much clearer now. Thanks. Just a few minor issues:

> "Bytes or sequences of bytes in the original byte stream that could
> not be converted to Unicode characters must be converted to U+FFFD
> REPLACEMENT CHARACTER code points."

With the new definition of Unicode characters as Unicode scalar values, this excludes surrogate code points, which are also handled separately (and cause a parse error) in the step quoted below. You may want to say "Unicode code points" rather than "Unicode characters".

"U+FFFD REPLACEMENT CHARACTERs" is sufficient, used elsewhere and probably reads better than "U+FFFD REPLACEMENT CHARACTER code points".

> All U+0000 NULL characters and code points in the range U+D800 to
> U+DFFF in the input must be replaced by U+FFFD REPLACEMENT
> CHARACTERs. Any occurrences of such characters and code points are
> parse errors.

The phrase "characters and code points" (in the second sentence) is awkward given that all characters are in fact code points.

-- 
Øistein E. Andersen
Received on Wednesday, 16 September 2009 17:38:05 UTC
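
The distinction the message turns on (every Unicode scalar value is a code point, but surrogate code points are not scalar values) and the replacement step quoted in it can be illustrated with a minimal sketch. This is not text from the specification or any parser's actual implementation; the function names `is_code_point`, `is_surrogate`, `is_scalar_value` and `preprocess` are hypothetical and chosen only for this illustration.

```python
from typing import Tuple

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def is_code_point(n: int) -> bool:
    """Any integer in the Unicode code space, U+0000 to U+10FFFF."""
    return 0 <= n <= 0x10FFFF


def is_surrogate(n: int) -> bool:
    """Surrogate code points, U+D800 to U+DFFF."""
    return 0xD800 <= n <= 0xDFFF


def is_scalar_value(n: int) -> bool:
    """A Unicode scalar value is any code point that is not a surrogate."""
    return is_code_point(n) and not is_surrogate(n)


def preprocess(text: str) -> Tuple[str, int]:
    """Replace U+0000 and surrogate code points with U+FFFD.

    Returns the cleaned text and the number of replacements made; each
    replacement corresponds to one parse error in the quoted step.
    (Hypothetical helper, for illustration only.)
    """
    out = []
    parse_errors = 0
    for ch in text:
        n = ord(ch)
        if n == 0 or is_surrogate(n):
            out.append(REPLACEMENT)
            parse_errors += 1
        else:
            out.append(ch)
    return "".join(out), parse_errors
```

Under these definitions `is_code_point(0xD800)` is true while `is_scalar_value(0xD800)` is false, which is why wording based on "Unicode characters" (scalar values) leaves surrogates out, whereas "code points" covers both cases.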