- From: <bugzilla@jessica.w3.org>
- Date: Tue, 26 Nov 2013 18:06:11 +0000
- To: www-international@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23927
--- Comment #6 from Addison Phillips <addison@lab126.com> ---
You're probably right about not being able to get to the UTF-16 encoder
directly. I'm trying to think of cases, and the only one that occurs to me
offhand would be reading data into a JS string? Or maybe writing an XML
document (**NOT** XHTML, please note).
A UTF-16 encoder should deal with non-Unicode-scalar-value input: that is one
of its edge conditions. Bad data exists everywhere, and the failure conditions
should be well described. It's easy enough to chop a UTF-16 buffer between the
two code units of a surrogate pair (if your code is surrogate-stupid).
Similarly, someone might use it as a form of attack ("?" has a meaning in
syntaxes such as URLs, but U+D800 might look like a tofu box and not arouse
suspicion).
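To make that concrete, here is a rough sketch in TypeScript (my own example,
nothing from the spec) of how a surrogate-stupid slice produces exactly that
kind of input:

```ts
// Naively splitting a JS (UTF-16) string between the two code units of a
// surrogate pair leaves an unpaired surrogate behind.
const s = "a\u{1F600}";          // "a" + U+1F600, stored as 3 UTF-16 code units
const chopped = s.slice(0, 2);   // cuts between the high and low surrogate
console.log(chopped.charCodeAt(1).toString(16)); // "d83d" - a lone high surrogate
// Whatever encoder receives `chopped` now has to decide what to do with
// U+D83D, which is not a Unicode scalar value.
```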
In any case, don't you agree that the "error" instructions are for
ASCII-compatible encodings and, as written, aren't quite right for a UTF-16
encoder? If you changed the word "byte" to "code unit", that might fix it (at
the cost of confusion for all other encodings).
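For illustration only, here is a minimal hand-rolled UTF-16LE encoder sketch
(the function name encodeUtf16le and the choice of U+FFFD as the substitute
are my own assumptions, not the spec's algorithm). The point is that its error
handling has to emit a replacement *code unit*, i.e. two bytes, where the
byte-oriented wording suggests emitting a single byte:

```ts
// Hypothetical sketch of a UTF-16LE encoder that replaces unpaired
// surrogates with the replacement character U+FFFD.
function encodeUtf16le(input: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let unit = input.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: valid only if followed by a low surrogate.
      const next = input.charCodeAt(i + 1);
      if (!(next >= 0xdc00 && next <= 0xdfff)) unit = 0xfffd;
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate: valid only if preceded by a high surrogate.
      const prev = input.charCodeAt(i - 1);
      if (!(prev >= 0xd800 && prev <= 0xdbff)) unit = 0xfffd;
    }
    // Every output unit is two bytes, little-endian.
    out.push(unit & 0xff, unit >> 8);
  }
  return new Uint8Array(out);
}
```

Fed the `chopped` string from the earlier example, this would emit
0x61 0x00 0xFD 0xFF, rather than any single replacement byte.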