- From: <bugzilla@jessica.w3.org>
- Date: Tue, 26 Nov 2013 18:06:11 +0000
- To: www-international@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23927
--- Comment #6 from Addison Phillips <addison@lab126.com> ---
You're probably right about not being able to get to the UTF-16 encoder
directly. I'm trying to think of cases, and the only one that occurs to me
offhand would be reading data into a JS string? Or maybe writing an XML
document (**NOT** XHTML, please note).
A UTF-16 encoder should deal with non-Unicode-scalar-value input: that is one
of its edge conditions. Bad data exists everywhere, and the failure conditions
should be well described. It's easy enough to chop a UTF-16 buffer between the
two code units of a surrogate pair (if your code is surrogate-stupid).
Similarly, someone might use it as a form of attack ("?" has a meaning in
syntaxes such as URLs, but U+D800 might look like a tofu box and not arouse
suspicion).
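To make that concrete, here is a rough sketch in TypeScript (my own example,
nothing from the spec) of how a surrogate-stupid slice produces exactly that
kind of input:

```ts
// Naively splitting a JS (UTF-16) string between the two code units of a
// surrogate pair leaves an unpaired surrogate behind.
const s = "a\u{1F600}";          // "a" + U+1F600, stored as 3 UTF-16 code units
const chopped = s.slice(0, 2);   // cuts between the high and low surrogate
console.log(chopped.charCodeAt(1).toString(16)); // "d83d" - a lone high surrogate
// Whatever encoder receives `chopped` now has to decide what to do with
// U+D83D, which is not a Unicode scalar value.
```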
In any case, don't you agree that the "error" instructions are for
ASCII-compatible encodings and, as written, aren't quite right for a UTF-16
encoder? If you changed the word "byte" to "code unit", that might fix it (at
the cost of confusion for all other encodings).
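For illustration only, here is a minimal hand-rolled UTF-16LE encoder sketch
(the function name encodeUtf16le and the choice of U+FFFD as the substitute
are my own assumptions, not the spec's algorithm). The point is that its error
handling has to emit a replacement *code unit*, i.e. two bytes, where the
byte-oriented wording suggests emitting a single byte:

```ts
// Hypothetical sketch of a UTF-16LE encoder that replaces unpaired
// surrogates with the replacement character U+FFFD.
function encodeUtf16le(input: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let unit = input.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: valid only if followed by a low surrogate.
      const next = input.charCodeAt(i + 1);
      if (!(next >= 0xdc00 && next <= 0xdfff)) unit = 0xfffd;
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate: valid only if preceded by a high surrogate.
      const prev = input.charCodeAt(i - 1);
      if (!(prev >= 0xd800 && prev <= 0xdbff)) unit = 0xfffd;
    }
    // Every output unit is two bytes, little-endian.
    out.push(unit & 0xff, unit >> 8);
  }
  return new Uint8Array(out);
}
```

Fed the `chopped` string from the earlier example, this would emit
0x61 0x00 0xFD 0xFF, rather than any single replacement byte.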