- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Mon, 16 Jan 2012 12:02:46 -0500
- To: Glenn Adams <glenn@skynav.com>
- CC: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>
On 1/16/12 10:10 AM, Glenn Adams wrote:
> it is certainly questionable authoring for a surrogate pair to be
> separated by a element boundary; if it appeared in actual input text, it
> would certainly not be well-formed UTF-16; i would prefer a browser to
> translate (or interpret) each member of the pair in the following
> example to (as) the replacement character (\ufffd).

Consider the simple case of HTML that contains some string with non-BMP characters, plus a script that takes the text and wraps each "character" in a <span>. From the point of view of JS, a "character" here is a UTF-16 code unit, so each non-BMP character gets split into two surrogates in two separate spans. Scripts like this are not that uncommon, by the way. Oh, and they commonly run on user input, not on data provided by the site itself.

Would you really expect the browser to convert those split characters to \ufffd when the script does that? That seems like it would violate the principle of least surprise. It would also mean that the script in question would work just fine in initial testing and then break as soon as a user entered some non-BMP characters.

I think we owe it to users to make this case work the way Gecko does.

-Boris
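A minimal sketch of the kind of script described above, assuming it builds markup strings rather than DOM nodes so it can run outside a browser. The helper names `wrapEachCodeUnit` and `wrapEachCodePoint` are hypothetical, not taken from any real site or library. Indexing a JS string walks UTF-16 code units, so a non-BMP character such as U+1F600 ends up as two lone surrogates in two spans; iterating with `for...of` walks code points and keeps the pair together.

```typescript
// Hypothetical helpers illustrating the "wrap each character in a <span>" pattern.
// HTML escaping is omitted for brevity; real code handling user input must escape.

// Naive version: indexes the string, so each "character" is one UTF-16 code unit.
// A non-BMP character (a surrogate pair) is split across two spans.
function wrapEachCodeUnit(text: string): string[] {
  const spans: string[] = [];
  for (let i = 0; i < text.length; i++) {
    spans.push(`<span>${text[i]}</span>`);
  }
  return spans;
}

// Code-point-aware version: the string iterator used by for...of yields
// whole code points, so a surrogate pair stays inside a single span.
function wrapEachCodePoint(text: string): string[] {
  const spans: string[] = [];
  for (const ch of text) {
    spans.push(`<span>${ch}</span>`);
  }
  return spans;
}

// U+1F600 is outside the BMP, so it occupies two UTF-16 code units.
const input = "a\u{1F600}b";

console.log(input.length);                    // 4
console.log(wrapEachCodeUnit(input).length);  // 4: the emoji is split into two lone surrogates
console.log(wrapEachCodePoint(input).length); // 3: the emoji stays in one span
```

With the naive version, the second and third spans each hold one half of the surrogate pair (`\uD83D` and `\uDE00`), which is exactly the markup whose rendering is under discussion.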
Received on Monday, 16 January 2012 17:03:19 UTC