
Re: [css3-text] grapheme clusters across element boundary

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Mon, 16 Jan 2012 12:02:46 -0500
Message-ID: <4F145836.9050000@mit.edu>
To: Glenn Adams <glenn@skynav.com>
CC: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>

On 1/16/12 10:10 AM, Glenn Adams wrote:
> it is certainly questionable authoring for a surrogate pair to be
> separated by a element boundary; if it appeared in actual input text, it
> would certainly not be well-formed UTF-16; i would prefer a browser to
> translate (or interpret) each member of the pair in the following
> example to (as) the replacement character (\ufffd).

Consider the simple case of an HTML page that has some string containing
non-BMP characters and a script that takes the text and wraps each
"character" (which from the point of view of JS means each UTF-16 code
unit) in a <span>.  Such scripts are not that uncommon, by the way.  Oh,
and they commonly run on user input, not on data provided by the site itself.

Would you really expect the browser to convert the two halves of each
surrogate pair to \ufffd when the script does that?  That seems like it
would violate the principle of least surprise.  It would also mean that
the script in question would work just fine in initial testing and then
break as soon as a user entered some non-BMP characters.
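
To make the failure mode concrete, here is a minimal sketch of the kind of script described above.  The function names and the sample string are made up for illustration; the DOM part (putting each piece into its own <span>) is left out so that the snippet runs outside a browser.

```javascript
// Hypothetical helper: split text the way index-based "per character"
// code does, i.e. one piece per UTF-16 code unit.
function splitByCodeUnit(text) {
  const pieces = [];
  for (let i = 0; i < text.length; i++) {
    pieces.push(text.charAt(i));
  }
  return pieces;
}

// Hypothetical alternative: split by code point.  String iteration
// (used by Array.from) keeps well-formed surrogate pairs together.
function splitByCodePoint(text) {
  return Array.from(text);
}

// True if the piece is a single surrogate code unit with no partner.
function isLoneSurrogate(piece) {
  return piece.length === 1 && /[\uD800-\uDFFF]/.test(piece);
}

// "a", U+1D11E MUSICAL SYMBOL G CLEF (one non-BMP character), "b".
const sample = "a\uD834\uDD1Eb";

const byUnit = splitByCodeUnit(sample);   // 4 pieces, 2 of them lone surrogates
const byPoint = splitByCodePoint(sample); // 3 pieces, none of them lone surrogates

console.log(byUnit.length, byUnit.filter(isLoneSurrogate).length);
console.log(byPoint.length, byPoint.filter(isLoneSurrogate).length);
```

With BMP-only input the two functions return the same pieces, which is why such a script passes initial testing.  Only the code-unit version puts the two halves of a surrogate pair into separate elements.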

I think we owe it to users to make this case work as Gecko does.

Received on Monday, 16 January 2012 17:03:19 UTC
