
Re: [css3-text] grapheme clusters across element boundary

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Mon, 16 Jan 2012 15:39:28 -0500
Message-ID: <4F148B00.20706@mit.edu>
To: Glenn Adams <glenn@skynav.com>
CC: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>

On 1/16/12 3:06 PM, Glenn Adams wrote:
> (2) script that naively assumes codepoint = character, and inadvertently
> separates surrogate pair elements;

This is most script.

> regarding (2), my position is that the implementation should be
> conservative and not liberal when allowing script to set certain
> property values whose underlying semantics imply a well-formed UTF-16
> string; so, yes, were I implementing this, I would throw an exception
> when a script attempts to set a DOMString typed property to a JS String
> that contains an isolated surrogate codepoint; or at least I would do
> this by default, and only depart from this default in certain
> circumscribed cases;

And my point is that since pretty much every script handles surrogate
pairs wrong, throwing would just penalize users who try to use non-BMP
characters with such scripts.  It would particularly penalize users whose
languages are written with non-BMP characters.

Maybe you think it's OK to screw such users over.  I don't.  Especially 
in situations in which the "correct" rendering is obvious (e.g. every 
single codepoint wrapped in its own span, but all have the same style: 
you just render the text as a single text string with that style).
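
To illustrate the span case, here is a minimal TypeScript sketch. It is not code from this thread: `naiveSplit` and `hasLoneSurrogate` are hypothetical helper names, and the sample string is an assumption chosen only because it contains one non-BMP character. It shows script that treats each UTF-16 code unit as a character, producing pieces that are isolated surrogates, and shows that concatenating those pieces in order gives back the original well-formed string.

```typescript
// Hypothetical helper: split a string the way naive script tends to,
// one UTF-16 code unit at a time (index-based, not code-point-based).
function naiveSplit(text: string): string[] {
  const units: string[] = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charAt(i));
  }
  return units;
}

// Hypothetical helper: true if the string contains a surrogate code unit
// that is not part of a well-formed high+low pair.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a well-formed pair
        continue;
      }
      return true;
    }
    if (c >= 0xdc00 && c <= 0xdfff) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}

// "a", U+10400 (encoded as the pair D801 DC00), "b"
const text = "a\uD801\uDC00b";

// Naive splitting yields four pieces; the middle two are lone surrogates.
const pieces = naiveSplit(text);

// Each piece would end up in its own span. Joining the pieces in document
// order restores the original, well-formed string.
const rejoined = pieces.join("");

console.log(pieces.length, hasLoneSurrogate(pieces[1]), rejoined === text);
```

In this sketch each individual piece fails a well-formedness check, yet the concatenation of adjacent pieces does not, which is the situation where rendering the run as a single text string is unambiguous.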

> this is my answer to your question "what the best way to limit damage
> from the lack of understanding on script authors' part is"

I think you and I have different definitions of "damage" here.

-Boris
Received on Monday, 16 January 2012 20:39:57 GMT