Re: [css3-text] grapheme clusters across element boundary from Boris Zbarsky on 2012-01-16 (www-style@w3.org from January 2012)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Mon, 16 Jan 2012 16:11:20 -0500
To: Glenn Adams <glenn@skynav.com>
CC: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>
Message-ID: <4F149278.20808@mit.edu>

On 1/16/12 3:54 PM, Glenn Adams wrote:
>     And my point is that since pretty much every script handles
>     surrogate pairs wrong, throwing would just penalize users who try to
>     use non-BMP characters with such scripts.  It would particularly
>     penalize users whose languages are written with non-BMP characters.
>
>     Maybe you think it's OK to screw such users over.  I don't.
>
>
> Boris, why do you use language like this? It is not conducive to a
> technical dialog. As one of the authors of Unicode, I find it rather
> ironic to be accused in this manner.

I'm not accusing you of anything.  I'm just describing the most likely 
outcome of the proposed browser behavior.  If you don't mean for that 
outcome to happen, then I think we need a different proposal....

> Apparently. I view the damage of proliferating non-well formed content,
> and thus non-interoperable content, and the damage of proliferating
> potential security holes to be of greater consequence than requiring
> script authors to address the consequences of working with non-BMP
> characters encoded in UTF-16.

The problem is that you think the burden of the pain here will fall on 
authors.  It won't.  It'll fall on users.

So the question is whether "proliferating non-well formed content" 
(whatever that even means in this context, since the DOM and ECMAScript 
don't really have such concepts) is a worse thing than locking some 
users out of using parts of the web because they happen to communicate 
in a language that can't be written down using the BMP.

So yes, I think we have different definitions of "damage": you're 
interested in damage to data integrity and perhaps authors having to 
spend more time on scripts, while I'm more worried about damage to the 
ability of users to communicate on the web (as it is, not as it ought to 
be; if the web were different we would not be having this discussion) in 
languages of their choice.

> In other words, I conclude just the
> opposite, that users of non-BMP characters are penalized *more* by
> implementations that munge surrogate pairs than implementations that
> enforce correct handling.

I don't see how you can possibly reach this conclusion, again given the 
premise of the web as it is, not as we'd prefer it to be....

-Boris

Received on Monday, 16 January 2012 21:11:49 UTC