Re: [css3-text] grapheme clusters across element boundary

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Mon, 16 Jan 2012 17:20:35 -0500
Message-ID: <4F14A2B3.8070608@mit.edu>
To: Glenn Adams <glenn@skynav.com>
CC: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>

On 1/16/12 5:09 PM, Glenn Adams wrote:
> If scripts cause exceptions, then users will complain or not use the
> product (content).

This is the theory.  It doesn't work that way in practice.  Users are 
more likely to complain to the browser vendor and/or switch to a 
different browser.

> If script authors care, they will make their scripts
> work; if they don't care, i don't know how an implementation is going to
> fix it for the end users.

Because for simple values of not caring it's not that hard to do "the 
right thing".  I'm not saying we can work around all kinds of 
conceivable author braindamage wrt surrogates, but we don't need to.  We 
only need to work around the things authors do in practice.

>     So the question is whether "proliferating non-well formed content"
>     (whatever that even means in this context, since the DOM and
>     ECMAScript don't really have such concepts) is a worse thing than
>     locking some users out of using parts of the web because they happen
>     to communicate in a language that can't be written down using the BMP.
>
> ECMA-262 3rd Edition Section 2 Conformance states:
>
> "A conforming implementation of this International standard shall
> interpret characters in conformance with the

Yes, but that doesn't really apply to the content of JS String objects.
Those are not characters as far as ES is concerned; they're just arrays
of 16-bit integers, at least until you try to do something
character-related with them, such as regular expressions.
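
A minimal sketch of that point, assuming an ES5-or-later engine. The helper
name codeUnits is made up for illustration; it only wraps the standard
String.prototype.charCodeAt, which returns the raw 16-bit value at an index
without interpreting it as a character.

```javascript
// Hypothetical helper: list the 16-bit code units that make up a JS string.
function codeUnits(s) {
  var units = [];
  for (var i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i)); // raw code unit, 0..0xFFFF
  }
  return units;
}

// A lone high surrogate is an ordinary string holding one code unit.
var lone = "\ud834";

// U+1D11E (MUSICAL SYMBOL G CLEF) is stored as two code units.
var pair = "\ud834\udd1e";

// lone.length is 1 and pair.length is 2: length counts code units,
// not characters.
var loneUnits = codeUnits(lone); // [0xD834]
var pairUnits = codeUnits(pair); // [0xD834, 0xDD1E]
```

Nothing in the string type itself distinguishes the unpaired value in lone
from the well-formed pair in pair; both are just sequences of integers.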

> It is pretty clear to me that a conforming implementation of the above
> will not (or should not) permit e.textContent="\ud834" to complete
> without throwing some exception, at least without willfully violating
> these conformance requirements.

"\ud834" could perhaps throw an exception in ES, though you should check 
the actual processing model defined for \u escapes to make sure.

But if it does NOT, then what you have on the right-hand side of that 
assignment is an array of 16-bit integers, not a Unicode string.  And 
what happens on assignment is defined by the DOM spec, which likewise 
doesn't refer to Unicode strings (DOMString is defined to be an array of 
arbitrary 16-bit integers, just like ES strings).
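
As a sketch of what telling such a value apart from a well-formed Unicode
string would involve: the code units have to be inspected explicitly. The
helper name hasLoneSurrogate is hypothetical, and nothing here is a check
that the DOM or ES specs are known to require on assignment.

```javascript
// Hypothetical helper: true if s contains a surrogate code unit that is
// not part of a high+low surrogate pair.
function hasLoneSurrogate(s) {
  for (var i = 0; i < s.length; i++) {
    var u = s.charCodeAt(i);
    if (u >= 0xD800 && u <= 0xDBFF) {
      // High surrogate: only valid if a low surrogate follows.
      var next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of a valid pair
      } else {
        return true;
      }
    } else if (u >= 0xDC00 && u <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

var unpaired = hasLoneSurrogate("\ud834");       // lone high surrogate
var paired = hasLoneSurrogate("\ud834\udd1e");   // well-formed pair
```

A page script could run a check like this before handing a string to
e.textContent, but that is an author-side choice.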

> If DOM-4 does not make this clear, then perhaps it should.

This would be a behavior change from existing UAs that would break the 
web, as far as I can see.  It's certainly not how the DOM has worked so far.

If you do want such a spec change, go for it, but I doubt UAs can do 
this.  See "break the web".

-Boris
Received on Monday, 16 January 2012 22:21:06 GMT