- From: Glenn Adams <glenn@skynav.com>
- Date: Mon, 16 Jan 2012 13:54:21 -0700
- To: Boris Zbarsky <bzbarsky@mit.edu>
- Cc: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>
- Message-ID: <CACQ=j+e+2sAQUG5oRC2Ta4pS3J+BF63gBZnqVU3F5froSJj=5w@mail.gmail.com>
On Mon, Jan 16, 2012 at 1:39 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:

> On 1/16/12 3:06 PM, Glenn Adams wrote:
>
>> (2) script that naively assumes codepoint = character, and inadvertently
>> separates surrogate pair elements;
>
> This is most script.
>
>> regarding (2), my position is that the implementation should be
>> conservative and not liberal when allowing script to set certain
>> property values whose underlying semantics imply a well-formed UTF-16
>> string; so, yes, were I implementing this, I would throw an exception
>> when a script attempts to set a DOMString typed property to a JS String
>> that contains an isolated surrogate codepoint; or at least I would do
>> this by default, and only depart from this default in certain
>> circumscribed cases;
>
> And my point is that since pretty much every script handles surrogate
> pairs wrong throwing would just penalize users who try to use non-BMP
> characters with such scripts. It would particular penalize users whose
> languages are written with non-BMP characters.
>
> Maybe you think it's OK to screw such users over. I don't.

Boris, why do you use language like this? It is not conducive to a technical
dialog. As one of the authors of Unicode, I find it rather ironic to be
accused in this manner.

> Especially in situations in which the "correct" rendering is obvious (e.g.
> every single codepoint wrapped in its own span, but all have the same
> style: you just render the text as a single text string with that style).
>
>> this is my answer to your question "what the best way to limit damage
>> from the lack of understanding on script authors' part is"
>
> I think you and I have different definitions of "damage" here.

Apparently.
I view the damage of proliferating non-well-formed, and thus non-interoperable, content, together with the damage of proliferating potential security holes, as being of greater consequence than the cost of requiring script authors to deal with non-BMP characters encoded in UTF-16. In other words, I conclude just the opposite: users of non-BMP characters are penalized *more* by implementations that munge surrogate pairs than by implementations that enforce correct handling.
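
To make the disagreement concrete, here is a minimal sketch in TypeScript of the kind of check under discussion. The names `hasLoneSurrogate` and `assertWellFormed` are hypothetical and are not taken from any specification or implementation; the sketch only illustrates what "throw on an isolated surrogate" could look like, and how a script that splits a string by UTF-16 code unit ends up producing such values.

```typescript
// Hypothetical helper: true if the string contains a surrogate code unit
// that is not part of a well-formed high/low pair.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be immediately followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return true;
      i++; // skip the low half of the well-formed pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

// Hypothetical strict setter guard: rejects ill-formed UTF-16 (assumed policy).
function assertWellFormed(s: string): string {
  if (hasLoneSurrogate(s)) {
    throw new RangeError("isolated surrogate code unit in string");
  }
  return s;
}

// U+10330 GOTHIC LETTER AHSA, a non-BMP character, written as a surrogate pair.
const gothic = "\uD800\uDF30";

// A naive script splitting "per character" actually splits per code unit,
// so each element is an isolated surrogate.
const halves = gothic.split("");
```

Under this sketch `assertWellFormed(gothic)` returns the string unchanged, while `assertWellFormed(halves[0])` throws. Rejoining the halves (`halves.join("")`) yields a well-formed string again, which is the "one span per code unit" case raised in the quoted text.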
Received on Monday, 16 January 2012 20:55:15 UTC