- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Mon, 16 Jan 2012 16:11:20 -0500
- To: Glenn Adams <glenn@skynav.com>
- CC: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>
On 1/16/12 3:54 PM, Glenn Adams wrote:
> And my point is that since pretty much every script handles
> surrogate pairs wrong, throwing would just penalize users who try to
> use non-BMP characters with such scripts. It would particularly
> penalize users whose languages are written with non-BMP characters.
>
> Maybe you think it's OK to screw such users over. I don't.
>
>
> Boris, why do you use language like this? It is not conducive to a
> technical dialog. As one of the authors of Unicode, I find it rather
> ironic to be accused in this manner.

I'm not accusing you of anything. I'm just describing the most likely
outcome of the proposed browser behavior. If you don't mean for that
outcome to happen, then I think we need a different proposal....

> Apparently. I view the damage of proliferating non-well formed content,
> and thus non-interoperable content, and the damage of proliferating
> potential security holes to be of greater consequence than requiring
> script authors to address the consequences of working with non-BMP
> characters encoded in UTF-16.

The problem is that you think the burden of the pain here will fall on
authors. It won't. It'll fall on users.

So the question is whether "proliferating non-well formed content"
(whatever that even means in this context, since the DOM and ECMAScript
don't really have such concepts) is a worse thing than locking some
users out of using parts of the web because they happen to communicate
in a language that can't be written down using the BMP.

So yes, I think we have different definitions of "damage": you're
interested in damage to data integrity and perhaps authors having to
spend more time on scripts, while I'm more worried about damage to the
ability of users to communicate on the web (as it is, not as it ought to
be; if the web were different we would not be having this discussion) in
languages of their choice.

> In other words, I conclude just the
> opposite, that users of non-BMP characters are penalized *more* by
> implementations that munge surrogate pairs than implementations that
> enforce correct handling.

I don't see how you can possibly reach this conclusion, again given the
premise of the web as it is, not as we'd prefer it to be....

-Boris
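
For readers unfamiliar with the failure mode under discussion, the following is a minimal TypeScript sketch of how a script can handle surrogate pairs wrong. It is an illustration only and does not come from the thread: the function names and the sample string are hypothetical, and truncation is just one of many operations that can split a pair.

```typescript
// Hypothetical sample: "a", then U+20000 (a non-BMP CJK ideograph, stored
// in UTF-16 as the surrogate pair D840 DC00), then "b".
const text = "a\uD840\uDC00b";

// Naive truncation counts UTF-16 code units, so it can cut a pair in half.
function truncateByCodeUnits(s: string, n: number): string {
  return s.slice(0, n);
}

// Pair-aware truncation counts code points and never splits a valid pair.
function truncateByCodePoints(s: string, n: number): string {
  let i = 0;
  for (let count = 0; count < n && i < s.length; count++) {
    const c = s.charCodeAt(i);
    const next = s.charCodeAt(i + 1); // NaN past the end of the string
    const isPair =
      c >= 0xd800 && c <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
    i += isPair ? 2 : 1;
  }
  return s.slice(0, i);
}

// Reports whether a string contains a surrogate without its partner.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a well-formed pair
        continue;
      }
      return true; // high surrogate with no low surrogate after it
    }
    if (c >= 0xdc00 && c <= 0xdfff) {
      return true; // low surrogate with no high surrogate before it
    }
  }
  return false;
}

const naive = truncateByCodeUnits(text, 2);  // "a" plus half of the pair
const aware = truncateByCodePoints(text, 2); // "a" plus the whole character
```

Here `naive` ends in a lone surrogate while `aware` does not. Under a throwing policy, an API handed `naive` would raise an error only when the user's text contains a non-BMP character, which is the situation the argument above is concerned with.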
Received on Monday, 16 January 2012 21:11:49 UTC