- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Mon, 16 Jan 2012 17:20:35 -0500
- To: Glenn Adams <glenn@skynav.com>
- CC: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>
On 1/16/12 5:09 PM, Glenn Adams wrote:

> If scripts cause exceptions, then users will complain or not use the
> product (content).

This is the theory. It doesn't work that way in practice. Users are more likely to complain to the browser vendor and/or switch to a different browser.

> If script authors care, they will make their scripts
> work; if they don't care, i don't know how an implementation is going to
> fix it for the end users.

Because for simple values of not caring it's not that hard to do "the right thing". I'm not saying we can work around all kinds of conceivable author braindamage wrt surrogates, but we don't need to. We only need to work around the things authors do in practice.

> So the question is whether "proliferating non-well formed content"
> (whatever that even means in this context, since the DOM and
> ECMAScript don't really have such concepts) is a worse thing than
> locking some users out of using parts of the web because they happen
> to communicate in a language that can't be written down using the BMP.
>
> ECMA-262 3rd Edition Section 2 Conformance states:
>
> "A conforming implementation of this International standard shall
> interpret characters in conformance with the

Yes, but that doesn't apply to the content of JS String objects so much. Those are not characters as far as ES is concerned; they're just arrays of 2-byte integers. At least until you try to do some sort of character-related things like regular expressions and whatnot on it.

> It is pretty clear to me that a conforming implementation of the above
> will not (or should not) permit e.textContent="\ud834" to complete
> without throwing some exception, at least without willfully violating
> these conformance requirements.

"\ud834" could perhaps throw an exception in ES, though you should check the actual processing model defined for \u escapes to make sure. But if it does NOT, then what you have on the right-hand side of that assignment is an array of 16-bit integers, not a Unicode string. And what happens on assignment is defined by the DOM spec, which likewise doesn't refer to Unicode strings (DOMString is defined to be an array of arbitrary 16-bit integers, just like ES strings).

> If DOM-4 does not make this clear, then perhaps it should.

This would be a behavior change from existing UAs that would break the web, as far as I can see. It's certainly not how the DOM has worked so far. If you do want such a spec change, go for it, but I doubt UAs can do this. See "break the web".

-Boris
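The "array of 16-bit integers" view of ES strings can be checked with a minimal sketch like the one below. It is an illustration rather than part of the original message: the helper name `units` and the variable names are hypothetical, `codePointAt` is an ES2015 addition that postdates this thread, and the DOM assignment `e.textContent = "\ud834"` is not exercised because it needs a browser.

```javascript
// Hypothetical helper: list the 16-bit code units that make up a string.
const units = (str) =>
  Array.from({ length: str.length }, (_, i) => str.charCodeAt(i));

// A lone high surrogate written with a \u escape. In current engines this
// literal evaluates without throwing and yields a one-unit string.
const lone = "\ud834";
const loneUnits = units(lone);

// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so it is stored
// as a surrogate pair: two code units for one character.
const clef = "\ud834\udd1e";
const clefUnits = units(clef);
const clefLength = clef.length;

// Only character-aware operations (ES2015+) combine the pair again.
const clefCodePoint = clef.codePointAt(0);
```

Here `lone.length` is 1 and `clef.length` is 2, which matches the reading that string contents are code units first and characters only when an operation chooses to interpret them that way.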
Received on Monday, 16 January 2012 22:21:06 UTC