- From: Simon Pieters <simonp@opera.com>
- Date: Tue, 13 Aug 2013 11:28:19 +0200
- To: "Zack Weinberg" <zackw@panix.com>
- Cc: "Simon Sapin" <simon.sapin@exyr.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, "www-style list" <www-style@w3.org>
On Mon, 12 Aug 2013 22:09:47 +0200, Zack Weinberg <zackw@panix.com> wrote:

> On Mon, Aug 12, 2013 at 11:47 AM, Simon Pieters <simonp@opera.com> wrote:
>> On Mon, 12 Aug 2013 19:36:37 +0200, Tab Atkins Jr. <jackalmage@gmail.com>
>> wrote:
>>>
>>> If implementations are willing to change, I'm fine with specifying
>>> that unpaired surrogates get transformed into U+FFFD at CSS parse
>>> time.
>
> I wouldn't hesitate to make that change in Gecko. We use UTF-16
> internally for everything (alas), so it would be a little fiddly, but
> not *that* fiddly.
>
>> Doing that seems like a slight perf cost and basically no benefit. The DOM
>> API and document.write in HTML just let lone surrogates through. I'd say we
>> do that in CSS for stuff coming from CSSOM also.
>
> Is that intentional in HTML5 or just an oversight? If it's
> intentional, I suppose we ought to do the same for overall
> consistency's sake.

It is intentional. The HTML spec's parser actually previously operated on
code points, but that was never a reality in implementations, and at least
Henri Sivonen refused to implement it in Gecko's HTML parser [1], so the
spec changed to let lone surrogates from document.write through.

[1] http://lists.w3.org/Archives/Public/public-whatwg-archive/2011Nov/0020.html

-- 
Simon Pieters
Opera Software
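
For readers unfamiliar with the operation being debated, here is a minimal sketch of what "transforming unpaired surrogates into U+FFFD" could look like over a sequence of UTF-16 code units. It is an illustration only, not code from Gecko, Opera, or any spec; the function name `replace_lone_surrogates` and the use of a plain list of integers as input are assumptions made for this example.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def replace_lone_surrogates(units):
    """Return a copy of `units` (UTF-16 code units as ints) in which every
    unpaired surrogate is replaced by U+FFFD. Hypothetical helper."""
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows it.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.extend((u, units[i + 1]))
                i += 2
                continue
            out.append(REPLACEMENT)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            out.append(REPLACEMENT)
        else:
            out.append(u)
        i += 1
    return out
```

The extra pass over every code unit is the kind of per-parse work the "slight perf cost" remark refers to; letting lone surrogates through, as the DOM API and document.write do, skips this pass entirely.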
Received on Tuesday, 13 August 2013 09:23:29 UTC