Re: [css-syntax] Defining "character" from Zack Weinberg on 2013-08-12 (www-style@w3.org from August 2013)

From: Zack Weinberg <zackw@panix.com>
Date: Mon, 12 Aug 2013 13:09:47 -0700
To: Simon Pieters <simonp@opera.com>
Cc: Simon Sapin <simon.sapin@exyr.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, www-style list <www-style@w3.org>
Message-ID: <CAKCAbMhi9rpPjmNc-7cGhzSrt7t4_E495Nojz7L_pf4Q8NQfQQ@mail.gmail.com>

On Mon, Aug 12, 2013 at 11:47 AM, Simon Pieters <simonp@opera.com> wrote:
> On Mon, 12 Aug 2013 19:36:37 +0200, Tab Atkins Jr. <jackalmage@gmail.com>
> wrote:
>>
>> If implementations are willing to change, I'm fine with specifying
>> that unpaired surrogates get transformed into U+FFFD at CSS parse
>> time.

I wouldn't hesitate to make that change in Gecko.  We use UTF-16
internally for everything (alas), so it would be a little fiddly, but
not *that* fiddly.
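
For concreteness, here is a minimal sketch of the transformation under discussion, assuming the style sheet text is already held as a sequence of UTF-16 code units. The function name and the list-of-integers representation are hypothetical, chosen only for illustration; this is not Gecko's code, and a real tokenizer would probably fold the check into its existing input loop.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def replace_lone_surrogates(units):
    """Return a copy of a UTF-16 code unit list with each unpaired
    surrogate replaced by U+FFFD. Well-formed pairs are kept as-is."""
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.append(u)
                out.append(units[i + 1])
                i += 2
                continue
            out.append(REPLACEMENT)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            out.append(REPLACEMENT)
        else:
            out.append(u)
        i += 1
    return out
```

The fiddly part is the one-unit lookahead: a high surrogate cannot be judged on its own, only together with the code unit that follows it.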

> Doing that seems like a slight perf cost and basically no benefit. The DOM
> API and document.write in HTML just let lone surrogates through. I'd say we
> do that in CSS for stuff coming from CSSOM also.

Is that intentional in HTML5 or just an oversight?  If it's
intentional, I suppose we ought to do the same for overall
consistency's sake.

zw

Received on Monday, 12 August 2013 20:10:10 UTC