Re: [css-syntax] Defining "character" from Tab Atkins Jr. on 2013-08-12 (www-style@w3.org from August 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Mon, 12 Aug 2013 10:36:37 -0700
To: Simon Sapin <simon.sapin@exyr.org>
Cc: www-style list <www-style@w3.org>
Message-ID: <CAAWBYDB=5=eGETY6ZQj4NWMzQU+092uQ8xtQ_=OsCHBgTP7OzA@mail.gmail.com>

On Mon, Aug 12, 2013 at 9:59 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
> On 12/08/2013 17:25, Zack Weinberg wrote:
>> On Mon, Aug 12, 2013 at 7:35 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
>>>
>>>
>>> data:text/html,<style>body:before{}</style><script>document.styleSheets[0].cssRules[0].style.content="'-\ud834\udd1e-'"</script>
>>
>>
>> That JavaScript strings expose surrogate pairs to the programmer is a
>> specification bug in JavaScript (unfixable due to backward
>> compatibility), which should not infect CSS; the behavior on our side
>> should IMHO be as if the surrogate pair were converted to the
>> corresponding code point before tokenization, such that the modified
>> style sheet is indistinguishable from the one produced by
>>
>> data:text/html,<style>body:before{content:'-\01d11e -'}</style>
>
>
> Yes. That’s fine: surrogate pairs are how you’re supposed to do non-BMP
> code points in JavaScript. The trouble is with unpaired surrogates:
>
> data:text/html,<style>body:before{}</style><script>document.styleSheets[0].cssRules[0].style.content="'-\ud834-\udd1e-'"</script>

If implementations are willing to change, I'm fine with specifying
that unpaired surrogates get transformed into U+FFFD at CSS parse
time.
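
A rough sketch of what that transformation could look like, in
JavaScript since that's what the test cases above use. The function
name is made up for illustration and doesn't come from any spec or
implementation; it also assumes the parser is handed the raw UTF-16
code units that script put into the string.

```javascript
// Hypothetical helper (name is an assumption, not a spec or browser API).
// Walks the UTF-16 code units of `input`; well-formed surrogate pairs are
// kept as-is, and any unpaired surrogate becomes U+FFFD.
function replaceUnpairedSurrogates(input) {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // High surrogate followed by low surrogate: one non-BMP code point.
        out += input[i] + input[i + 1];
        i++; // skip the low half we just consumed
        continue;
      }
    }
    // Anything still flagged as a surrogate here has no partner.
    out += isHigh || isLow ? "\ufffd" : input[i];
  }
  return out;
}

// Simon's two cases: the paired one is left alone, the unpaired one
// ends up with two replacement characters.
const paired = replaceUnpairedSurrogates("'-\ud834\udd1e-'");
const unpaired = replaceUnpairedSurrogates("'-\ud834-\udd1e-'");
```

With that in place, the first example still carries U+1D11E and the
second would be tokenized as if it had been written '-\fffd -\fffd -'.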

~TJ

Received on Monday, 12 August 2013 17:37:31 UTC