Re: [css3-syntax] Making U+0080 to U+009F "non-ASCII"? from Tab Atkins Jr. on 2013-01-25 (www-style@w3.org from January 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Fri, 25 Jan 2013 10:24:37 -0800
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-style list <www-style@w3.org>
Message-ID: <CAAWBYDCK9N5v5JMNPm5TyoXedjUZZdBWRnG71U5RGfrB24xr-A@mail.gmail.com>

On Fri, Jan 25, 2013 at 12:35 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> On 25/01/2013 00:48, Bjoern Hoehrmann wrote:
>> * Simon Sapin wrote:
>>>
>>> This would address the current definition being "wrong" but not what I
>>> really want. Which is being able to implement a conforming tokenizer
>>> that, for efficiency, pretends that UTF-8 bytes are code points.
>>
>> Tokenizing a typical style sheet on typical hardware should take less
>> than 1 ms (per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ UTF-8 can
>> be transcoded to UTF-16 on > 8 years old, low-end hardware at a rate of
>> around 250 MB per second; if you make that 100 MB per second and put the
>> typical size of a style sheet at 100 KB, you would still be under 1 ms,
>> if you accept that transcoding UTF-8 to UTF-16 in memory is sufficiently
>> similar to tokenizing UTF-8 encoded style sheets for this discussion).
>
>
> Ok, I admit this is probably premature optimization and not worth the compat
> risk.
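
A quick check of the estimate quoted above, as a minimal sketch. The figures
(100 KB style sheet, 250 MB/s measured, 100 MB/s conservative) come from the
quoted message; decimal units and the helper name processing_time_ms are
assumptions made here for illustration.

```python
def processing_time_ms(size_bytes: int, rate_bytes_per_s: int) -> float:
    """Milliseconds needed to process size_bytes at the given throughput."""
    return size_bytes * 1000 / rate_bytes_per_s


KB = 1000        # assumption: decimal kilobytes
MB = 1000 * KB   # assumption: decimal megabytes

sheet_size = 100 * KB  # "typical" style sheet size from the quoted message

# Throughput reported for the DFA-based decoder in the quoted message.
measured_ms = processing_time_ms(sheet_size, 250 * MB)

# Deliberately pessimistic throughput, also from the quoted message.
conservative_ms = processing_time_ms(sheet_size, 100 * MB)

print(f"at 250 MB/s: {measured_ms:.2f} ms")
print(f"at 100 MB/s: {conservative_ms:.2f} ms")
```

At the measured rate this comes to roughly 0.4 ms. At the conservative rate it
comes to about 1 ms, so right at the quoted bound rather than well below it.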

I suspect it's approximately zero compat risk.  I'm willing to make
the change iff other browsers are cool with it.  I'd make the change
in WebKit, but I can't make heads nor tails of our lexer.
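
To make the proposal concrete, here is a rough sketch of why the boundary
matters for a tokenizer that treats UTF-8 bytes as code points. In UTF-8 every
code point at or above U+0080 is encoded entirely with bytes >= 0x80, so a
byte-level scanner agrees with a code-point-level one only if "non-ASCII"
starts at U+0080. The function names are hypothetical, escapes are ignored,
and only the "name character" class is modelled; none of this is taken from
any browser's lexer.

```python
def is_name_byte(b: int) -> bool:
    """Byte-level test: an ASCII name character, or any byte >= 0x80."""
    return (
        b >= 0x80
        or b in b"-_"
        or 0x30 <= b <= 0x39   # 0-9
        or 0x41 <= b <= 0x5A   # A-Z
        or 0x61 <= b <= 0x7A   # a-z
    )


def is_name_code_point(c: str, nonascii_start: int) -> bool:
    """Code-point-level test with a configurable start of "non-ASCII"."""
    cp = ord(c)
    return cp >= nonascii_start or c in "-_" or (cp < 0x80 and c.isalnum())


def name_run_bytes(data: bytes) -> bytes:
    """Longest prefix of name bytes, scanning without decoding UTF-8."""
    n = 0
    while n < len(data) and is_name_byte(data[n]):
        n += 1
    return data[:n]


def name_run_text(text: str, nonascii_start: int) -> str:
    """Longest prefix of name code points, scanning decoded text."""
    n = 0
    while n < len(text) and is_name_code_point(text[n], nonascii_start):
        n += 1
    return text[:n]


sample = "caf\u00e9\u0085x {"  # contains U+0085, inside the disputed range
from_bytes = name_run_bytes(sample.encode("utf-8")).decode("utf-8")

# Proposed definition: non-ASCII starts at U+0080, so both scanners agree.
agrees_with_proposal = from_bytes == name_run_text(sample, 0x80)

# Definition starting at U+00A0: the code-point scanner stops at U+0085,
# but the byte-level scanner does not, so the two disagree.
agrees_with_a0_start = from_bytes == name_run_text(sample, 0xA0)

print(agrees_with_proposal, agrees_with_a0_start)
```

The disagreement only shows up for input containing U+0080 to U+009F, which is
presumably why the compat risk is expected to be small.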

~TJ

Received on Friday, 25 January 2013 18:25:27 UTC