Re: [css3-syntax] Making U+0080 to U+009F "non-ASCII"?

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Fri, 25 Jan 2013 10:24:37 -0800
Message-ID: <CAAWBYDCK9N5v5JMNPm5TyoXedjUZZdBWRnG71U5RGfrB24xr-A@mail.gmail.com>
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-style list <www-style@w3.org>
On Fri, Jan 25, 2013 at 12:35 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> On 25/01/2013 00:48, Bjoern Hoehrmann wrote:
>> * Simon Sapin wrote:
>>>
>>> This would address the current definition being "wrong" but not what I
>>> really want. Which is being able to implement a conforming tokenizer
>>> that, for efficiency, pretends that UTF-8 bytes are code points.
>>
>> Tokenizing a typical style sheet on typical hardware should take less
>> than 1 ms (per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ UTF-8 can
>> be transcoded to UTF-16 on > 8 years old, low-end hardware at a rate of
>> around 250 MB per second; if you make that 100 MB per second and put the
>> typical size of a style sheet at 100 KB, you would still be under 1 ms,
>> if you accept that transcoding UTF-8 to UTF-16 in memory is sufficiently
>> similar to tokenizing UTF-8 encoded style sheets for this discussion).
>
>
> Ok, I admit this is probably premature optimization and not worth the compat
> risk.

I suspect it's approximately zero compat risk.  I'm willing to make
the change iff other browsers are cool with it.  I'd make the change
in WebKit, but I can't make heads nor tails of our lexer.
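
For anyone skimming the thread, here is a rough sketch of why the change
matters to a byte-level tokenizer. It is only an illustration: the helper
names, the simplified set of ASCII name characters, and the sample string
are all made up for this example, and escapes are ignored. It assumes the
cutoff under discussion is U+00A0 versus U+0080. Every byte of a non-ASCII
code point in UTF-8 is >= 0x80, so a tokenizer working on bytes can only
say "any byte >= 0x80 is a name character". That agrees with a code-point
tokenizer only if non-ASCII starts at U+0080.

```python
# Simplified, hypothetical set of ASCII name characters (no escapes).
ASCII_NAME = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_")


def is_name_byte(b: int) -> bool:
    """Byte-level test: an ASCII name character, or any byte >= 0x80."""
    return b >= 0x80 or b in ASCII_NAME


def is_name_code_point(cp: int, non_ascii_start: int) -> bool:
    """Code-point test with a configurable start of the non-ASCII range."""
    return cp >= non_ascii_start or cp in ASCII_NAME


def name_prefix_bytes(data: bytes) -> bytes:
    """Longest run of name bytes at the start of UTF-8 encoded input."""
    end = 0
    while end < len(data) and is_name_byte(data[end]):
        end += 1
    return data[:end]


def name_prefix_code_points(text: str, non_ascii_start: int) -> str:
    """Longest run of name code points at the start of decoded input."""
    end = 0
    while end < len(text) and is_name_code_point(ord(text[end]), non_ascii_start):
        end += 1
    return text[:end]


# U+00E9 is above both cutoffs; U+0085 falls in the disputed U+0080..U+009F range.
sample = "caf\u00e9\u0085x y"

from_bytes = name_prefix_bytes(sample.encode("utf-8")).decode("utf-8")
with_0080 = name_prefix_code_points(sample, 0x80)
with_00a0 = name_prefix_code_points(sample, 0xA0)

print(from_bytes == with_0080)  # byte tokenizer agrees when non-ASCII starts at U+0080
print(from_bytes == with_00a0)  # and disagrees when it starts at U+00A0
```

With the U+0080 cutoff both tokenizers stop at the space. With the U+00A0
cutoff the code-point tokenizer stops at U+0085, while the byte tokenizer
keeps going.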

~TJ
Received on Friday, 25 January 2013 18:25:27 GMT
