Re: [css3-syntax] Making U+0080 to U+009F "non-ASCII"? from Simon Sapin on 2013-01-25 (www-style@w3.org from January 2013)

From: Simon Sapin <simon.sapin@kozea.fr>
Date: Fri, 25 Jan 2013 09:35:41 +0100
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: www-style list <www-style@w3.org>
Message-ID: <510243DD.4080705@kozea.fr>

On 25/01/2013 00:48, Bjoern Hoehrmann wrote:
> * Simon Sapin wrote:
>> This would address the current definition being "wrong" but not what I
>> really want. Which is being able to implement a conforming tokenizer
>> that, for efficiency, pretends that UTF-8 bytes are code points.
> Tokenizing a typical style sheet on typical hardware should take less
> than 1 ms (per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ UTF-8 can
> be transcoded to UTF-16 on > 8 years old, low-end hardware at a rate of
> around 250 MB per second; if you make that 100 MB per second and put the
> typical size of a style sheet at 100 KB, you would still be under 1 ms,
> if you accept that transcoding UTF-8 to UTF-16 in memory is sufficiently
> similar to tokenizing UTF-8 encoded style sheets for this discussion).

OK, I admit this is probably premature optimization and not worth the
compatibility risk.
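
For the record, here is a minimal sketch of the shortcut I had in mind.
It is not the spec's tokenizer: escapes and the rules for the first
character of an identifier are ignored, and the function names and the
nonascii_min parameter are made up for illustration. The byte-level
scanner assumes well-formed UTF-8 input and treats every byte >= 0x80
as part of a name. That only matches a code-point-level scanner if
"non-ASCII" starts at U+0080; with a cutoff at U+00A0 the two disagree
on U+0080 to U+009F, which UTF-8 encodes as C2 80 to C2 9F.

```python
def is_ascii_name_char(c: int) -> bool:
    """True for '-', '_', ASCII digits and ASCII letters."""
    return (
        c in (0x2D, 0x5F)
        or 0x30 <= c <= 0x39
        or 0x41 <= c <= 0x5A
        or 0x61 <= c <= 0x7A
    )


def scan_name_bytes(data: bytes, start: int = 0) -> int:
    """Return the end index of a name, looking only at raw UTF-8 bytes.

    Assumption: any byte >= 0x80 belongs to a "non-ASCII" code point,
    so no decoding is needed.
    """
    i = start
    while i < len(data) and (data[i] >= 0x80 or is_ascii_name_char(data[i])):
        i += 1
    return i


def scan_name_codepoints(text: str, start: int = 0,
                         nonascii_min: int = 0xA0) -> int:
    """Return the end index of a name, looking at decoded code points.

    nonascii_min is the (hypothetical) first code point counted as
    "non-ASCII": 0xA0 for the stricter definition, 0x80 for the looser one.
    """
    i = start
    while i < len(text):
        c = ord(text[i])
        if not (c >= nonascii_min or is_ascii_name_char(c)):
            break
        i += 1
    return i


sample = "a\u0085b"
# Byte level: 61 C2 85 62 is consumed as one name.
print(scan_name_bytes(sample.encode("utf-8")))
# Code-point level with the U+00A0 cutoff: the name ends before U+0085.
print(scan_name_codepoints(sample))
# Code-point level with a U+0080 cutoff: agrees with the byte-level scan.
print(scan_name_codepoints(sample, nonascii_min=0x80))
```

With the cutoff at U+0080 the two scanners cover the same characters,
which is the only case where skipping the decoding step is conforming.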

-- 
Simon Sapin

Received on Friday, 25 January 2013 08:36:27 UTC