Re: [css3-syntax] Making U+0080 to U+009F "non-ASCII"?

On 25/01/2013 00:48, Bjoern Hoehrmann wrote:
> * Simon Sapin wrote:
>> This would address the current definition being "wrong" but not what I
>> really want. Which is being able to implement a conforming tokenizer
>> that, for efficiency, pretends that UTF-8 bytes are code points.
> Tokenizing a typical style sheet on typical hardware should take less
> than 1 ms (per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ UTF-8 can
> be transcoded to UTF-16 on > 8 years old, low-end hardware at a rate of
> around 250 MB per second; if you make that 100 MB per second and put the
> typical size of a style sheet at 100 KB, you would still be under 1 ms,
> if you accept that transcoding UTF-8 to UTF-16 in memory is sufficiently
> similar to tokenizing UTF-8 encoded style sheets for this discussion).

OK, I admit this is probably premature optimization and not worth the
compatibility risk.
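
For the record, here is a rough sketch of the shortcut I had in mind and
where it breaks. It assumes the name-character set is ASCII letters,
digits, "_" and "-" plus everything "non-ASCII", and it ignores escapes
and name-start rules. The function names are made up for illustration,
and the 0xA0 lower bound is how I read the CSS 2.1 grammar
(nonascii [\240-\377]).

```python
def is_ascii_name_char(value: int) -> bool:
    """ASCII part of the assumed name-character set: letters, digits, '_' and '-'."""
    return (
        value in (0x5F, 0x2D)
        or 0x30 <= value <= 0x39
        or 0x41 <= value <= 0x5A
        or 0x61 <= value <= 0x7A
    )


def is_name_byte(byte: int) -> bool:
    """Byte-level shortcut: pretend every UTF-8 byte >= 0x80 is a non-ASCII code point."""
    return byte >= 0x80 or is_ascii_name_char(byte)


def is_name_code_point(code_point: int, non_ascii_start: int) -> bool:
    """Code-point-level test, with a configurable lower bound for 'non-ASCII'."""
    return code_point >= non_ascii_start or is_ascii_name_char(code_point)


def byte_scan_agrees(text: str, non_ascii_start: int) -> bool:
    """True if scanning UTF-8 bytes and scanning code points give the same
    answer to 'does this text consist only of name characters?'."""
    by_bytes = all(is_name_byte(b) for b in text.encode("utf-8"))
    by_code_points = all(
        is_name_code_point(ord(c), non_ascii_start) for c in text
    )
    return by_bytes == by_code_points
```

With non_ascii_start=0x80 the two scans always agree, because every byte
of a multi-byte UTF-8 sequence is >= 0x80. With non_ascii_start=0xA0
they disagree on text containing U+0080 to U+009F, since those code
points are encoded as 0xC2 followed by 0x80 to 0x9F.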

-- 
Simon Sapin

Received on Friday, 25 January 2013 08:36:27 UTC