Re: [css3-syntax] Making U+0080 to U+009F "non-ASCII"?

* Simon Sapin wrote:
>This would address the current definition being "wrong" but not what I 
>really want. Which is being able to implement a conforming tokenizer 
>that, for efficiency, pretends that UTF-8 bytes are code points.
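
To make the quoted idea concrete, here is a rough sketch of what
"pretending that UTF-8 bytes are code points" could look like. It
assumes "non-ASCII" means any code point at or above U+0080, so that
every byte >= 0x80 can be treated as a name character without decoding.
The helper names is_name_byte and scan_name are hypothetical, and the
character classes are a simplification, not the full CSS grammar.

```python
def is_name_byte(b: int) -> bool:
    """Return True if byte b may appear in a (simplified) name token.

    Assumption: every byte >= 0x80 belongs to the encoding of some
    non-ASCII code point, so it is accepted without decoding.
    """
    return (
        b >= 0x80                 # any UTF-8 lead or continuation byte
        or b == 0x5F              # '_'
        or b == 0x2D              # '-'
        or 0x30 <= b <= 0x39      # '0'..'9'
        or 0x41 <= b <= 0x5A      # 'A'..'Z'
        or 0x61 <= b <= 0x7A      # 'a'..'z'
    )


def scan_name(data: bytes, start: int = 0) -> int:
    """Return the index just past the run of name bytes beginning at start."""
    i = start
    while i < len(data) and is_name_byte(data[i]):
        i += 1
    return i
```

Because a multi-byte UTF-8 sequence consists only of bytes >= 0x80, the
scan can never stop in the middle of a character, so the resulting slice
is still valid UTF-8.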

Tokenizing a typical style sheet on typical hardware should take about
1 ms or less. Per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/, UTF-8
can be transcoded to UTF-16 at around 250 MB per second on low-end
hardware that is more than 8 years old. Even if you make that 100 MB per
second and put the typical size of a style sheet at 100 KB, you would
still be at roughly 1 ms, provided you accept that transcoding UTF-8 to
UTF-16 in memory is sufficiently similar to tokenizing UTF-8 encoded
style sheets for this discussion.
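
The arithmetic behind that estimate can be checked with a few lines.
This is only a back-of-the-envelope sketch: it assumes decimal units
(1 KB = 1000 bytes, 1 MB = 1000000 bytes), and the function name
transcode_time_ms is made up for illustration.

```python
def transcode_time_ms(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time in milliseconds to process size_bytes at a given throughput."""
    return size_bytes / rate_bytes_per_s * 1000.0


SHEET_BYTES = 100 * 1000  # assumed style sheet size: 100 KB

for rate_mb in (250, 100):  # measured rate and the pessimistic rate
    ms = transcode_time_ms(SHEET_BYTES, rate_mb * 1_000_000)
    print(f"{rate_mb} MB/s: {ms:.2f} ms")
```

With these assumptions the measured rate gives well under half a
millisecond, and the pessimistic rate lands at about one millisecond.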
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Thursday, 24 January 2013 23:48:37 UTC