Re: [css3-syntax] Making U+0080 to U+009F "non-ASCII"? from Bjoern Hoehrmann on 2013-01-24 (www-style@w3.org from January 2013)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 25 Jan 2013 00:48:10 +0100
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: www-style list <www-style@w3.org>
Message-ID: <6je3g8pq5u67fu8493gppc017f0iopoibt@hive.bjoern.hoehrmann.de>

* Simon Sapin wrote:
>This would address the current definition being "wrong" but not what I 
>really want. Which is being able to implement a conforming tokenizer 
>that, for efficiency, pretends that UTF-8 bytes are code points.
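
For readers following the thread, here is a rough sketch of what "pretending that UTF-8 bytes are code points" could look like for identifier scanning. It is an illustration only, not Simon's tokenizer: the names is_name_byte and scan_name are hypothetical, and escapes and identifier-start rules are ignored. The point is that every byte of a multi-byte UTF-8 sequence is >= 0x80, so a byte-level scanner only matches a code-point-level one if every code point from U+0080 upwards counts as "non-ASCII".

```python
def is_name_byte(b: int) -> bool:
    """Return True if byte b may appear inside a name.

    Assumption for this sketch: any byte >= 0x80 (lead or continuation
    byte of a multi-byte UTF-8 sequence) is treated as "non-ASCII".
    """
    return (
        b >= 0x80                 # part of a non-ASCII code point
        or b == 0x2D              # '-'
        or b == 0x5F              # '_'
        or 0x30 <= b <= 0x39      # '0'..'9'
        or 0x41 <= b <= 0x5A      # 'A'..'Z'
        or 0x61 <= b <= 0x7A      # 'a'..'z'
    )


def scan_name(data: bytes, start: int = 0) -> int:
    """Return the index just past the run of name bytes beginning at start."""
    i = start
    while i < len(data) and is_name_byte(data[i]):
        i += 1
    return i


# Works on raw UTF-8 without decoding first.
sheet = "héllo-wörld: red".encode("utf-8")
end = scan_name(sheet)
name = sheet[:end].decode("utf-8")

# U+0085 is encoded as C2 85; both bytes are >= 0x80, so the byte-level
# scanner accepts it as part of the name.
c1_end = scan_name("a\u0085b;".encode("utf-8"))
```

If "non-ASCII" only started at U+00A0, the byte-level scanner above would accept U+0085 while a code-point-level one would not, which is the mismatch under discussion.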

Tokenizing a typical style sheet on typical hardware should take about
1 ms or less. Per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/, UTF-8 can
be transcoded to UTF-16 on low-end hardware more than 8 years old at a rate
of around 250 MB per second. Even if you assume only 100 MB per second and
put the typical size of a style sheet at 100 KB, you would still be at
roughly 1 ms, provided you accept that transcoding UTF-8 to UTF-16 in memory
is sufficiently similar to tokenizing UTF-8 encoded style sheets for this
discussion.
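
The back-of-the-envelope estimate can be checked with a few lines of arithmetic. This is only a sketch: the function name transcode_time_ms is made up for illustration, and it assumes decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB) and that processing time scales linearly with input size.

```python
def transcode_time_ms(size_bytes: int, rate_bytes_per_s: int) -> float:
    """Time in milliseconds to process size_bytes at a constant throughput."""
    return size_bytes * 1000 / rate_bytes_per_s


KB = 1000          # decimal units, an assumption of this sketch
MB = 1000 * KB

sheet_size = 100 * KB  # the "typical" style sheet size used in the estimate

# Throughput figure quoted for the DFA decoder on old low-end hardware.
measured_ms = transcode_time_ms(sheet_size, 250 * MB)

# Deliberately pessimistic throughput.
conservative_ms = transcode_time_ms(sheet_size, 100 * MB)

print(f"at 250 MB/s: {measured_ms} ms")
print(f"at 100 MB/s: {conservative_ms} ms")
```

With these assumptions the result is 0.4 ms at 250 MB per second and 1 ms at 100 MB per second, so the work stays in the range of a millisecond either way.
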
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Thursday, 24 January 2013 23:48:37 UTC