- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 25 Jan 2013 00:48:10 +0100
- To: Simon Sapin <simon.sapin@kozea.fr>
- Cc: www-style list <www-style@w3.org>
* Simon Sapin wrote:
>This would address the current definition being "wrong" but not what I
>really want. Which is being able to implement a conforming tokenizer
>that, for efficiency, pretends that UTF-8 bytes are code points.

Tokenizing a typical style sheet on typical hardware should take less
than 1 ms. Per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ UTF-8 can
be transcoded to UTF-16 on low-end hardware that is more than 8 years
old at a rate of around 250 MB per second. If you make that 100 MB per
second and put the typical size of a style sheet at 100 KB, you would
still be under 1 ms, provided you accept that transcoding UTF-8 to
UTF-16 in memory is sufficiently similar to tokenizing UTF-8 encoded
style sheets for this discussion.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
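
The estimate in the message can be checked with a few lines of arithmetic. This is only a sketch: the helper name `transcode_time_ms` is hypothetical, and it assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 * 1024 bytes), which the message does not specify.

```python
# Back-of-the-envelope check of the figures quoted in the message.
# Assumption: binary units; the message does not say which it means.
KB = 1024
MB = 1024 * 1024


def transcode_time_ms(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Milliseconds needed to process size_bytes at the given throughput."""
    return size_bytes / rate_bytes_per_s * 1000.0


sheet_size = 100 * KB  # "typical" style sheet size from the message

# Rate cited for the DFA decoder on old, low-end hardware.
print(f"{transcode_time_ms(sheet_size, 250 * MB):.2f} ms")  # 0.39 ms

# The deliberately pessimistic rate used in the message.
print(f"{transcode_time_ms(sheet_size, 100 * MB):.2f} ms")  # 0.98 ms
```

With decimal units (1 KB = 1000 bytes, 1 MB = 1,000,000 bytes) the pessimistic case comes out at exactly 1 ms, so "under 1 ms" is best read as "about 1 ms or less" at 100 MB per second, and comfortably under at the cited 250 MB per second.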
Received on Thursday, 24 January 2013 23:48:37 UTC