- From: Simon Sapin <simon.sapin@kozea.fr>
- Date: Fri, 25 Jan 2013 09:35:41 +0100
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- CC: www-style list <www-style@w3.org>
On 25/01/2013 00:48, Bjoern Hoehrmann wrote:
> * Simon Sapin wrote:
>> This would address the current definition being "wrong" but not what I
>> really want, which is being able to implement a conforming tokenizer
>> that, for efficiency, pretends that UTF-8 bytes are code points.
> Tokenizing a typical style sheet on typical hardware should take less
> than 1 ms (per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/, UTF-8 can
> be transcoded to UTF-16 on > 8 years old, low-end hardware at a rate of
> around 250 MB per second; if you make that 100 MB per second and put the
> typical size of a style sheet at 100 KB, you would still be under 1 ms,
> if you accept that transcoding UTF-8 to UTF-16 in memory is sufficiently
> similar to tokenizing UTF-8-encoded style sheets for this discussion).

Ok, I admit this is probably premature optimization and not worth the
compat risk.

-- 
Simon Sapin
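As a footnote on the trick itself: below is a minimal sketch, in C, of what
"pretending that UTF-8 bytes are code points" can look like (the names and
the simplified ident rule are illustrative only, not taken from the thread
or from any real tokenizer). It leans on two UTF-8 properties: every byte
of a multi-byte sequence is >= 0x80, and every code point with syntactic
meaning in CSS is plain ASCII, so a byte-at-a-time scanner never mistakes
part of a non-ASCII character for a delimiter.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: bytes that may continue a (simplified) CSS ident.
 * Any byte >= 0x80 belongs to the UTF-8 encoding of a non-ASCII code
 * point, and non-ASCII code points are allowed in idents, so such
 * bytes can be accepted without decoding them. */
static int is_ident_byte(uint8_t b) {
    return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
           (b >= '0' && b <= '9') || b == '-' || b == '_' ||
           b >= 0x80;
}

/* Length in bytes of the ident starting at s (0 if none). */
static size_t scan_ident(const uint8_t *s, size_t len) {
    size_t i = 0;
    while (i < len && is_ident_byte(s[i]))
        i++;
    return i;
}

int main(void) {
    /* "héllo{...}" -- the é is two UTF-8 bytes, 0xC3 0xA9, both >= 0x80 */
    const uint8_t css[] = "h\xC3\xA9llo{color:red}";
    printf("ident length: %zu bytes\n", scan_ident(css, sizeof css - 1));
    return 0;  /* prints 6: h + the 2-byte é + l + l + o */
}

Whether such a tokenizer stays conforming presumably depends on invalid
byte sequences being rejected or replaced elsewhere (e.g. in the decoding
step), which is part of the compat risk conceded above.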
Received on Friday, 25 January 2013 08:36:27 UTC