On 25/01/2013 00:48, Bjoern Hoehrmann wrote:
> * Simon Sapin wrote:
>> This would address the current definition being "wrong" but not what I
>> really want. Which is being able to implement a conforming tokenizer
>> that, for efficiency, pretends that UTF-8 bytes are code points.
>
> Tokenizing a typical style sheet on typical hardware should take less
> than 1 ms (per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ UTF-8 can
> be transcoded to UTF-16 on > 8 years old, low-end hardware at a rate of
> around 250 MB per second; if you make that 100 MB per second and put the
> typical size of a style sheet at 100 KB, you would still be under 1 ms,
> if you accept that transcoding UTF-8 to UTF-16 in memory is sufficiently
> similar to tokenizing UTF-8 encoded style sheets for this discussion).

Ok, I admit this is probably premature optimization and not worth the
compat risk.

-- 
Simon Sapin

Received on Friday, 25 January 2013 08:36:27 UTC
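
The back-of-the-envelope estimate quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `transfer_time_ms` and the choice of decimal units (1 KB = 1000 bytes, 1 MB = 1000000 bytes) are assumptions made here, not something stated in the thread.

```python
def transfer_time_ms(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time in milliseconds to process size_bytes at a given throughput."""
    return size_bytes / rate_bytes_per_s * 1000.0


# Assumption: decimal units. The message does not say which convention is meant.
KB = 1000
MB = 1000 * 1000

sheet_size = 100 * KB  # "typical size of a style sheet" from the message

# Rate quoted for the DFA decoder on old, low-end hardware.
quoted_ms = transfer_time_ms(sheet_size, 250 * MB)

# The deliberately pessimistic rate used in the message.
pessimistic_ms = transfer_time_ms(sheet_size, 100 * MB)

print(f"at 250 MB/s: {quoted_ms:.2f} ms")
print(f"at 100 MB/s: {pessimistic_ms:.2f} ms")
```

At the quoted 250 MB per second the figure is about 0.4 ms. At the pessimistic 100 MB per second it comes out at about 1 ms, so "under 1 ms" holds only as a rough bound there; either way the cost is on the order of a millisecond.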