- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Fri, 25 Jan 2013 10:24:37 -0800
- To: Simon Sapin <simon.sapin@kozea.fr>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-style list <www-style@w3.org>
On Fri, Jan 25, 2013 at 12:35 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> On 25/01/2013 00:48, Bjoern Hoehrmann wrote:
>> * Simon Sapin wrote:
>>>
>>> This would address the current definition being "wrong", but not what I
>>> really want, which is being able to implement a conforming tokenizer
>>> that, for efficiency, pretends that UTF-8 bytes are code points.
>>
>> Tokenizing a typical style sheet on typical hardware should take less
>> than 1 ms (per http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ , UTF-8 can
>> be transcoded to UTF-16 on more than 8-year-old, low-end hardware at a
>> rate of around 250 MB per second; if you make that 100 MB per second and
>> put the typical size of a style sheet at 100 KB, you would still be under
>> 1 ms, if you accept that transcoding UTF-8 to UTF-16 in memory is
>> sufficiently similar to tokenizing UTF-8-encoded style sheets for this
>> discussion).
>
> OK, I admit this is probably premature optimization and not worth the
> compat risk.

I suspect it's approximately zero compat risk. I'm willing to make the
change iff other browsers are cool with it.

I'd make the change in WebKit, but I can't make heads nor tails of our
lexer. ~TJ
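For concreteness, here is a minimal sketch of the trick Simon describes above: a byte-oriented tokenizer can classify every byte at or above 0x80 (any UTF-8 lead or continuation byte) the same way it would classify a non-ASCII code point, because all syntax-significant CSS delimiters are ASCII. This is only an illustration under that assumption; the helper names (`is_name_byte`, `scan_ident`) are hypothetical and not taken from the CSS Syntax draft or any browser's lexer.

```c
#include <stddef.h>
#include <stdio.h>

/* Treat any byte >= 0x80 (a UTF-8 lead or continuation byte) the same way a
 * non-ASCII code point would be treated: as a name character. All
 * syntax-significant CSS delimiters ({, }, :, ;, quotes, ...) are ASCII, so
 * multi-byte sequences never need to be decoded during tokenization.
 * (Hypothetical helper, for illustration only.) */
static int is_name_byte(unsigned char b) {
    return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
           (b >= '0' && b <= '9') || b == '-' || b == '_' ||
           b >= 0x80;
}

/* Scan one identifier-like token starting at pos and return its length in
 * bytes; the token text stays valid UTF-8 without ever being decoded. */
static size_t scan_ident(const unsigned char *css, size_t len, size_t pos) {
    size_t start = pos;
    while (pos < len && is_name_byte(css[pos]))
        pos++;
    return pos - start;
}

int main(void) {
    /* "börder" contains a two-byte UTF-8 sequence for 'ö'. */
    const unsigned char css[] = "börder-cölor: red";
    printf("ident is %zu bytes long\n", scan_ident(css, sizeof css - 1, 0));
    return 0;
}
```

Bjoern's numbers work the same way regardless: at an assumed 100 MB/s, a 100 KB style sheet passes through in roughly 1 ms, so whether the tokenizer decodes to code points first or walks raw bytes, the cost is negligible for typical input.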
Received on Friday, 25 January 2013 18:25:27 UTC