- From: L. David Baron <dbaron@dbaron.org>
- Date: Sun, 1 Feb 2009 18:57:55 -0800
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- Cc: public-i18n-core@w3.org, www-style@w3.org
On Sunday 2009-02-01 11:13 -0500, Boris Zbarsky wrote:
> One question I have is whether this issue would be resolved if a UA
> performed parse-time normalization on everything (JS, CSS, XML, HTML).
> That wouldn't completely help JS because you can build up strings
> codepoint-by-codepoint but that also lets you create invalid UTF-16
> strings, so I'm not sure it's worth worrying about right now.

I think parse-time normalization of everything, as Boris describes, is the only reasonable solution here if we decide Unicode normalization is important. It solves the problem all at once, without having to worry about changing the rules for selector matching, tons of distinct DOM APIs, etc., etc. (It might cause a little pain to those using escaped characters, e.g., numeric character references in HTML/XML or escaped codepoints in CSS or JS, but that pain would largely be transitional, if they're depending on matching whichever normalization is considered non-canonical.)

It also happens only once, probably right after character encoding conversion, in the process where we receive bytes from the network, convert those bytes into an internal character representation (typically UTF-8 or UTF-16) according to the encoding, and then parse that.

Sticking normalization into the process of selector matching would solve only a small part of the problem, and would add an extra step to a process that already has fundamentally quadratic computational complexity (selectors * elements).

-David

--
L. David Baron                       http://dbaron.org/
Mozilla Corporation                  http://www.mozilla.com/
Received on Monday, 2 February 2009 02:58:51 UTC
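[Editorial note: as a rough illustration of the parse-time approach described above, here is a minimal TypeScript sketch. It is not the actual Gecko pipeline; TextDecoder and String.prototype.normalize stand in for a browser's internal decoding and normalization machinery, and parse() is a hypothetical entry point. The point is only that normalization happens once, immediately after encoding conversion and before any parsing.]

    // Sketch only: decode network bytes once, normalize to NFC once,
    // then hand the result to the parser. Real engines would use their
    // internal converters rather than these Web APIs.
    function decodeForParsing(bytes: Uint8Array, encodingLabel: string): string {
      const text = new TextDecoder(encodingLabel).decode(bytes); // encoding conversion
      return text.normalize("NFC"); // single normalization pass, before parsing
    }

    // Hypothetical usage: every downstream consumer (HTML/CSS/JS parsing,
    // selector matching, DOM APIs) then sees already-normalized text and
    // needs no further changes.
    // const doc = parse(decodeForParsing(networkBytes, "utf-8"));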