Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization from Anne van Kesteren on 2009-02-06 (www-style@w3.org from February 2009)

From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 06 Feb 2009 15:39:06 +0100
To: "Aryeh Gregor" <Simetrical+w3c@gmail.com>
Cc: "David Clarke" <w3@dragonthoughts.co.uk>, "Henri Sivonen" <hsivonen@iki.fi>, public-i18n-core@w3.org, "W3C Style List" <www-style@w3.org>
Message-ID: <op.uoxpnga164w2qv@annevk-t60.oslo.opera.com>

On Fri, 06 Feb 2009 15:27:26 +0100, Aryeh Gregor  
<Simetrical+w3c@gmail.com> wrote:
> On Fri, Feb 6, 2009 at 5:50 AM, Anne van Kesteren <annevk@opera.com>  
> wrote:
>> Several people seem to have the assumption that there would only be a
>> performance impact for non-normalized data. That is not true, for one
>> because an additional check has to be made to see whether data is
>> normalized in the first place. (And as I said right at the start of this
>> thread, milliseconds do matter.)
>
> Do you think there would be a significant impact on performance if the
> input was just normalized as it's read, instead of being normalized on
> comparison?

I don't know. Developers from Gecko seem to think that is at least  
something worth considering, but it does not appear to be what the i18n  
guys (for lack of a better term, I'm sure we all care about i18n) want. (I  
note I did ask for research here, but nobody so far has come forward with  
numbers.)
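To make the two options in the exchange above concrete, here is a minimal JavaScript sketch. It assumes NFC as the target form (the thread does not settle this) and uses the standard String.prototype.normalize; the helper names readToken, equalsOnCompare and isNFC are hypothetical and not taken from any browser engine. It also illustrates the earlier point that already-normalized input is not free: confirming that a string is normalized still means examining it.

```javascript
// Hypothetical sketch: two places where Unicode normalization could happen.
// NFC is an assumption; the thread does not say which form would be required.

// Strategy A: normalize once when input is read (e.g. by a CSS parser),
// so later comparisons are plain code-unit comparisons.
function readToken(raw) {
  return raw.normalize("NFC");
}

// Strategy B: keep input as-is and normalize at every comparison.
function equalsOnCompare(a, b) {
  // Identical code units are trivially canonically equivalent.
  return a === b || a.normalize("NFC") === b.normalize("NFC");
}

// ECMAScript has no separate quick-check API, so this sketch decides
// "is it already normalized?" by comparing against the normalized copy,
// which still scans the whole string even when the answer is yes.
function isNFC(s) {
  return s.normalize("NFC") === s;
}

const precomposed = "caf\u00E9"; // "café" with U+00E9
const decomposed = "cafe\u0301"; // "café" as "e" + U+0301 COMBINING ACUTE

console.log(precomposed === decomposed);                       // false
console.log(equalsOnCompare(precomposed, decomposed));         // true
console.log(readToken(precomposed) === readToken(decomposed)); // true
console.log(isNFC(precomposed), isNFC(decomposed));            // true false
```

Strategy A pays the normalization cost once per token at parse time; strategy B pays it on every comparison that is not an exact match. The sketch says nothing about which is faster in a real engine; that would need the kind of measurements asked for above.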


> Also, it was said that Gecko interns strings -- could it
> normalize them right before interning, so that subsequent comparisons
> are still just pointer comparisons?

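That would not help with strings that are not atomized, for example:

   function search(s) {
     ...
     // node.textContent is produced on demand and never interned, so it
     // cannot have been normalized on its way into an atom table
     return node.textContent == s;
   }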

or (a feature that might be added to Selectors Level 4):

   :contains("...") { ... }

(I might be wrong here, since I do not work on Gecko.)
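A rough JavaScript sketch of that distinction, under stated assumptions: AtomTable, intern and searchText are hypothetical names, not Gecko's actual interning code, and NFC is again assumed. Normalizing before interning lets canonically equivalent identifiers share one atom, so comparing them stays an identity check; strings created at run time, such as the value of textContent, never pass through the table.

```javascript
// Hypothetical atom table that normalizes before interning.
// Names are illustrative only; this is not Gecko's API.
class AtomTable {
  constructor() {
    this.atoms = new Map(); // normalized string -> atom object
  }

  intern(s) {
    const key = s.normalize("NFC");
    let atom = this.atoms.get(key);
    if (atom === undefined) {
      atom = Object.freeze({ value: key });
      this.atoms.set(key, atom);
    }
    return atom;
  }
}

const table = new AtomTable();
const a = table.intern("caf\u00E9");  // precomposed
const b = table.intern("cafe\u0301"); // decomposed
console.log(a === b); // true: same atom, so comparison is a pointer-style check

// Run-time strings (like node.textContent in search() above) are not atoms,
// so a canonically equivalent match still requires normalizing at compare time.
function searchText(text, s) {
  return text.normalize("NFC") === s.normalize("NFC");
}

console.log("cafe\u0301" === "caf\u00E9");          // false
console.log(searchText("cafe\u0301", "caf\u00E9")); // true
```

A substring matcher such as the proposed :contains() raises further questions (for instance, a match boundary falling between a base character and a combining mark) that this equality-only sketch does not address.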


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Friday, 6 February 2009 14:40:04 UTC