RE: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

(chair hat OFF)
> 
> I don't know. Developers from Gecko seem to think that is at least
> something worth considering, but it does not appear to be what the
> i18n guys (for lack of a better term, I'm sure we all care about i18n)
> want. (I note I did ask for research here, but nobody so far has come
> forward with numbers.)

It *is* emphatically something worth considering. I don't think one can say categorically that the "i18n guys" don't want normalization, for example as part of the parsing process. However, our focus (at least in the I18N WG) has been on what behavior CSS Selectors should have.

The caution I would offer here is that automatic normalization opens several potential issues, such as how scripts interact with a document whose text has already been normalized. In the case of Selectors, we (as a WG) have mainly focused on stating what the behavior should be, not on defining how implementations should bring that behavior about. A higher-level normalization might be the best way to achieve it in a given implementation, or ultimately in every implementation. However, such a decision affects many more technologies and specs, and it should be arrived at carefully.
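To make the distinction concrete, here is a minimal JavaScript sketch of normalization applied at comparison time rather than by rewriting the document. The helper name namesMatch is hypothetical, and NFC is only an assumed normalization form; neither comes from any spec under discussion.

```javascript
// Hypothetical helper: compare two identifiers the way a Selectors
// implementation might if matching had to be normalization-insensitive.
// NFC is an assumption; the discussion does not settle on a form.
function namesMatch(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// "café" spelled with a precomposed é (U+00E9) and with e + U+0301.
const precomposed = "caf\u00e9";
const decomposed = "cafe\u0301";

console.log(precomposed === decomposed);          // false: different code points
console.log(namesMatch(precomposed, decomposed)); // true: same NFC form

// Compare-time normalization leaves the original strings untouched,
// so script that reads them still sees exactly what was authored.
console.log(decomposed.length); // 5
```

The trade-off shows up in the last line: the document keeps its original code points, but every comparison pays for a normalization pass, which is the cost that normalizing once (at parse time or before interning) would avoid.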
> 
> 
> > Also, it was said that Gecko interns strings -- could it
> > normalize them right before interning, so that subsequent
> > comparisons are still just pointer comparisons?
> 
> That would not help with strings that are not atomized. E.g.
> 
>    function search(s) {
>      ...
>      return node.textContent == s
>    }
> 
> or (a feature that might be added to Selectors Level 4):
> 
>    :contains("...") { ... }
> 

The performance considerations that matter for atomized strings probably don't apply to these operations, though. It is probably acceptable for an arbitrary text match such as :contains to run more slowly.
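A rough JavaScript sketch of the two cases above, again assuming NFC: an atom table that normalizes before interning, so canonically equivalent names share one atom and compare by identity, plus compare-time normalization for strings that are never atomized. AtomTable, textEquals, and textContains are hypothetical names for illustration, not Gecko internals or any proposed API.

```javascript
// Hypothetical atom table that normalizes (NFC, assumed) before interning,
// so two canonically equivalent spellings map to the same atom object and
// later comparisons are identity checks, analogous to pointer comparisons.
class AtomTable {
  constructor() {
    this.atoms = new Map(); // normalized string -> frozen atom object
  }

  intern(s) {
    const key = s.normalize("NFC");
    let atom = this.atoms.get(key);
    if (atom === undefined) {
      atom = Object.freeze({ value: key });
      this.atoms.set(key, atom);
    }
    return atom;
  }
}

const table = new AtomTable();
const a = table.intern("caf\u00e9");  // precomposed é
const b = table.intern("cafe\u0301"); // e + combining acute accent
console.log(a === b); // true: one atom, so identity comparison suffices

// Strings that are never atomized (text compared by a script, or a
// :contains() argument) must be normalized at comparison time instead,
// paying for a normalization pass on every comparison.
function textEquals(text, s) {
  return text.normalize("NFC") === s.normalize("NFC");
}

// Naive, hypothetical :contains()-style test. Substring search over NFC
// forms is only an approximation and does not handle every case where a
// match would begin or end inside a combining sequence.
function textContains(text, s) {
  return text.normalize("NFC").includes(s.normalize("NFC"));
}

console.log(textEquals("cafe\u0301", "caf\u00e9"));           // true
console.log(textContains("le cafe\u0301 noir", "caf\u00e9")); // true
```

The atom path normalizes once per distinct string; the other two functions normalize on every call, which is the slower path considered acceptable above for something like :contains.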

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

Received on Friday, 6 February 2009 21:57:52 UTC