RE: Unicode normalization in CSS from Phillips, Addison on 2011-06-21 (public-i18n-core@w3.org from April to June 2011)

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 21 Jun 2011 11:44:58 -0700
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Anne van Kesteren <annevk@opera.com>
CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "www-style@w3.org" <www-style@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC3476A93BF5D4A@EX-SEA31-D.ant.amazon.com>

Martin wrote:
> 
> 
> >> Sequences of code points and their comparison are the issue and it
> >> would not be a revolution to do so in a normalized manner.
> >
> > I do not think it is particularly problematic if we just leave it as is.
> > In the particular case of CSS namespace prefixes it would even require
> > the CSS resource to be in several different Unicode normalization forms.
> > That is just bad practice.
> 
> I agree with Anne that it's quite low priority for CSS identifiers, in particular CSS
> namespace.

I agree that it doesn't really matter for namespace prefixes, since those only occur within a single stylesheet document. Presumably users can use the same normalization form (or lack thereof) within a single document.

> 
> 
> >> Normalization of stylesheets and other documents may not make sense,
> 
> Normalization of stylesheets, or more correctly normalization of
> stylesheets and the Web pages where they get used (to take care of class
> and id values that appear in selectors), makes quite a bit of sense.

Generally speaking, I agree. However, whether it is likely or reasonable for documents to be normalized during parsing is open to question. Normalizing documents on load may interfere with the user's intentions. Additionally, user agents have not imposed normalization, for reasons (real or imagined) of performance. The I18N WG is currently in favor of recommending that selectors (element, class, and id names) be compared in a normalized manner, which has its own implications.

> 
> >> but that doesn't address the problem of selection. See my recent
> >> emails and the I18N WG's work on same.
> >
> > Can you please give a reference? What exactly do you mean with the
> > problem of selection?

The "problem of selection" is as outlined above: if a selector and the document(s) it applies to use different (but canonically equivalent) character sequences for an element, class, or id name, should they still match? Or should different code point sequences be considered distinct? This has implications for the DOM, for ECMAScript implementations, as well as for CSS.
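
To make the question concrete, here is a minimal sketch in Python. It is purely illustrative: the class name "café" and the helper names are my own assumptions, and the choice of NFC is arbitrary. It does not describe how any user agent actually compares selectors. It contrasts a plain code point comparison with a comparison done in a normalized manner, leaving the stored strings themselves untouched.

```python
import unicodedata

# Two canonically equivalent spellings of a hypothetical class name "café".
precomposed = "caf\u00e9"   # ends in U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # ends in U+0065 + U+0301 COMBINING ACUTE ACCENT


def code_point_equal(a: str, b: str) -> bool:
    """Compare two names code point by code point, with no normalization."""
    return a == b


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing copies of both to the same form.

    The inputs are not modified; only the temporary copies are normalized.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(code_point_equal(precomposed, decomposed))   # distinct sequences
    print(normalized_equal(precomposed, decomposed))   # canonically equivalent
```

Under the first function the two spellings are distinct names; under the second they match. Which of these behaviors specs should require is exactly the open question.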

Last week there was a teleconference involving CSS and I18N:

    http://www.w3.org/2011/06/17-cssns-minutes.html 

The outcome of that was that Richard Ishida and I will be approaching the TAG, seeking a finding as to what, if any, normalization recommendations should be applied by W3C specs. There is an existing TAG finding, but it antedates CharMod-Norm and, IMHO, CharMod-Norm as it stands is indefensible. 

Other references would include recent I18N WG teleconferences, which have discussed this at length.

Recent notes on public lists about this include:

  http://lists.w3.org/Archives/Public/public-i18n-core/2011AprJun/0093.html 
  http://lists.w3.org/Archives/Public/public-i18n-core/2011AprJun/0123.html 

The draft recommendations wiki that I18N is creating lives at (comments invited):

  http://www.w3.org/International/wiki/CharmodNormSummary

Regards,

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Tuesday, 21 June 2011 18:45:33 UTC