RE: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

(The following is a personal response.)

Any place where an API internally compares language-like strings for "equality" is a candidate for normalization.

Stylesheets and content (HTML, XML, etc.) cannot be guaranteed to be Unicode normalized and, indeed, are not generally normalized automatically. So if you're going to compare strings, it would be best to guarantee something about the normalization state of what is being compared. The most common character encodings, systems, and languages on the Web happen to produce content that is in Unicode Normalization Form C (NFC). But it is also true that there are languages where this is not the case.
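To make that concrete, here is a minimal Python sketch (my own illustration, not something any spec or browser does). The helper name `is_nfc` and the sample strings are assumptions chosen for the example. It shows that text decoded from a common legacy encoding comes out in NFC, while a canonically equivalent decomposed string is not in NFC and does not compare equal until both are normalized.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C.

    Hypothetical helper for this example: a string is in NFC exactly
    when normalizing it to NFC leaves it unchanged.
    """
    return unicodedata.normalize("NFC", text) == text


# Bytes in a common legacy encoding decode to the precomposed character
# U+00E9, so the resulting string is already NFC.
from_legacy = b"caf\xe9".decode("latin-1")

# The same word typed as "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301"

print(is_nfc(from_legacy))        # True
print(is_nfc(decomposed))         # False
print(from_legacy == decomposed)  # False: different code point sequences
print(unicodedata.normalize("NFC", decomposed) == from_legacy)  # True
```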

It is useful, perhaps, to glance at http://www.w3.org/TR/CharMod-Norm to remind yourself of why normalization is a problem in cases such as selectors.

I realize that current implementations don't normalize. Richard tested selectors earlier today and found that this is the case. However, we have real-world examples of how this could be a problem for users. See, for example, Vietnamese, where Windows users often have denormalized input but content may be in a normalized form.

Whether and how it is appropriate to perform normalization in CSS, and how to make it consistent, is a very real issue that we should resolve. *Not* addressing it means that any matching operation relies on the compared Unicode character streams using exactly the same code point sequence. This may turn out to be "How The World Works And Too Bad For You If You Don't Abide". But that decision should be well-documented and not made based on the de facto situation. It should be made consciously and consistently.
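As a rough sketch of what "exactly the same code point sequence" means for a matching operation, consider the following Python. The function `attr_matches` and its `normalize` flag are hypothetical and exist only for illustration; this is not how any browser implements selectors. The attribute value uses one possible non-NFC spelling of a Vietnamese word (a precomposed vowel followed by a combining tone mark), which is my assumption about the kind of input at issue.

```python
import unicodedata


def attr_matches(selector_value: str, attr_value: str,
                 normalize: bool = False) -> bool:
    """Toy stand-in for an attribute-equality selector like [title="..."].

    With normalize=False the comparison is code point by code point.
    With normalize=True both sides are converted to NFC first.
    """
    if normalize:
        selector_value = unicodedata.normalize("NFC", selector_value)
        attr_value = unicodedata.normalize("NFC", attr_value)
    return selector_value == attr_value


# Stylesheet author writes the word in NFC: U+1EC7 is a single code point.
selector = "Vi\u1ec7t"

# Content author's input: U+00EA (e with circumflex) followed by
# U+0323 COMBINING DOT BELOW. Canonically equivalent, but not NFC.
attribute = "Vi\u00ea\u0323t"

print([hex(ord(c)) for c in selector])   # 4 code points
print([hex(ord(c)) for c in attribute])  # 5 code points

print(attr_matches(selector, attribute))                  # False
print(attr_matches(selector, attribute, normalize=True))  # True
```

The two strings render identically, so the author of either the stylesheet or the content has no visual cue as to why the selector fails to match.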

I will be the first to point out that the I18N WG has not done its job of finishing CharMod-Norm, which should have happened years and years ago. We are once again having to tackle it. It is our current position as a WG that a) early uniform normalization of all content and formats is impossible to mandate at this late date; b) Normalization Form C (NFC) is not always appropriate; but c) operations in specs that are normalization-sensitive really *must* address whether and how they deal with normalization and the issues therein.

So I think I disagree with your observation, at least about selectors. I'd have to go back and look at other aspects of CSS to see if I think there are other cases like them.

Regards,

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of L. David Baron
> Sent: Thursday, January 29, 2009 1:55 PM
> To: public-i18n-core@w3.org; www-style@w3.org
> Subject: Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization
> 
> 
> On Thursday 2009-01-29 13:27 -0800, fantasai wrote:
> > Thanks for the tests and the report, Richard. Going from that, I think it
> > makes sense to require /not/ Unicode-normalizing CSS. It may be a bit
> > confusing indeed for people working in Vietnamese and other such languages,
> > but on the other hand behavior across browsers is interoperable right now.
> > If one browser started normalizing, then someone testing in that browser
> > would not notice that the page is broken in other UAs.
> 
> In what parts of CSS might you want unicode normalization to be
> done?  The only case I can think of is selector matching that
> compares attribute values (or, in the future, text content).  And
> even then it seems like it might be helpful in some cases and
> harmful in others.
> 
> (I tend to think we probably don't want it for selectors, both since
> we currently have interoperability, and since selectors are
> particularly performance-sensitive.)
> 
> -David
> 
> --
> L. David Baron                                 http://dbaron.org/
> Mozilla Corporation                       http://www.mozilla.com/

Received on Thursday, 29 January 2009 23:45:04 UTC