RE: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization from Richard Ishida on 2009-01-30 (public-i18n-core@w3.org from January to March 2009)

From: Richard Ishida <ishida@w3.org>
Date: Fri, 30 Jan 2009 15:19:37 -0000
To: "'fantasai'" <fantasai.lists@inkedblade.net>, <public-i18n-core@w3.org>, <www-style@w3.org>
Cc: "'Lachlan Hunt'" <lachlan.hunt@lachy.id.au>
Message-ID: <004b01c982ee$2a842360$7f8c6a20$@org>

I think that the fact that some user agents may normalise and others may not is likely to produce problems in the following cases:

1. in situations where someone has deliberately relied on the fact that, although two names are canonically equivalent in Unicode, they are encoded differently, and has designed the CSS so that different combinations of base and combining characters produce different effects.

2. in situations where someone develops their code and tests it only in user agents that normalise away incompatible 'spellings' that other user agents don't.

I expect case 1 is vanishingly rare. Remember that these strings are canonically equivalent in Unicode: they say exactly the same thing; it's not as if the accent has changed. In fact, we may be doing people a favour here by closing off potential security issues. On the other hand, we stand to clarify matters and alleviate problems for a lot of people who have mixed spellings by mistake.
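A minimal Python sketch of what canonical equivalence means here, using the Vietnamese word "Việt" as a hypothetical class name. The three spellings below differ code point by code point, yet all normalise to the same NFC string via the standard-library `unicodedata.normalize`. Choosing NFC is an assumption for illustration; nothing above says which normalisation form a spec would require.

```python
import unicodedata

# Three spellings of the hypothetical class name "Việt".
precomposed = "Vi\u1ec7t"        # U+1EC7: e with circumflex and dot below
decomposed = "Vie\u0323\u0302t"  # e + combining dot below + combining circumflex
reordered = "Vie\u0302\u0323t"   # e + combining circumflex + combining dot below

spellings = [precomposed, decomposed, reordered]

# Plain string comparison (what non-normalising matching amounts to)
# treats them as three different names.
print(len(set(spellings)))  # 3

# After NFC normalisation all three collapse to the precomposed form.
nfc_forms = {unicodedata.normalize("NFC", s) for s in spellings}
print(len(nfc_forms))               # 1
print(nfc_forms == {precomposed})   # True
```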

Case 2 can be alleviated by wider testing, and will diminish as more user agents implement normalisation. I think that wide testing is normally needed anyway, and applies to other features too. Adding a normalisation requirement to the specs will help to get more user agents implementing it. Not doing so will just postpone the issue. (I would note that the Selectors spec previously contained a reference to normalisation, which was removed because of an editorial question.)
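To make case 2 concrete, here is a toy Python model of a class selector matched either code point by code point or after NFC normalisation. The function `class_matches` and its `normalize` flag are hypothetical and are not any browser's API. A stylesheet written with the precomposed form and markup written with the decomposed form match only in the normalising model, so an author testing only there would not notice the failure elsewhere.

```python
import unicodedata


def class_matches(selector_class: str, element_classes: list[str], normalize: bool) -> bool:
    """Hypothetical model of class-selector matching in a user agent.

    normalize=False compares code point by code point, which is what the
    tested browsers were found to do; normalize=True converts both sides
    to NFC before comparing.
    """
    if normalize:
        selector_class = unicodedata.normalize("NFC", selector_class)
        element_classes = [unicodedata.normalize("NFC", c) for c in element_classes]
    return selector_class in element_classes


# The CSS author typed the precomposed form; the markup author typed the
# decomposed form of the same word.
css_class = "Vi\u1ec7t"
html_classes = ["Vie\u0323\u0302t"]

print(class_matches(css_class, html_classes, normalize=True))   # True
print(class_matches(css_class, html_classes, normalize=False))  # False
```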

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/

> -----Original Message-----
> From: fantasai [mailto:fantasai.lists@inkedblade.net]
> Sent: 29 January 2009 21:28
> To: Richard Ishida
> Cc: 'Phillips, Addison'; public-i18n-core@w3.org; 'Lachlan Hunt'; www-style@w3.org
> Subject: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization
> 
> Richard Ishida wrote:
> > Following on from our discussion at yesterday's telecon, I did some research
> > into whether major browsers actually do normalise selector and class names
> > for matching.
> > The answer is that they don't.
> >
> > Tests: http://www.w3.org/International/tests/css/tests-selectors/
> >
> > Results: http://www.w3.org/International/tests/css/tests-selectors/results-normalization
> >
> > (Thanks to Andrew for suggesting the use of Vietnamese.)
> >
> > I suggest we follow up on Elika's helpful note and request that the CSS WG
> > re-examine this for CSS 2.1 and the CSS3 modules.  I think it is quite an
> > important lapse, and I'm not sure how we missed it for so long.  Certainly
> > this can cause major headaches for people working in Vietnamese and the
> > many other languages that use combining characters, in that the cause of
> > the failure to match names is not at all obvious, and fixing it may not be
> > simple, especially if different people are working on the CSS and the
> > markup.
> 
> Thanks for the tests and the report, Richard. Based on that, I think it
> makes sense to require that CSS /not/ be Unicode-normalized. It may indeed
> be a bit confusing for people working in Vietnamese and other such
> languages, but on the other hand behavior across browsers is interoperable
> right now. If one browser started normalizing, then someone testing in that
> browser would not notice that the page is broken in other UAs.
> 
> ~fantasai

Received on Friday, 30 January 2009 15:19:39 UTC