- From: Richard Ishida <ishida@w3.org>
- Date: Fri, 30 Jan 2009 13:53:53 -0000
- To: "'Martin Duerst'" <duerst@it.aoyama.ac.jp>, "'Phillips, Addison'" <addison@amazon.com>, <public-i18n-core@w3.org>
- Cc: "'fantasai'" <fantasai.lists@inkedblade.net>, "'Lachlan Hunt'" <lachlan.hunt@lachy.id.au>
Hi Martin,

> -----Original Message-----
> From: Martin Duerst [mailto:duerst@it.aoyama.ac.jp]
> Sent: 30 January 2009 09:11
...
> At 03:32 09/01/30, Richard Ishida wrote:
> >
> >Following on from our discussion at yesterday's telecon, I did some
> >research into whether major browsers actually do normalise selector and
> >class names for matching. The answer is that they don't.
>
> I could have told you. The visibility of this issue is extremely low.
> It only applies to languages such as Vietnamese, where both
> precomposed and (half-)decomposed forms are widely used,
> and only if element or attribute names use these characters,
> which by itself is very rare
> (there is also the attribute value case, but that's still
> not very widely supported in browsers as far as I understand).

To be honest, I wasn't particularly concerned with the case where element and attribute names are involved. The case of class names and ids is much more pressing (and that's what the tests revolve around). In my mind, those are very likely to be written in the author's own language, and will not be rare at all as the Web spreads internationally.

And the 83 million inhabitants of Vietnam are not the only people who face this issue. Many languages use combining characters, including the Latin-script-based languages of Africa and aboriginal North America, most scripts of Asia, etc., and one can't guarantee that the input methods used for those languages will always produce text in one consistent form with respect to normalization.

...
> The 'different people working on the CSS and the markup' may indeed
> be a possible scenario, and things could go wrong in particular if
> e.g. the CSS designers work on a Mac and the text is prepared on
> Windows, but then developers in Vietnam should be aware of this
> issue, they will bump into it much earlier, e.g. when doing text
> searching in editors,... My guess is that information on this
> is also available rather easily in Vietnamese, for English,
> see e.g. http://vietunicode.sourceforge.net/main.html.

It's one thing to be aware of the problem, but another to be able to deal with it.

First, if someone else is writing the CSS and you are writing the HTML, you have to know something about normalization. Then you have to work out what approach the CSS guy used for class names (which could even vary from name to name, depending on the input method), and then you have to match his method. That means that if you're using a Mac and he was using Windows, you'll need to convert your names to partially normalized form. That requires an additional level of fiddling, assuming you know how to actually achieve it: you might need a different input method, or need to change the settings of your editor, and if the CSS text isn't consistent in the way combining characters are used or ordered, this could be much more problematic. Highly technical people like you may be able to figure all this out, but CSS and HTML aren't designed to be used only by highly technical people.

Secondly, you shouldn't have to examine bytes to write CSS; that is a nuisance when you simply want to type the same word as the other guy did. That's what normalization is for: recognising that canonically equivalent text is actually the same. Normalizing the data before lookup would remove all those issues.

Cheers,
RI
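To illustrate what "normalizing the data before lookup" could mean for class names, here is a minimal sketch, assuming NFC as the common target form and using Python's standard `unicodedata` module. The function names are hypothetical and do not correspond to any browser or CSS API; real selector matching also has to deal with escapes, quirks-mode case rules, and so on. The example uses "Việt" written three canonically equivalent ways: precomposed, partially composed (ê plus a combining dot below), and fully decomposed.

```python
import unicodedata
from typing import Set


def normalized_class_set(class_attr: str) -> Set[str]:
    # Hypothetical helper: split the class attribute on whitespace
    # (str.split() is a simplification of HTML's ASCII-whitespace rule)
    # and NFC-normalize each token so canonically equivalent names are equal.
    return {unicodedata.normalize("NFC", name) for name in class_attr.split()}


def class_selector_matches(selector_name: str, class_attr: str) -> bool:
    # Hypothetical matcher: normalize the selector's class name the same way
    # before looking it up in the element's set of class names.
    return unicodedata.normalize("NFC", selector_name) in normalized_class_set(class_attr)


precomposed = "Vi\u1ec7t"           # ệ as one code point (U+1EC7)
partly_composed = "Vi\u00ea\u0323t"  # ê (U+00EA) + combining dot below (U+0323)
decomposed = "Vie\u0323\u0302t"      # e + dot below (U+0323) + circumflex (U+0302)

print(precomposed == partly_composed)                                # False: code points differ
print(class_selector_matches(precomposed, "nav " + partly_composed))  # True after NFC
print(class_selector_matches(partly_composed, decomposed))            # True after NFC
```

Comparing raw code points, as the browsers tested apparently do, treats these as three different class names; normalizing both sides to one form before comparison makes them match.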
Received on Friday, 30 January 2009 13:53:57 UTC