RE: [selectors-api] Selectors API I18N Review... from Martin Duerst on 2009-01-31 (public-i18n-core@w3.org from January to March 2009)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Sat, 31 Jan 2009 16:11:12 +0900
To: "Richard Ishida" <ishida@w3.org>, "'Phillips, Addison'" <addison@amazon.com>, <public-i18n-core@w3.org>
Cc: "'fantasai'" <fantasai.lists@inkedblade.net>, "'Lachlan Hunt'" <lachlan.hunt@lachy.id.au>, <www-style@w3.org>
Message-Id: <6.0.0.20.2.20090131160314.06c74348@localhost>

At 22:53 09/01/30, Richard Ishida wrote:

>To be honest I wasn't particularly concerned with the case where element and
>attribute names are involved.  The case for class names and ids is much more
>pressing (and that's what the tests revolve around), and in my mind those
>are very prone to be in-language and not rare at all as the Web rolls out
>internationally.

Valid point. I wouldn't use the word 'pressing' here, but it's
definitely the case that class names and ids need more consideration.

>And the 83 million inhabitants of Vietnam are not the only
>people who face this issue.  There are many languages that use combining
>characters, including the Latin script based languages of Africa and
>aboriginal North America, most scripts of Asia, etc., and one can't always
>guarantee that the input methods used for those languages will always create
>text in one given form vis-à-vis normalization.
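To make the quoted concern concrete, here is a minimal Python sketch using the standard `unicodedata` module (an editorial illustration, not part of the original thread). It lists several code point sequences for the Vietnamese letter "ệ" (e with circumflex and dot below). All of them are canonically equivalent, and NFC maps each one to the single precomposed character U+1EC7. The names `SPELLINGS` and `code_points` are illustrative only.

```python
import unicodedata

# Several spellings of Vietnamese "ệ" (e with circumflex and dot below).
SPELLINGS = {
    "precomposed": "\u1ec7",
    "e-hat + dot below": "\u00ea\u0323",
    "e-dot + circumflex": "\u1eb9\u0302",
    "e + dot + circumflex": "e\u0323\u0302",
    "e + circumflex + dot": "e\u0302\u0323",
}

def code_points(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

for name, s in SPELLINGS.items():
    nfc = unicodedata.normalize("NFC", s)
    print(f"{name:22} {code_points(s):22} -> NFC {code_points(nfc)}")
```

Every input line differs in its raw code points, yet every output ends in the same `NFC U+1EC7`.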

Using combining characters isn't what's important. If no precomposed
variant is available, and there is only one combining character, there
are no problems. It would be very good to see some actual examples,
rather than roundabout references to whole continents.
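The distinction drawn above can be checked with a short Python sketch (an editorial illustration using the standard `unicodedata` module; `canonically_equivalent` is a hypothetical helper name). Open e (U+025B) plus a combining grave accent has no precomposed form in Unicode, so normalization leaves it unchanged and there is only one way to write it. Plain "e" plus a combining grave does have a precomposed form (U+00E8), so two different sequences denote the same text.

```python
import unicodedata

def canonically_equivalent(a, b):
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# One combining mark, no precomposed character: open e + combining grave.
open_e_grave = "\u025b\u0300"
# NFC and NFD both leave it unchanged, so there is a single possible spelling.
print(unicodedata.normalize("NFC", open_e_grave) == open_e_grave)  # True
print(unicodedata.normalize("NFD", open_e_grave) == open_e_grave)  # True

# One combining mark, but a precomposed character exists: e + grave vs U+00E8.
decomposed, precomposed = "e\u0300", "\u00e8"
print(decomposed == precomposed)                        # False
print(canonically_equivalent(decomposed, precomposed))  # True
```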

>It's one thing to be aware of the problem, but another to be able to deal
>with it.  First, if someone else is writing the CSS and you are writing the
>HTML, you have to know something about normalization, then you have to work
>out what approach the CSS guy used for class names (which could even vary
>from name to name depending on the input method), but then you have to match
>his method.  
>
>That means that if you're using a Mac and he was using Windows you'll need
>to convert your names to partially normalized form.  That requires an
>additional level of fiddling, assuming that you know how to find a way to
>actually achieve it (you might need a different input method, or need to
>change the settings of your editor, and if the CSS text isn't consistent in
>the way combining characters are used or ordered this could be much more
>problematic). 
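The mismatch described above can be reproduced with a hypothetical, simplified class matcher that compares identifiers code point by code point without normalizing. The function `class_selector_matches` and the two "author inputs" are assumptions made for illustration; this sketch does not describe how any particular browser is implemented.

```python
import unicodedata

def class_selector_matches(class_attr, selector_class):
    """Hypothetical `.name` matcher: split the class attribute on whitespace
    and compare tokens exactly, with no Unicode normalization."""
    return selector_class in class_attr.split()

# The same word, "Việt", as two hypothetical authors might have typed it.
css_class = unicodedata.normalize("NFC", "Vi\u1ec7t")   # CSS author: precomposed
html_class = unicodedata.normalize("NFD", "Vi\u1ec7t")  # HTML author: decomposed

print(len(css_class), len(html_class))                          # 4 6
print(class_selector_matches("nav " + html_class, css_class))  # False
```

The two class names render identically but differ in length (4 vs. 6 code points), so the exact comparison fails.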

The solution is much simpler than that. It's copy-and-paste. Open the
relevant CSS file in your text editor, and copy the class name over to
your HTML.

>Highly technical people like you may be able to figure all this out, but CSS
>and HTML aren't designed to be used just by highly technical people.
>Secondly, you shouldn't have to examine bytes to write CSS, this is just a
>nuisance when you just want to type the same word as the other guy did.
>That's what normalization is for - recognising that canonically equivalent
>text is actually the same.  Normalizing the data before lookup would remove
>all those issues.
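A minimal sketch of "normalizing the data before lookup", assuming NFC as the common form: both the selector's class name and each token of the class attribute are normalized before comparison. `class_selector_matches_nfc` is a hypothetical helper, not part of the Selectors API.

```python
import unicodedata

def nfc(s):
    """Normalize a string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)

def class_selector_matches_nfc(class_attr, selector_class):
    """Hypothetical `.name` matcher that NFC-normalizes the selector and
    every class token before comparing them."""
    wanted = nfc(selector_class)
    return any(nfc(token) == wanted for token in class_attr.split())

css_class = "Vi\u1ec7t"          # precomposed U+1EC7
html_class = "Vie\u0302\u0323t"  # e + circumflex + dot below (non-canonical order)
print(class_selector_matches_nfc("nav " + html_class, css_class))  # True
```

A real implementation would more likely normalize once, when the stylesheet and document are parsed, rather than on every comparison; normalizing per call just keeps the sketch short.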

I agree that nobody should have to examine bytes to write CSS or
the HTML that goes with it. However, the right fix is to give
people keyboard implementations that produce the right (NFC) text
from the start. That's the whole reason NFC was created; if it were
not for that, we never would have needed NFC, because each browser/...
could normalize internally whichever way it wanted.
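This position puts normalization on the authoring side rather than in the matcher. As an illustration of that approach (not a tool mentioned in the thread), an editor or build step could flag lines that are not already in NFC and offer to fix them. The names `non_nfc_lines` and `to_nfc` are hypothetical; `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

def non_nfc_lines(text):
    """Return the 1-based numbers of lines that are not in NFC."""
    return [
        n for n, line in enumerate(text.splitlines(), start=1)
        if not unicodedata.is_normalized("NFC", line)
    ]

def to_nfc(text):
    """Normalize a whole file's text to NFC, e.g. before saving."""
    return unicodedata.normalize("NFC", text)

css = ".nav {}\n.Vie\u0323\u0302t { color: red }\n"
print(non_nfc_lines(css))          # [2]
print(non_nfc_lines(to_nfc(css)))  # []
```

If every author's tools emit NFC in this way, the plain code point comparison in the matcher already gives the expected result.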

Another easy alternative, which may already be happening in some
cases (although I personally wouldn't like it), is that when such
problems occur, people will simply go back to writing the identifier
in question without accents.
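The accent-dropping fallback is often approximated by decomposing to NFD and deleting combining marks. The following is a minimal sketch with a hypothetical helper, `strip_accents`. It only removes marks with a nonzero canonical combining class, so a letter such as Vietnamese "đ" (U+0111), which has no canonical decomposition, passes through unchanged.

```python
import unicodedata

def strip_accents(identifier):
    """Decompose to NFD, then drop every character whose canonical
    combining class is nonzero (i.e. combining marks)."""
    decomposed = unicodedata.normalize("NFD", identifier)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_accents("Vi\u1ec7t-nam"))  # Viet-nam
print(strip_accents("\u0111\u1ecdc"))  # đoc (đ has no decomposition)
```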

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Sunday, 1 February 2009 08:33:08 UTC