- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 2 Feb 2009 14:18:44 +0200
- To: Jonathan Kew <jonathan@jfkew.plus.com>
- Cc: public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
On Jan 30, 2009, at 17:02, Jonathan Kew wrote:

> On 30 Jan 2009, at 14:24, Anne van Kesteren wrote:
>> I may be biased,
>
> We all are, in various ways!
>
> I'd guess that almost all of us here are pretty comfortable using
> English (otherwise how would we be having this discussion?), and the
> expectation that programming and markup languages are English-based
> is deeply ingrained. Some of us, perhaps, like to include comments
> in another language, or even use variable names in another (normally
> Western European) language, but that's as far as it goes.

[...]

> It's supposed to be the World Wide Web, not the Western World's
> Web. :)

The written forms of non-English "Western" languages are not invariant under Unicode normalization. So if one is of the opinion that it doesn't make sense to perform Unicode normalization of identifiers on the Web consumer side, that opinion is not a matter of Western bias.

In my opinion, identifier comparison in Web languages should be done on a code point for code point basis, except where backward compatibility requires additionally treating the Basic Latin characters a–z as equivalent to A–Z, in which case those ranges should be considered equivalent and everything else compared on a code point for code point basis. This approach is good for performance and backward compatibility.

I think the right place to do normalization for Web formats is in the text editor used to write the code, and the normalization form should be NFC.

> An alternative would be to significantly restrict the set of
> characters that are legal in names/identifiers. However, this tends
> to also restrict the set of languages that can be used for such
> names, which I don't think is a good thing.

In the context of text/html and CSS, that doesn't really solve the processing issue, since it would still be necessary to define behavior for non-conforming content.

If one is only concerned with addressing the issue for conforming content, or is interested in making problems detectable by authors, I think it makes sense to stipulate as an authoring requirement that both the unparsed source text and the parsed identifiers be in NFC, and to make validators check this (but not make non-validator consumers do anything about it). Validator.nu already does this for HTML5, so if someone writes a class name with a broken text editor (i.e. one that doesn't normalize keyboard input to NFC), the validator can be used to detect the problem.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
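P.S. To make the comparison I have in mind concrete, here is a rough sketch in Python. It is purely illustrative; the function names are mine and nothing here is normative or taken from any implementation.

    # Code point for code point comparison, with only the Basic Latin
    # ranges A-Z (U+0041..U+005A) and a-z (U+0061..U+007A) treated as
    # equivalent. No Unicode normalization is performed by the consumer.

    def ascii_case_fold(cp):
        # Fold U+0041..U+005A onto U+0061..U+007A; leave all other
        # code points untouched.
        return cp + 0x20 if 0x41 <= cp <= 0x5A else cp

    def identifiers_equal(a, b):
        return len(a) == len(b) and all(
            ascii_case_fold(ord(x)) == ascii_case_fold(ord(y))
            for x, y in zip(a, b)
        )

    # The authoring-side check is then simply that the source is
    # already in NFC (again a sketch, not Validator.nu's actual code):

    import unicodedata

    def source_is_nfc(text):
        return unicodedata.normalize("NFC", text) == text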
Received on Monday, 2 February 2009 12:19:28 UTC