- From: Jonathan Kew <jonathan@jfkew.plus.com>
- Date: Fri, 30 Jan 2009 15:02:02 +0000
- To: "Anne van Kesteren" <annevk@opera.com>
- Cc: "Richard Ishida" <ishida@w3.org>, "'L. David Baron'" <dbaron@dbaron.org>, public-i18n-core@w3.org, www-style@w3.org
On 30 Jan 2009, at 14:24, Anne van Kesteren wrote:

> On Fri, 30 Jan 2009 15:12:35 +0100, Richard Ishida <ishida@w3.org>
> wrote:
>> [This thread grew out of one that didn't include www.style, and has
>> since forked a little. I am therefore pointing to a couple of emails
>> (on the i18n public list) that didn't reach www.style but that I
>> think are relevant. I suggest that we henceforth keep both
>> public-i18n and www-style copied on all emails related to this
>> topic. ]
>>
>> See Martin's email at
>> http://lists.w3.org/Archives/Public/public-i18n-core/2009JanMar/0039.html
>>
>> See my response at
>> http://lists.w3.org/Archives/Public/public-i18n-core/2009JanMar/0041.html
>
> So
>
> 1) Do browsers normalize currently?
>
> 2) Assuming they do not, who have complained?
>
> I may be biased,

We all are, in various ways! I'd guess that almost all of us here are
pretty comfortable using English (otherwise how would we be having this
discussion?), and the expectation that programming and markup languages
are English-based is deeply ingrained. Some of us, perhaps, like to
include comments in another language, or even use variable names in
another (normally Western European) language, but that's as far as it
goes.

With standards such as HTML and CSS, however, we should be building a
Web that is equally welcoming to people of all cultures and languages,
including those who would struggle to come up with variable names in
any Latin-script language, even if they have learned to recognize basic
tags like <html>.

Where there are technical hurdles -- such as multiple binary
representations of the same text elements -- we should use the tools
available to us (in this case, canonical equivalence as defined by
Unicode) to minimize the barriers these will present to the "have-nots"
of the digital world, not just ignore them because they don't
significantly impact the "haves".

It's supposed to be the World Wide Web, not the Western World's Web. :)

> but I have the feeling that performing Unicode Normalization on code
> snippets is overkill.

An alternative would be to significantly restrict the set of characters
that are legal in names/identifiers. However, this tends to also
restrict the set of languages that can be used for such names, which I
don't think is a good thing.

It seems to me that this issue is similar to that of Internationalized
Domain Names, where it certainly isn't considered acceptable for there
to be canonically-equivalent names that are treated as distinct.

> It could potentially also make certain class names and IDs identical
> that are now different/unique. Seems like a bad idea to me.

If anyone has created content that relies on a distinction between
names/IDs that would be erased by normalization -- i.e., names that are
canonically equivalent, as defined by Unicode -- then that data and/or
the processes using it are not compliant with the Unicode standard, and
they're liable to break at some undefined point in the future when they
attempt to interoperate with products or data that *are*
Unicode-compliant. That's really a bad idea.

Better to face the issue now, define appropriate and robust standards,
and encourage anyone who has currently got such ill-designed data to
fix it. (I doubt it actually exists, though.)

JK
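
To make the "multiple binary representations" point concrete, here is a minimal sketch in Python of two canonically-equivalent spellings of one name. The name "café", the helper same_identifier, and the choice of NFC as the comparison form are all assumptions made for illustration; the message does not say which normalization form should be used or where in a browser the comparison would happen.

```python
import unicodedata

# Two spellings of the same hypothetical class name "café".
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after converting both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code-point comparison treats the two spellings as different names.
print(precomposed == decomposed)

# Comparing the normalized forms treats them as the same name.
print(same_identifier(precomposed, decomposed))
```

Without normalization, a selector typed with one spelling would not match markup written with the other, even though the two render identically.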
Received on Friday, 30 January 2009 15:03:09 UTC