- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 5 Feb 2009 12:39:08 +0200
- To: Andrew Cunningham <andrewc@vicnet.net.au>
- Cc: Jonathan Kew <jonathan@jfkew.plus.com>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
On Feb 5, 2009, at 01:02, Andrew Cunningham wrote:

>>> if a browser can't render combining diacritics, then it will not
>>> be able to render NFC data when the NFC data uses combining
>>> diacritics.
>>
>> Right. However, that does not make those browsers useless, because
>> there are a lot of things that can be communicated with precomposed
>> characters.
>
> Depends on the language.

That's my point.

>>> So for a "legacy" browser, when a document contains combining
>>> diacritics it doesn't matter if the text is NFC or NFD; it will
>>> not correctly render it.
>>>
>>> For legacy browsers, Unicode will always be a barrier regardless
>>> of normalisation form.
>>
>> Only for cases where the mapping from characters to graphemes is
>> not one-to-one. In a lot of cases that have utility, the mapping is
>> one-to-one.
>
> Most writing scripts aren't simple one-to-one mappings.

My point is that saying that Unicode is *always* a barrier for
software that does one-to-one rendering is hyperbole. There's a
barrier for scripts where even the NFC form involves combining
characters. And there's a lot of utility in cases where the barrier
doesn't apply.

> yep, personally I think default wholesale normalisation would be
> interesting, defaulting to NFC. But I'd want a mechanism in CSS and
> in the browsers for the web developer to specify alternative
> behaviour when required. I think normalisation is required. But I'd
> also like to have the flexibility of using the normalisation form
> appropriate to the web development project at hand.

Isn't the simplest way to get author-controlled normalization to let
authors normalize at their end? It's something that an author can
deploy without waiting for a browser upgrade cycle.

>>>>> esp. if you also want to comply with certain AAA checkpoints in
>>>>> WCAG 2.0.
>>>>
>>>> Hold on. What WCAG 2.0 checkpoints require content *not* to be in
>>>> NFC? If that's the case, there is a pretty serious defect
>>>> *somewhere*.
>>>>
>>> As far as I know, WCAG 2.0 is normalisation form agnostic; it
>>> doesn't require any particular normalisation form. But there is
>>> stuff about guidance for pronunciation, and for tonal African
>>> languages this means dealing with tone marking (where in day-to-day
>>> usage it isn't included) - partly for language learners and
>>> students, and in some cases to aid in disambiguating ideas or
>>> words. It could be handled at the server end or at the client end.
>>> To handle it at the client end, it is easier to use NFD data and,
>>> for languages like Igbo, run a simple regex to toggle between
>>> tonal versions and standard versions.
>>
>> I see. This doesn't mean that serving content in NFD is *required*,
>> only that one implementation strategy for a case that is unusual on
>> a global scale becomes *easier* if the DOM data is in NFD.
>>
> yes, nor is it an argument against normalisation, rather a
> recommendation for some control of normalisation forms by the web
> developer.

To me, it seems that the ability to normalize in one's content
creation workflow before the data travels over HTTP to a client,
together with JavaScript functions for normalizing strings, would
give the appropriate level of control to Web developers who want it.

> For some of the core Vista fonts I get better typographic display
> using combining diacritics.

Seems like a font bug if the precomposed glyphs are worse.
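As a concrete illustration of the client-side toggle Andrew describes
above for Igbo: on NFD data, stripping tone marks really can be a
one-line regex. This is only a rough sketch; it assumes the tone marks
in question are the combining grave, acute and macron, and the function
name is made up for the example.

    // Rough sketch: given NFD text, strip the combining tone marks
    // (grave U+0300, acute U+0301, macron U+0304) to get the plain
    // everyday orthography. The dot below (U+0323) on letters such as
    // Igbo ọ/ụ/ị is part of the spelling, not a tone mark, so it is
    // deliberately left alone.
    function stripToneMarks(nfdText) {
      return nfdText.replace(/[\u0300\u0301\u0304]/g, "");
    }

    // "ọ́" in NFD is o + U+0323 + U+0301; stripping the acute leaves
    // "ọ" (o + U+0323).
    stripToneMarks("o\u0323\u0301"); // "o\u0323"

On NFC data the same operation would additionally have to map
precomposed tonal letters back to their base forms, which is exactly
why NFD makes this particular strategy easier.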
>> I see. Still, I think it's more reasonable that teams whose
>> multipart graphemes don't have an obvious order for the subparts of
>> the grapheme bear the cost of dealing with this complex feature of
>> their writing system, and that, for the sake of performance, every
>> browser, XML parser, etc. around the world on all kinds of devices
>> doesn't burn cycles (time/electricity/CO₂) just *in case* there
>> happens to be a string compare where combining characters might
>> have been inconsistently ordered.
>>
> That assumes that the development team are even aware of the issue.
> I wonder how many non-Vietnamese web developers know or understand
> the impact different input systems will have on a Vietnamese project
> they may be working on.

Do non-Vietnamese Web developers working on Vietnamese content use
fully diacritical Vietnamese words as computer language identifiers
such as HTML class names and ids?

Is the case of non-Vietnamese developers working on a Vietnamese
project without a proper understanding of Vietnamese issues globally
important enough that all software everywhere should burn more cycles
when interning strings, instead of the authoring software used by
such teams burning more cycles?
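To make the string-compare point concrete: the two spellings of the
Vietnamese letter ệ below differ only in the order of their combining
marks. They render identically, but as strings they are different, so
for example a class name typed one way in the markup and the other way
in the style sheet fails to match. This is a minimal sketch; the
normalize() method is assumed rather than something browsers expose.

    // Two canonically equivalent spellings of Vietnamese "ệ" that
    // differ only in the order of the combining marks; they render
    // the same but are different strings.
    var a = "e\u0323\u0302"; // e + combining dot below + circumflex
    var b = "e\u0302\u0323"; // e + combining circumflex + dot below
    a === b;                 // false

    // With a normalization function (written here as a hypothetical
    // normalize() method), both would collapse to the precomposed
    // U+1EC7 and compare equal:
    // a.normalize("NFC") === b.normalize("NFC"); // true

Normalizing once in the authoring tool or CMS, before the markup ever
leaves the author's machine, settles the question without any
client-side cycles.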
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 5 February 2009 10:39:50 UTC