- From: Robert J Burns <rob@robburns.com>
- Date: Thu, 5 Feb 2009 17:51:59 -0600
- To: Jonathan Kew <jonathan@jfkew.plus.com>
- Cc: Benjamin Blanco <benjo316@gmail.com>, Anne van Kesteren <annevk@opera.com>, Aryeh Gregor <Simetrical+w3c@gmail.com>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
Hi JK,

On Feb 5, 2009, at 4:18 PM, Jonathan Kew wrote:

> On 5 Feb 2009, at 14:02, Benjamin Blanco wrote:
>
>> On Thu, Feb 5, 2009 at 1:06 AM, Robert J Burns <rob@robburns.com>
>> wrote:
>> Hi Benjamin,
>>
>> On Feb 4, 2009, at 9:17 PM, Benjamin wrote:
>>> Also, I can see a difference between the characters; The two
>>> brackets at the top and the one on the bottom left are duller,
>>> while the other three are sharper. This difference is apparent in
>>> both the browser and the text editor(Not sure if it matters,
>>> though).
>>
>> I would say that is a bug in your font. Fonts, by using separate
>> glyphs for canonically equivalent characters, contribute to the
>> confusion authors face when creating content. The glyph
>> distinctions lead authors to treat the characters semantically
>> distinct (which shouldn't happen). Fonts play an important role in
>> this (on par with input systems) since the fonts control the glyphs
>> used. For example if a font uses the same glyphs for "½" as the
>> font maker uses for the compatibility equivalent sequence "1⁄2",
>> this helps with Unicode authoring. It is remarkable how few font
>> makers take minimal amount of time necessary to do this.
>
> Fully comprehending and addressing issues of Unicode-to-glyph
> mapping, canonical-equivalent sequences and alternatives, etc,
> requires far from a "minimal amount of time" for font makers.

I'm sorry. I didn't mean to imply that it was a small amount of work to understand all of this. Clearly it is not. What I meant was that once someone has become a font maker (and therefore necessarily achieved a certain level of understanding of Unicode and Unicode imaging), it is a minimal amount of work to check that canonically equivalent (and in some cases compatibility equivalent) characters share the same rendering (or at least the same rendering up to a relevant transformation for the compatibility equivalent characters).

> Also, most fonts are targeted at a particular market (such as
> Western Europe), and make no claim to support languages or writing
> systems outside this area. Even in the non-Latin world, fonts are
> developed for limited markets; for example, an Arabic-script font
> might support Arabic, Persian, and Urdu, but not necessarily the
> Arabic-script orthographies of West African languages. However, as
> browser developers we are (or should be) aiming to serve a worldwide
> market, and this does come with additional costs.

Agreed. However, even though fonts necessarily target subsets of the Unicode repertoire, they should always map their glyphs to canonical equivalents, simply because it is such a trivial thing to do once everything else about the font has been completed. It doesn't require any additional glyphs, just a few bytes added to a glyph mapping table.

>> This is a similar problem to font/glyph issues outlined earlier by
>> Andrew Cunningham with various African and Eastern languages.
>>
>> I've tried several different fonts, and they all render the glyphs
>> differently, despite canonical equivalence.
>
> This is somewhat tangential to the real issue, but FWIW.... I
> suspect that in most (or perhaps all) cases, what's really happening
> is that the font you're using does not support the characters U+3008
> and U+3009, and your software is performing a font fallback and
> rendering these from its default CJK font instead. So it's not that
> font developers are providing different glyphs for canonically-
> equivalent characters, but rather, they are not necessarily
> supporting the equivalent characters at all.

I hadn't thought of that, but you're probably right. However, this is either 1) a variation on the same bug I described earlier or 2) a font that is old and not yet updated to support U+3008 and U+3009.
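
To make the two kinds of equivalence concrete, here is a minimal Python sketch using the standard `unicodedata` module. It assumes the other pair of brackets in Benjamin's test was U+2329 and U+232A, which this message does not actually state, and the helper name `canonically_equivalent` is only illustrative.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Assumed pairs: U+2329/U+232A decompose canonically to U+3008/U+3009.
bracket_pairs = [("\u2329", "\u3008"), ("\u232A", "\u3009")]

for other, cjk in bracket_pairs:
    same = canonically_equivalent(other, cjk)
    print(f"U+{ord(other):04X} vs U+{ord(cjk):04X}: canonically equivalent = {same}")

# The fraction example is only *compatibility* equivalent:
half = "\u00BD"             # VULGAR FRACTION ONE HALF
sequence = "1\u20442"       # DIGIT ONE, FRACTION SLASH, DIGIT TWO

canonical_match = canonically_equivalent(half, sequence)
compat_match = unicodedata.normalize("NFKC", half) == sequence
print("1/2 canonical match:", canonical_match)
print("1/2 compatibility match:", compat_match)
```

If the pairs are as assumed, normalizing the text to NFC before rendering would leave only U+3008 and U+3009 for the font to map, so a single glyph mapping would cover both spellings.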

Again, an updated font, if it supports a particular character, should support all of the characters canonically equivalent to it, since that does not require producing another glyph, only adding a mapping from an already designed glyph to another character (or character sequence). But I think you're right that a likely explanation is that what Benjamin witnessed was caused by an older font rendering the NFC characters and a font fallback to a newer font that simply had a different glyph for the two canonically equivalent characters. Normalization in the text processor would have avoided this issue as well.

Take care,
Rob
Received on Thursday, 5 February 2009 23:52:39 UTC