- From: Robert J Burns <rob@robburns.com>
- Date: Thu, 5 Feb 2009 20:37:22 -0600
- To: Andrew Cunningham <andrewc@vicnet.net.au>
- Cc: Jonathan Kew <jonathan@jfkew.plus.com>, Benjamin Blanco <benjo316@gmail.com>, Anne van Kesteren <annevk@opera.com>, Aryeh Gregor <Simetrical+w3c@gmail.com>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
Hi Andrew,

On Feb 5, 2009, at 6:27 PM, Andrew Cunningham wrote:

> Robert J Burns wrote:
>> I hadn't thought of that, but you're probably right. However this
>> is either 1) a variation on the same bug I described earlier or 2)
>> a font that is old and not yet updated to support U+3008 and
>> U+3009. Again, an updated font, if it supports a particular
>> character, should support all of the canonically equivalent
>> characters for that character since it does not require producing
>> another glyph, but simply adding a mapping for an already designed
>> glyph to another character (or character sequence).
>
> why would U+3008 and U+3009 share the same glyph shape as the
> canonically equivalent characters? Not sure this is necessary, nor
> even desirable in many contexts.

It's important to understand what Unicode means by canonically equivalent, but before I go into that let me first say that even if it were desirable to develop different glyphs for canonically equivalent characters (which it definitely is not), it still takes very little effort for a font maker to include mappings to all canonically equivalent characters whenever that font maker opts not to provide another distinct glyph. In other words, font makers should not neglect to include glyph mappings to all of the canonically equivalent characters for which they have designed and provided a glyph.

> But harmonising typographic design within multi script fonts can be
> problematic at the least. One of the reasons it's better to use
> appropriate fonts for the language and contents of a document.
>
> The shape of each glyph is a design consideration by the font
> developer based on the context of its usage.
>
> I'd assume the designer would develop the glyph and its metric to
> suit its usage, and harmonise with the script it is most likely to
> be used with.

I understand, but it's important to clearly understand what Unicode means by canonically equivalent characters.
These are equivalent in the sense that they have the same meaning in the text. As I've said before, the state of normalization and canonical equivalence is a mess, but there's no reason to continue contributing to the mess. Let's try to fix it instead of making it worse. When a font designer uses separate glyphs for canonically equivalent characters, it undermines the Unicode Standard. Rather than using a font (or a parser or another processor) to undermine the standard, wouldn't it be better for the font maker to take complaints directly to the Unicode Consortium? Explain to the Unicode Consortium why these characters shouldn't be treated as equivalents. Simply using canonically equivalent characters to add more glyphs to a font is not a good practice to follow at all.

> The characters may be canonically equivalent, but this does not mean
> that they need to be visually identical or share a glyph.
>
> For instance: a font may use the same glyph for <U+0065 U+0302
> U+0301> and <U+1ebf>. Alternatively it may use different glyphs for
> each. It hinges on the intention of the font's designer and their
> intended audience and use of the font.

For the non-singleton canonical decompositions (like this one), the only intent of the designer I can imagine is undermining the Unicode Standard. If there were some other reason to expose separate glyphs, there's a Private Use Area that would be appropriate for that. That character is equivalent to that character sequence in that it is not supposed to imply (and no user should infer) any separate meaning from those two different character sequences. Using separate glyphs undermines that equivalence.

Presumably a font could also encode different glyphs for the following character sequences (each of one or more characters):

1) Ệ (U+1EC6)
2) Ê (U+00CA) (U+0323)
3) Ẹ (U+1EB8) (U+0302)
4) E (U+0045) (U+0323) (U+0302)
5) E (U+0045) (U+0302) (U+0323)

But this would be an abuse of the Unicode Standard.
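
The equivalence of the five sequences listed above can be checked mechanically. The following is only a minimal sketch, assuming Python's standard `unicodedata` module; the names `SEQUENCES`, `canonical_forms` and `code_points` are made up for this illustration. It normalizes each sequence and collects the distinct results, so a single result per normalization form means the sequences are canonically equivalent.

```python
import unicodedata

# The five sequences from the list above, written as escapes so that the
# encoding of this message cannot silently alter them.
SEQUENCES = [
    "\u1EC6",              # 1) precomposed E with circumflex and dot below
    "\u00CA\u0323",        # 2) E with circumflex + combining dot below
    "\u1EB8\u0302",        # 3) E with dot below + combining circumflex
    "\u0045\u0323\u0302",  # 4) E + dot below + circumflex
    "\u0045\u0302\u0323",  # 5) E + circumflex + dot below
]


def canonical_forms(sequences, form="NFD"):
    """Return the set of distinct strings left after normalizing each input."""
    return {unicodedata.normalize(form, s) for s in sequences}


def code_points(text):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for form in ("NFD", "NFC"):
        # One line of output per distinct normalized result.
        for normalized in sorted(canonical_forms(SEQUENCES, form)):
            print(form, code_points(normalized))
```

Both forms collapse the five inputs to a single result, which is why a plain-text process that normalizes its input cannot tell them apart, and why a font that maps them to different glyphs is relying on a distinction the standard says is not there.
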
These are all canonically equivalent character sequences and should not be used by font designers to express their intent or target different audiences. The font designer should either use the Private Use Area for exposing the glyphs if plain text is important, or use a rich text protocol to assign different glyphs to different instances of these canonically equivalent character sequences, but should not map different representations of canonically equivalent character sequences to different glyphs.

Likewise, authors should not be relying on such character reordering and canonically equivalent substitution to express different meaning or different visual effects. This too is an abuse of the Unicode Standard. However, I feel authors have a better excuse than a font maker if they don't understand the Unicode Standard. If the font doesn't display different glyphs, the author won't be lured into such shenanigans.

> But then a well designed font (intended for generic use of Latin
> script languages) will have more than one glyph available for the
> character <U+1ebf>.

That's fine, as long as we understand that such alternate glyphs are only accessible through non-plaintext protocols and font designers should not abuse the Unicode Standard to make them accessible in plain text.

Take care,
Rob
Received on Friday, 6 February 2009 02:38:04 UTC