Re: Unicode Normalization

On 5 Feb 2009, at 14:02, Benjamin Blanco wrote:

> On Thu, Feb 5, 2009 at 1:06 AM, Robert J Burns <rob@robburns.com>  
> wrote:
> Hi Benjamin,
>
> On Feb 4, 2009, at 9:17 PM, Benjamin wrote:
>> Also, I can see a difference between the characters: the two  
>> brackets at the top and the one on the bottom left are duller,  
>> while the other three are sharper. This difference is apparent in  
>> both the browser and the text editor (not sure if it matters, though).
>
> I would say that is a bug in your font. Fonts, by using separate  
> glyphs for canonically equivalent characters, contribute to the  
> confusion authors face when creating content. The glyph distinctions  
> lead authors to treat the characters as semantically distinct (which  
> shouldn't happen). Fonts play an important role in this (on par with  
> input systems) since the fonts control the glyphs used. For example,  
> if a font uses the same glyphs for "½" as the font maker uses for  
> the compatibility equivalent sequence "1⁄2", this helps with  
> Unicode authoring. It is remarkable how few font makers take the  
> minimal amount of time necessary to do this.

Fully comprehending and addressing issues of Unicode-to-glyph mapping,  
canonically equivalent sequences and alternatives, etc., requires far  
more than a "minimal amount of time" from font makers.
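To make the distinction in the quoted "½" example concrete: "½" (U+00BD) is only compatibility-equivalent to "1⁄2" (U+0031 U+2044 U+0032), not canonically equivalent, so a font maker is dealing with two different kinds of equivalence. A minimal Python sketch using the standard-library unicodedata module; the helper names are illustrative only, not part of any existing API:

```python
import unicodedata

HALF = "\u00BD"   # VULGAR FRACTION ONE HALF, "½"
SEQ = "1\u20442"  # "1", FRACTION SLASH (U+2044), "2"

def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b share the same NFD form (canonical equivalence)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a: str, b: str) -> bool:
    """True if a and b share the same NFKD form (compatibility equivalence)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

if __name__ == "__main__":
    print(canonically_equivalent(HALF, SEQ))    # False: NFD leaves "½" as-is
    print(compatibility_equivalent(HALF, SEQ))  # True: NFKD expands "½" to "1⁄2"
```

Because the two are only compatibility-equivalent, canonical normalization (NFC/NFD) will never unify them; only NFKC/NFKD does, and those forms discard formatting distinctions an author may have intended.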

Also, most fonts are targeted at a particular market (such as Western  
Europe), and make no claim to support languages or writing systems  
outside this area. Even in the non-Latin world, fonts are developed  
for limited markets; for example, an Arabic-script font might support  
Arabic, Persian, and Urdu, but not necessarily the Arabic-script  
orthographies of West African languages. However, as browser  
developers we are (or should be) aiming to serve a worldwide market,  
and this does come with additional costs.

> This is a similar problem to font/glyph issues outlined earlier by  
> Andrew Cunningham with various African and Eastern languages.
>
> I've tried several different fonts, and they all render the glyphs  
> differently, despite canonical equivalence.

This is somewhat tangential to the real issue, but FWIW... I suspect  
that in most (or perhaps all) cases, what's really happening is that  
the font you're using does not support the characters U+3008 and  
U+3009, and your software is performing font fallback and rendering  
these from its default CJK font instead. So it's not that font  
developers are providing different glyphs for canonically equivalent  
characters, but rather that they are not necessarily supporting the  
equivalent characters at all.
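A rough Python sketch of the fallback scenario described above. The font names and coverage sets are hypothetical, and real text engines use far more elaborate fallback logic; the point is only that a naive per-character lookup sends canonically equivalent brackets (U+2329/U+232A vs U+3008/U+3009) to different fonts, whereas NFC-normalizing first, which maps U+2329/U+232A to U+3008/U+3009, makes the choice consistent:

```python
import unicodedata

# Hypothetical font coverage (illustrative assumption, not real font data):
# the primary font has the Miscellaneous Technical angle brackets but not
# the CJK ones; the fallback CJK font has only the CJK ones.
FONTS = [
    ("PrimaryFont", {"<", ">", "\u2329", "\u232A"}),
    ("FallbackCJKFont", {"\u3008", "\u3009"}),
]

def pick_font(ch: str) -> str:
    """Naive per-character fallback: first font in the list that covers ch."""
    for name, coverage in FONTS:
        if ch in coverage:
            return name
    return "LastResort"  # hypothetical last-resort font

def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b share the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

if __name__ == "__main__":
    text = "\u2329\u232A\u3008\u3009"
    # Without normalization, equivalent brackets come from different fonts.
    for ch in text:
        print(f"U+{ord(ch):04X} -> {pick_font(ch)}")
    print(canonically_equivalent("\u2329", "\u3008"))  # True
    # After NFC, every bracket is U+3008/U+3009, so one font is chosen.
    nfc = unicodedata.normalize("NFC", text)
    print([pick_font(ch) for ch in nfc])
```

Normalizing does not make the missing glyphs appear in the primary font; it only ensures that equivalent characters are rendered consistently from the same font.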

JK

Received on Thursday, 5 February 2009 22:19:40 UTC