Re: ISSUE-299: Cluster matching 1b from John Hudson on 2013-09-12 (www-style@w3.org from September 2013)

From: John Hudson <tiro@tiro.com>
Date: Thu, 12 Sep 2013 11:56:19 -0700
CC: W3C Style <www-style@w3.org>, www International <www-international@w3.org>
Message-ID: <52320E53.5060507@tiro.com>

Richard Ishida wrote:

> Or is the meaning that if the font has a glyph for the precomposed
> character that is canonically equivalent to the sequence of characters,
> then that glyph should be used (without changing the sequence of
> characters itself). That would seem to make more sense.

Yes, that does make more sense, and should probably be spelled out.

It is also what at least some layout engines do regularly. MS Uniscribe 
will perform a cmap check for a precomposed glyph representing a 
canonical composition of a cluster, and use that glyph in preference to 
the decomposed glyph sequence. The reasoning for this is that a) many 
fonts may support the precomposed character but not have GPOS mark 
positioning (especially true for European diacritic characters in a huge 
number of fonts), and b) character-level substitution is faster than 
glyph-level GSUB composition. I presume the same operations would apply 
directly in the CSS cluster matching model.
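A minimal Python sketch of that cmap check follows. It assumes a toy font whose cmap is a plain dict from code point to glyph ID. CMAP and map_cluster are hypothetical names, and this is not Uniscribe's API or its actual logic. The cluster is normalized to NFC only to look up a glyph; the text string itself is never rewritten.

```python
import unicodedata

# Hypothetical font cmap: Unicode code point -> glyph ID.
# This toy font has a precomposed e-acute but no combining marks.
CMAP = {
    ord("e"): 10,
    ord("\u00e9"): 11,  # LATIN SMALL LETTER E WITH ACUTE
}

def map_cluster(cluster, cmap):
    """Return glyph IDs for one cluster, preferring a precomposed glyph.

    If the NFC form of the cluster is a single character that the font
    maps, use that one glyph. Otherwise map each character of the
    original cluster, with None marking a missing glyph.
    """
    composed = unicodedata.normalize("NFC", cluster)
    if len(composed) == 1 and ord(composed) in cmap:
        return [cmap[ord(composed)]]
    return [cmap.get(ord(ch)) for ch in cluster]

print(map_cluster("e\u0301", CMAP))  # e + COMBINING ACUTE -> [11]
print(map_cluster("e\u0302", CMAP))  # no e-circumflex glyph -> [10, None]
```

A real engine might also try partial composition, for example a base plus two marks where only the first pair has a precomposed form. This sketch only tries the fully composed form.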

[Because of such layout engine operations, on the font side the OpenType 
Layout tables are generally built around an assumption of buffered 
NFC-like input from the cmap, regardless of the original text string. 
This means, of course, that in some fonts <ccmp> will be used to 
decompose the initial glyph strings that the layout engine has composed 
at the cmap level from originally decomposed character strings -- 
thereby demolishing the presumed time saving of the cmap composition 
operation. That's a choice the font developer makes based on whether he 
or she wants to work, during glyph processing, with precomposed 
diacritic glyphs, decomposed bases plus marks, or -- most awkwardly -- a 
mix of the two.]
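The round trip in the bracketed paragraph can be modelled as two stages. This is a toy sketch that assumes hypothetical glyph names and uses a dict to stand in for the font's <ccmp> feature. In an actual OpenType font the decomposition would be written as a GSUB multiple substitution (lookup type 2).

```python
import unicodedata

# Hypothetical cmap (code point -> glyph name) and a hypothetical
# ccmp-style table that decomposes one glyph into several.
CMAP = {
    ord("e"): "e",
    ord("\u00e9"): "eacute",
    ord("\u0301"): "acutecomb",
}
CCMP_DECOMPOSE = {"eacute": ["e", "acutecomb"]}

def cmap_stage(text, cmap):
    """Map characters to glyphs, composing to NFC where the font allows."""
    glyphs = []
    for ch in unicodedata.normalize("NFC", text):
        if ord(ch) in cmap:
            glyphs.append(cmap[ord(ch)])
        else:
            # No precomposed glyph: fall back to the decomposed characters.
            glyphs.extend(cmap.get(ord(c), ".notdef")
                          for c in unicodedata.normalize("NFD", ch))
    return glyphs

def ccmp_stage(glyphs, table):
    """Apply one-to-many glyph decompositions, as a font's ccmp might."""
    out = []
    for g in glyphs:
        out.extend(table.get(g, [g]))
    return out

source = "e\u0301"                                   # decomposed input text
after_cmap = cmap_stage(source, CMAP)                # ['eacute']
after_ccmp = ccmp_stage(after_cmap, CCMP_DECOMPOSE)  # ['e', 'acutecomb']
print(after_cmap, after_ccmp)
```

The decomposed text ends up as the same decomposed glyph sequence it would have produced without the cmap composition step, which is why that step saves no time in such a font.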

JH

Received on Thursday, 12 September 2013 18:57:00 UTC