- From: John Hudson <tiro@tiro.com>
- Date: Fri, 10 Oct 2014 12:04:15 -0700
- To: John Cowan <cowan@mercury.ccil.org>
- CC: Richard Ishida <ishida@w3.org>, W3C Style <www-style@w3.org>, www International <www-international@w3.org>, indic <public-i18n-indic@w3.org>
On 10/10/14 11:24 AM, John Cowan wrote:

>> As more font makers realise the
>> relative efficiency of handling Arabic without ligatures, instead
>> utilising contextually triggered variant letter glyphs,

> What is the difference between these two?

Contextual letter (or archegrapheme*) glyphs allow for a much smaller font glyph set, correspondingly fewer and simpler mark positioning GPOS lookups, and more thorough coverage of possible joining sequence behaviour. As an example, an older Monotype implementation of Urdu using a ligature font required more than 20,000 glyphs, and still failed in some situations (the classic case was transliteration of foreign loan words; Urdu newspaper fonts needed to be updated with a whole-word ligature every time a new Soviet leader was announced). By contrast, Urdu can be handled using contextual archegraphemes using only a few hundred glyphs.

The use of ligatures for Arabic type is an artefact of metal typesetting, and the difficulty of handling multiple vertical offsets of joining letters during composing. It was easier to cast sequences of joining letters as ligature sorts, even though this frequently led to problems in text composition. For example, suppose you have a letter sequence ABC, and you have ligatures for AB and BC, but not for the three letters; which ligature do you use? and how ugly is the result? One sees this sort of problem a lot in Arabic metal typesetting, and it was inherited into phototype and digital fonts (and even into Unicode via the presentation form blocks).

At its root, I would say, is a fundamental mis-analysis of Arabic writing, which looks at the script in terms of topographical variants (isolated, initial, medial and final) which sometimes form 'special' ligature shapes with each other.

I think you get a more accurate picture -- and hence a more appropriate set of technical solutions -- if you look at the script in terms of style-specific graphotactic options arranged around a basic set of character-level joining behaviours. [Unicode largely gets the last bit right, by standardising joining behaviour properties rather than topographical variants, even though the latter are what tend to get illustrated.]

The mis-analysis looks at a discrete sequence of Arabic letters and says 'This first one takes initial form, and this second one takes medial form, and this third one takes final form', and then says 'These particular medial and final letters take a special ligature shape that differs from their default connection'.

The better analysis looks at the same sequence and says 'Each of these letters has a particular appropriate form based on its neighbour(s).'

JH

* Particularly in script styles that involve significant amounts of vertical offset connections, e.g. nastaliq, it may make sense to decompose letters with disambiguating dots into archegrapheme base glyphs and dynamically positionable dot mark glyphs. This further reduces the number of glyphs needed in the font, and allows for contextual positioning of the dots relative to adjacent shapes.
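
As a rough illustration of the ABC example and of the neighbour-based analysis above, here is a toy sketch in Python. It is not how any real font or shaping engine works (OpenType fonts would express this through GSUB contextual substitution lookups); the function names and the `letter[prev.next]` glyph-naming scheme are hypothetical and invented for this example, and plain Latin letters stand in for Arabic ones.

```python
# Toy model only: letters are plain strings, not real Arabic shaping.
# All names below are hypothetical and exist only for this illustration.

def ligature_segmentations(text, ligatures):
    """List every way to cover `text` using single letters or known ligatures."""
    if not text:
        return [[]]
    results = []
    for size in range(1, len(text) + 1):
        piece = text[:size]
        # A single letter is always available; longer pieces need a ligature.
        if size == 1 or piece in ligatures:
            for rest in ligature_segmentations(text[size:], ligatures):
                results.append([piece] + rest)
    return results


def contextual_glyphs(text):
    """Choose exactly one glyph per letter, keyed on its immediate neighbours."""
    glyphs = []
    for i, letter in enumerate(text):
        prev_letter = text[i - 1] if i > 0 else "_"              # "_" = no neighbour
        next_letter = text[i + 1] if i + 1 < len(text) else "_"
        glyphs.append(f"{letter}[{prev_letter}.{next_letter}]")
    return glyphs


# Ligature approach: AB and BC both match, so the choice is ambiguous.
print(ligature_segmentations("ABC", {"AB", "BC"}))
# Contextual approach: one deterministic form per letter.
print(contextual_glyphs("ABC"))
```

With ligatures AB and BC, the first call yields three competing coverings (A+B+C, A+BC, AB+C), which is the "which ligature do you use?" problem. The second call always yields one form per letter, however long the joining sequence is.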
Received on Friday, 10 October 2014 19:04:52 UTC