- From: <Peter_Constable@sil.org>
- Date: Wed, 25 Sep 2002 16:09:31 -0500
- To: Unicoders <unicode@unicode.org>, WWW International <www-international@w3.org>, www-international-request@w3.org
On 09/25/2002 01:51:28 PM Tex Texin wrote:

>a) Do Unicode fonts include the language-based glyph variants of
>characters, so that a display system is capable of identifying or
>hinting which glyph should be used in a particular scenario?

They *can*, and some do. When this is the case, there needs to be some mechanism to modify the relationship between sequences of characters and sequences of glyphs to arrive at the particular glyphs intended for the given language. In general terms, the same kinds of mechanisms that can be used for rendering complex scripts can also be used here -- it's a glyph substitution, comparable to substituting an initial or final form of an Arabic character. Of course, there is a different triggering condition involved in these situations than in the case of a complex script such as Arabic: in the complex-script situation, the triggers are the character context (e.g. preceded by a non-word-forming character and followed by a word-forming character), whereas here the trigger is a metadata tag.

Let's consider how this would be dealt with in terms of implementation, using OpenType as an example. The OpenType font format provides means for storing different glyph-transformation rules according to "language". (1) The question, then, is what it takes for the rendering process to make use of one set of language-specific rules rather than another, or rather than a set of default rules (OT allows the font developer to specify a default).

In OpenType, glyph-transformation rules are grouped by "features", and a set of rules will be applied when the associated feature has been activated. (Thus, in OT text layout, what's processed is a feature-marked-up string of characters.) This applies to the "language" distinctions as well: the desired "language" must be specified in the input, otherwise the default rules will apply. (2) The idea is that application software must determine what features are activated at what point.
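A minimal sketch of the selection logic just described, assuming a toy rule table rather than any real OpenType API. The table layout, the function names, the feature tag "demo" and the glyph names are all hypothetical; only the general idea (rules chosen by script, then by "language", with a font-supplied default, and applied only for activated features) comes from the description above.

```python
# Toy stand-in for a font's rule tables (NOT a real OpenType data structure):
# script tag -> "language" tag -> feature tag -> {input glyph: output glyph}.
# "dflt" plays the role of the default rules supplied by the font developer.
RULES = {
    "cyrl": {
        "dflt": {"demo": {}},
        "SRB ": {"demo": {"be": "be.srb"}},
    },
}


def select_rules(rules, script, language=None):
    """Pick the rule set for a script/"language" pair, else the default set."""
    by_language = rules.get(script, {})
    if language in by_language:
        return by_language[language]
    return by_language.get("dflt", {})


def shape(glyphs, rules, script, language=None, features=("demo",)):
    """Apply one-to-one substitutions for each activated feature, in order."""
    table = select_rules(rules, script, language)
    out = list(glyphs)
    for feature in features:
        substitutions = table.get(feature, {})
        out = [substitutions.get(g, g) for g in out]
    return out


if __name__ == "__main__":
    # With a "language" specified, the language-specific glyph is chosen.
    print(shape(["a", "be"], RULES, "cyrl", "SRB "))
    # With none specified, the default rules apply and nothing is substituted.
    print(shape(["a", "be"], RULES, "cyrl"))
```

The point of the sketch is that the caller has to pass the "language" in; if the layer the application talks to has no parameter for it, only the default branch is ever reached.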
Now, hardly any software gets written to interact directly with the OpenType layout engine. Instead, higher-level text layout libraries have been written that wrap the OpenType functionality. Uniscribe is one example; indeed, in Win32 on Windows 2000 and later, there is even another layer, since the standard text-drawing functions (TextOut and ExtTextOut) wrap Uniscribe's functionality. Other examples of libraries that wrap up the OT interface and expose a higher-level interface include Adobe's CoolType engine (not a published interface, that I know of), ICU, Pango and Sun's recent Standard Type Services Framework project.

So, at the OT interface, a "language" tag (3) has to be specified in order to get language-specific glyphs. But apps generally don't write to that interface (for good reason); they usually write to a higher interface. The crux of the issue is that none of the higher-level interfaces that I know of yet provides any mechanism for the app to specify a "language" tag. (4) Hence, the building blocks are there, but more infrastructure is still needed.

Note that there's a bit more involved than simply rewriting higher-level APIs to expose a way to specify OT features. In particular, a critical issue has to do with the relationship between OpenType's "language" tags and whatever system of "language" or "locale" tagging might be used elsewhere in a given platform.

I've described the situation in terms of OpenType. Neither AAT nor Graphite provides exactly the same kind of mechanism for providing different glyph transformations for different languages, though I believe some consideration has been given to possibilities for both technologies. Both use feature mechanisms, so can certainly do what you're looking for; but neither has defined features specifically related to "languages", let alone decided how these should be handled in terms of APIs.
It would be possible to implement an AAT or Graphite font that used a feature to get at language-specific glyphs, and apps that exposed a user interface for setting AAT or Graphite features (5) would offer the user a way to control this. But there would not be any automation whereby an app would specify this based on other "language" or "locale" tagging.

Notes:

(1) I put "language" in quotation marks since it has not really been adequately worked out what these distinctions are; I think these are probably groups of writing systems.

(2) OpenType glyph-transformation rules are organised hierarchically, first by script, then by language, and then according to the other features they are associated with.

(3) OpenType's "language" tags have no specified relationship with ISO 639, RFC 3066 or any other system of "language" tags.

(4) The same issue applies to OpenType features that pertain to optional aspects of typography and rendering that are up to the user's discretion rather than being obligatory behaviour for a script. For instance, there is an OpenType feature for selecting small cap forms, which a font developer can use to provide support for small cap glyphs in the same font as regular glyphs. To make use of such advanced capabilities, the layout interface to which an app is written must provide a way to specify such features. Apart from Adobe's engine (used e.g. by InDesign, and which exposes interfaces for some OpenType features but, I think, not all), I don't know that any other layout library yet provides an interface that allows an app to specify discretionary OpenType features.

(5) In both AAT and Graphite, features are used only for discretionary aspects of typography / rendering that are not obligatory for a script, whereas OpenType uses features for both optional and obligatory behaviours. Thus, for AAT and Graphite, the feature capabilities have always assumed that apps would provide a user interface whereby the user can set features.
(In OpenType, this makes sense for some but not all features.) Language-specific typography represents something different from both obligatory script behaviour and user-preference typography: it would probably be suitable for automation (i.e. the app uses metadata to select, via an appropriate API, language-specific glyph transformations) rather than control via a user interface. For that reason, it's not clear to me that this should be handled as just one more kind of feature in the AAT and Graphite models.

>b) If the above is possible, then I assume the browsers have not
>implemented language-based selection yet.

Still possible basically only in theory (or else with a lot of work to also re-implement the capabilities of something like Uniscribe), so no browsers yet implement this.

>Are any browsers moving to
>using the appropriate glyphs based on language without depending on each
>language being assigned a different font?

Probably not yet.

>c) If the above is not possible, then configuring browsers for Unicode
>usage is greatly complicated by the need to have a lengthy list of fonts
>assigned to different languages.

Um hmm.

>Is there an alternative approach that
>can be used, so users can easily view Unicode text and get the correct
>display while using a single "Unicode" font?

This is another big question, and I've said lots already. I'll just mention techniques known as "font-fallback", "font fixup" or "font-linking" -- all variations on the idea that if the text is supposed to be rendered using font X, but that font doesn't have glyphs to support the characters in the string, then figure out what fonts *will* support those characters and use those, in spite of what the style properties specify. I don't know that this kind of thing has been used to provide language-specific glyphs; usually, it has been viewed as a way to keep the user from seeing boxes (or other comparable notdef glyphs).
You've certainly touched on topics that are both interesting and important. I'll leave the remaining questions for someone else.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
Received on Wednesday, 25 September 2002 17:14:21 UTC