Re: glyph selection for Unicode in browsers

From: <Peter_Constable@sil.org>
Date: Wed, 25 Sep 2002 16:09:31 -0500
To: Unicoders <unicode@unicode.org>, WWW International <www-international@w3.org>, www-international-request@w3.org
Message-ID: <OF2B1272B3.4F8A156E-ON86256C3F.006EA991@sil.org>

On 09/25/2002 01:51:28 PM Tex Texin wrote:

>a) Do Unicode fonts include the language-based glyph variants of
>characters, so that a display system is capable of identifying or
>hinting which glyph should be used in a particular scenario?

They *can*, and some do. When this is the case, then there needs to be some
mechanism to modify the relationship between sequences of characters and
sequences of glyphs to arrive at the particular glyphs intended for the
given language. In general terms, the same kinds of mechanisms that can be
used for rendering complex scripts can also be used here -- it's a glyph
substitution, comparable to substituting an initial or final form of an
Arabic character. Of course, there is a different triggering condition
involved in these situations than in the case of a complex script such as
Arabic: in the complex-script situation, the triggers are the character
context (e.g. preceded by a non-word-forming character and followed by a
word-forming character), whereas here the trigger is a metadata tag.
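
To make the two triggering conditions concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the glyph
names, the two lookup tables, and the function names are hypothetical, and
the joining logic is a deliberate simplification of real Arabic shaping. The
only point is that one substitution is keyed on neighbouring characters and
the other on a tag supplied from outside the text.

```python
from typing import Optional

# Hypothetical glyph names; a real font defines its own.
POSITIONAL_FORMS = {
    "beh": {"isol": "beh", "init": "beh.init",
            "medi": "beh.medi", "fina": "beh.fina"},
}

# Hypothetical language-specific variants: (default glyph, language) -> glyph.
LANGUAGE_VARIANTS = {
    ("be-cy", "sr"): "be-cy.srb",
}


def positional_form(joins_prev: bool, joins_next: bool) -> str:
    """Pick a positional form from the character context alone."""
    if joins_prev and joins_next:
        return "medi"
    if joins_prev:
        return "fina"
    if joins_next:
        return "init"
    return "isol"


def substitute_by_context(glyph: str, joins_prev: bool, joins_next: bool) -> str:
    """Complex-script style trigger: the neighbours decide the glyph."""
    forms = POSITIONAL_FORMS.get(glyph)
    if forms is None:
        return glyph
    return forms[positional_form(joins_prev, joins_next)]


def substitute_by_language(glyph: str, language: Optional[str]) -> str:
    """Metadata trigger: a language tag from outside the text decides."""
    if language is None:
        return glyph
    return LANGUAGE_VARIANTS.get((glyph, language), glyph)
```

Note that nothing in the character string itself can drive the second
function; if the caller has no language tag to pass in, the default glyph is
all it can return.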

Let's consider how this would be dealt with in terms of implementation,
using OpenType as an example. The OpenType font format provides means for
storing different glyph-transformation rules according to "language". (1)
The question is, then, what does it take for the rendering process to make
use of one set of language-specific rules rather than another, or rather
than a set of default rules (OT allows the font developer to specify a
default). In OpenType, glyph-transformation rules are grouped by
"features", and a set of rules will be applied when the associated feature
has been activated. (Thus, in OT text layout, what's processed is a
feature-marked-up string of characters.) This applies to the "language"
distinctions as well: the desired "language" must be specified in the
input, otherwise the default rules will apply. (2) The idea is that
application software must determine what features are activated at what

Now, hardly any software gets written to interact directly with the
OpenType layout engine. Instead, higher-level text layout libraries have
been written that wrap the OpenType functionality. Uniscribe is one
example; indeed, in Win32 on Windows 2000 and later, there is even another
layer, since the standard text-drawing functions (TextOut and ExtTextOut)
wrap Uniscribe's functionality. Other examples of libraries that wrap up the
OT interface and expose a higher-level interface include Adobe's CoolType
engine (not a published interface, that I know of), ICU, Pango and Sun's
recent Standard Type Services Framework project.

So, at the OT interface, a "language" tag (3) has to be specified in order
to get language-specific glyphs. But apps generally don't write to that
interface (for good reason); they usually write to a higher interface. The
crux of the issue is that none of the higher-level interfaces, that I know
of, yet provide any mechanism for the app to specify a "language" tag. (4)
Hence, the building blocks are there, but more infrastructure is still
needed. Note that there's a bit more involved than simply re-writing
higher-level APIs to expose a way to specify OT features. In particular, a
critical issue has to do with the relationship between OpenType's
"language" tags, and whatever system of "language" or "locale" tagging
might be used elsewhere in a given platform.
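
As a rough illustration of that last point, the sketch below shows the kind
of glue a platform would have to supply. It is Python, and the table and the
function name ot_language_tag are hypothetical. The four-character values
are meant to resemble registered OpenType language system tags but should be
checked against the registry, and the choice to look only at the primary
subtag is an assumption, not anything OpenType specifies.

```python
from typing import Optional

# Hand-made, illustrative table. No specification defines this mapping;
# verify the right-hand values against the OpenType tag registry.
PRIMARY_SUBTAG_TO_OT = {
    "ro": "ROM ",
    "tr": "TRK ",
    "sr": "SRB ",
    "ja": "JAN ",
}


def ot_language_tag(language_tag: Optional[str]) -> Optional[str]:
    """Map an RFC 3066 style tag such as 'ro-MD' to an OpenType tag.

    Returns None when there is no tag or no known mapping, which the
    caller should treat as "use the font's default rules".
    """
    if not language_tag:
        return None
    primary = language_tag.replace("_", "-").split("-")[0].lower()
    return PRIMARY_SUBTAG_TO_OT.get(primary)
```

Even this toy version has to make policy decisions (what to do with region
subtags, what to do with unknown languages) that no existing standard
settles, which is why exposing the OT interface alone is not enough.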

I've described the situation in terms of OpenType. Neither AAT nor Graphite
provides exactly the same kind of mechanism for providing different glyph
transformations for different languages, though I believe some
consideration has been given to possibilities for both technologies. Both
use feature mechanisms, so can certainly do what you're looking for; but
neither has defined features specifically related to
"languages", let alone decided how these should be handled in terms of
APIs. It would be possible to implement an AAT or Graphite font that used a
feature to get at language-specific glyphs, and apps that exposed a
user-interface for setting AAT or Graphite features (5) would offer the
user a way to control this. But there would not be any automation whereby
an app would specify this based on other "language" or "locale" tagging.


(1) I put "language" in quotation marks since it has not really been
adequately worked out what these distinctions are; I think these are
probably groups of writing systems.

(2) OpenType glyph-transformation rules are organised hierarchically, first
by script, then by language, and then according to the other features they
are associated with.
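
The hierarchy can be pictured with a small Python sketch. The nested
dictionary, the "dflt" key for the default language, the rule strings, and
the function name active_rules are all assumptions for illustration and do
not reflect the actual OpenType table layout or any real API.

```python
from typing import Dict, List, Optional

# rules[script][language][feature] -> list of rule descriptions.
# "dflt" stands for the font developer's default language rules.
FontRules = Dict[str, Dict[str, Dict[str, List[str]]]]

FONT_RULES: FontRules = {
    "latn": {
        "dflt": {"liga": ["f i -> f_i"], "smcp": ["a -> a.sc"]},
        "TRK ": {"smcp": ["a -> a.sc", "i -> idotaccent.sc"]},
    },
}


def active_rules(rules: FontRules, script: str,
                 language: Optional[str], features: List[str]) -> List[str]:
    """Collect the rules for the requested features, in caller order.

    If no language is given, or the font has no entry for it, the
    default language rules for the script apply.
    """
    languages = rules.get(script, {})
    table = languages.get(language) if language else None
    if table is None:
        table = languages.get("dflt", {})
    collected: List[str] = []
    for feature in features:
        collected.extend(table.get(feature, []))
    return collected
```

With this data, asking for "liga" and "smcp" under "TRK " yields only the
two small-cap rules, while the same request with no language also yields
the ligature rule; that difference is exactly what is lost when the caller
has no way to pass a language through.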

(3) OpenType's "language" tags have no specified relationship with ISO 639,
RFC 3066 or any other system of "language" tags.

(4) The same issue applies to OpenType features that pertain to optional
aspects of typography and rendering that are up to the user's discretion
rather than being obligatory behaviour for a script. For instance, there is
an OpenType feature for selecting small cap forms, which a font developer
can use to provide support for small cap glyphs in the same font as regular
glyphs. To make use of such advanced capabilities, the layout interface to
which an app is written must provide a way to specify such features. Apart
from Adobe's engine (used e.g. by InDesign, and which exposes interfaces
for some OpenType features but, I think, not all), I don't know that any
other layout library yet provides an interface that allows an app to
specify discretionary OpenType features.

(5) In both AAT and Graphite, features are used only for discretionary
aspects of typography / rendering that are not obligatory for a script,
whereas OpenType uses features for both optional and obligatory behaviours.
Thus, for AAT and Graphite, the feature capabilities have always assumed
that apps would provide a user interface whereby the user can set features.
(In OpenType, this makes sense for some but not all features.)
Language-specific typography represents something different from both
obligatory script behaviour and user-preference typography: it would
probably be suitable for automation (i.e. the app uses metadata to
determine via an appropriate API language-specific glyph transformations)
rather than controlling via a user interface. For that reason, it's not
clear to me that this should be handled as just one more kind of feature in
the AAT and Graphite models.

>b) If the above is possible, then I assume the browsers have not
>implemented language-based selection yet.

Still possible basically only in theory (or else with a lot of work to also
re-implement the capabilities of something like Uniscribe), so no browsers
yet implement this.

>Are any browsers moving to
>using the appropriate glyphs based on language without depending on each
>language being assigned a different font?

Probably not yet.

>c) If the above is not possible, then configuring browsers for Unicode
>usage is greatly complicated by the need to have a lengthy list of fonts
>assigned to different languages.

Um hmm.

>Is there an alternative approach that
>can be used, so users can easily view Unicode text and get the correct
>display while using a single "Unicode" font?

This is another big question, and I've said lots already. I'll just mention
techniques known as "font-fallback", "font fixup" or "font-linking" -- all
variations on the idea that if the text is supposed to be rendered using
font X, but that font doesn't have glyphs to support the characters in the
string, then figure out what fonts *will* support those characters and use
those, in spite of what the style properties specify. I don't know that
this kind of thing has been used to provide language-specific glyphs;
usually, it has been viewed as a way to keep the user from seeing boxes (or
other comparable notdef glyphs).
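
The basic idea can be sketched in a few lines of Python. This is not how
any particular browser or platform does it; the coverage table, the font
names in the usage note, and the functions pick_font and split_into_runs
are hypothetical, and real implementations work on glyph coverage from the
fonts themselves rather than on sets of characters.

```python
from typing import Dict, List, Optional, Set, Tuple

Coverage = Dict[str, Set[str]]


def pick_font(ch: str, requested: str, fallbacks: List[str],
              coverage: Coverage) -> Optional[str]:
    """Return the first font, requested font first, that covers ch."""
    for font in [requested, *fallbacks]:
        if ch in coverage.get(font, set()):
            return font
    return None  # nothing covers it: the user would see a notdef box


def split_into_runs(text: str, requested: str, fallbacks: List[str],
                    coverage: Coverage) -> List[Tuple[Optional[str], str]]:
    """Split text into consecutive runs that share one chosen font."""
    runs: List[Tuple[Optional[str], str]] = []
    for ch in text:
        font = pick_font(ch, requested, fallbacks, coverage)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

For example, with a made-up "FontX" covering a few Latin letters and a
made-up "FontY" covering a few Greek ones, a mixed string comes back as a
FontX run followed by a FontY run. The choice is driven purely by which
characters a font covers, so a language tag never enters into it.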

You've certainly touched on topics that are both interesting and important.
I'll leave the remaining questions for someone else.

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
Received on Wednesday, 25 September 2002 17:14:21 UTC