- From: Adam Twardoch (List) <list.adam@twardoch.com>
- Date: Fri, 30 Oct 2009 01:12:41 -0600
- To: Jonathan Kew <jonathan@jfkew.plus.com>
- CC: Stephen Zilles <szilles@adobe.com>, Håkon Wium Lie <howcome@opera.com>, www-style <www-style@w3.org>, www-font <www-font@w3.org>
Let me summarize a few important points about how OpenType Layout features are meant to be used.

I. FEATURE CLASSIFICATION

OpenType Layout features can be divided into various categories, based on various criteria (see [1] below for more information):

a) "Show in UI": determines whether a certain feature should be somehow exposed in an application's UI. Another way of looking at this categorization is whether a certain feature should be user-controllable or whether it should be controlled "behind the scenes" by the OpenType Layout engine. Obviously, the user-controllable features should have some kind of exposure through CSS.

b) "Script": indicates a generalization of which features are used with which scripts (writing systems). Some OpenType Layout features are only used with particular scripts, while others are applicable to all scripts.

c) "Script-specific shaping": the OpenType Layout process is divided into three phases: before, during, and after the script-specific shaping. Some features should be applied to all glyphs before the script-specific shaping algorithms kick in, others are applied automatically by the shaping engine during the script-specific shaping, and the last group is applied after the script-specific shaping; these are typically the user-controllable features.

d) "Applied by default": indicates whether a feature should be applied by default (while the user may or may not have the opportunity to turn it off), or should be off by default (in which case the user should have the opportunity to turn it on).

e) "Functional category": features can be coarsely split into two large groups: one related to language (features that ensure that a certain orthographic tradition is followed, or even that the text is orthographically correct at all) and one related to typography (features that allow the user to select a certain typographic or stylistic treatment).

II. LOOKUP CLASSIFICATION

OpenType Layout features are realized through a series of lookups that can perform two types of actions: substitution (replacing some glyphs with others, stored in the OpenType GSUB table) and positioning (adjusting the width and the x/y position of some glyphs, stored in the GPOS table).

The most important aspect of this is that while most features can simply be turned on and off (i.e. their status is binary), other features may need an additional parameter. For example, if a feature such as "salt" (stylistic alternates) is realized through GSUB LookupType 3 (alternate substitution, one-to-one-out-of-many), then a numerical parameter is needed to select one alternate out of the set of alternates.

III. LANGUAGE CLASSIFICATION

All OpenType Layout features are assigned in the context of a specific script and language system. While assigning the script is easy (the engine can determine from the Unicode string which script a certain character belongs to, and from there it can pick the appropriate OpenType script branch from which to apply the features), the language system is trickier. As you can see from http://www.microsoft.com/typography/otspec/languagetags.htm, OpenType uses a list of language systems that do not have a 1:1 correspondence with any of the ISO 639 standards. In OpenType 1.6 (at the link above), an informational mapping between OpenType language system tags and their "best matches" in the ISO 639 standards has been provided.

It is quite obvious that a web browser that applies OpenType Layout features should observe the HTML "lang" attribute and, if it is present, apply the appropriate features from the corresponding language system branch in the font (and only if it is absent, apply the features from the Default language system within the script branch).
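The lang-driven fallback just described can be sketched in Python as follows. This is a minimal illustration, not how any particular browser works: ISO_TO_OT is a hypothetical, partial excerpt of a mapping (the real informational mapping is in the OpenType 1.6 language tag list), the lang value is assumed to be already normalized to an ISO 639-3 code, and "dflt" is only a label for the script branch's Default language system.

```python
from typing import Iterable, Optional

# Hypothetical, partial ISO 639-3 -> OpenType language system mapping.
# One ISO code may correspond to several OT tags (see "zho").
ISO_TO_OT = {
    "tur": ["TRK"],
    "srp": ["SRB"],
    "zho": ["ZHH", "ZHP", "ZHS", "ZHT"],
}

# Label used here for the script branch's Default language system.
DEFAULT_LANGSYS = "dflt"


def pick_langsys(lang: Optional[str], font_langsys: Iterable[str]) -> str:
    """Choose the language system branch within the current script.

    lang: the element's HTML lang value, assumed normalized to ISO 639-3,
          or None when the attribute is absent.
    font_langsys: language system tags the font defines for this script.
    """
    available = set(font_langsys)
    if lang is not None:
        # Take the first OT tag (in table order) that the font provides.
        for tag in ISO_TO_OT.get(lang, []):
            if tag in available:
                return tag
    # No lang attribute, unknown code, or no matching branch in the font.
    return DEFAULT_LANGSYS
```

For example, `pick_langsys("tur", {"TRK"})` selects the Turkish branch, while `pick_langsys("zho", {"ZHS", "ZHT"})` returns "ZHS" purely because of list order: the lang value alone cannot say whether Simplified or Traditional forms were intended.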
But it might be worth considering adding a low-level CSS access mechanism that lets users choose a specific OpenType language system, because some ISO 639 codes map to several OpenType language systems, e.g.:

                         (OT)   (ISO)
  Chinese Hong Kong      ZHH    zho
  Chinese Phonetic       ZHP    zho
  Chinese Simplified     ZHS    zho
  Chinese Traditional    ZHT    zho

IV. HUMAN-READABLE vs. LOW-LEVEL OT FEATURE ACCESS

I realize that it is of great value to have a mechanism through which most OpenType Layout features are accessed via human-readable CSS properties. For some, such as the OpenType "smcp" feature, existing CSS properties such as "font-variant: small-caps" should be used. For others, new CSS properties, such as those proposed by Jonathan et al., should be used.

However, in addition, I think it would be very useful to have a low-level mechanism to control the OpenType Layout features directly. See [2] below for some thoughts that I had on that subject.

Best,
Adam

==

[1] Classification of OpenType Layout features, draft by Adam Twardoch:
http://www.twardoch.com/tmp/OpenTypeFeaturesClassification.xls

In the course of the discussions about the OpenType 1.5 and 1.6 specification revisions that took place last year, I circulated a draft categorization of OpenType Layout features based on several criteria. The document mentioned above is that draft classification of the OpenType Layout features that were registered in OpenType 1.5. I also included the Microsoft-specific MATH engine features that are not officially part of the OT spec, but I have not yet included the features added in OpenType 1.6.

The document is an Excel spreadsheet with the following information:

* "Tag" and "Friendly name" of all features found in OT 1.5, plus MATH.

* "Show in UI" determines whether a certain feature should be somehow exposed in an application's UI. "no" means that no UI is necessary, "yes" generally means that a UI element directly related to the feature should be exposed, and "special" indicates special UI treatment, e.g. associating the feature's activation with some general application or document preferences (e.g. optical bounds or CJK orientation).

* "UI level" indicates at what level the UI should be implemented: none, character, word, paragraph, or document. Some features are sensibly applied to just one or a few characters, while others can be applied to long runs of text.

* "Script" indicates a generalization of which features are used with which scripts. This is not 100% accurate; I think it would be a good idea to produce an exhaustive mapping of all registered features to all registered script tags. Currently, the OT spec has some unclear wording, e.g. "Indic scripts similar to Devanagari", so the column sometimes uses script tags and sometimes generic terms like "ALL", "INDIC", "ARABIC", "RTL". I think it would be useful to categorize the OpenType script tag list into such groups, so that there is an exhaustive mapping of which script tags are classified as "European", "Indic", "Arabetic", "CJK" etc., plus which writing directions they may have (three columns: LTR, RTL, vert). I'd like to add that in the 2nd phase of the project.

* "Script-specific shaping" is the column with the actual classification of when, in relation to script-specific shaping, a feature is applied: before the script-specific shaping (I was able to come up with only four definitive entries for this: ccmp, locl, rtla and size), during the script-specific shaping, or after the script-specific shaping. Unfortunately, Adobe follows a different paradigm for describing its features than Microsoft does. I think Adobe's CJK layout principles would be better described in the form of a shaping specification like Microsoft's, rather than spread around the feature description list. Therefore, I have classified all of Adobe's CJK features as "to be applied after shaping", since "shaping" is not defined in this context (though I think it could be).

* "Applied by default" indicates whether the feature should always be on by default, never be on by default, or whether shaping (or, in the CJK case, orientation) determines whether the feature is applied.

* "Functional category" is just a loose way of classifying the OpenType Layout features into categories. There is a major distinction between "language" and "typography" (such a distinction already exists in the script-specific specs), plus additional subcategorization into "Asian CJK", "complex scripts", "basic support", "numerals and scientific", "letter case" and "variants".

==

[2] Notes on a low-level tagging mechanism for OT Layout features in CSS

Below are some notes that I wrote to Michael Jansson in 2006, when he implemented his own extensions to CSS that allowed OpenType Layout feature selection in GlyphGate (this was done through the "text-otl" CSS property). I now realize that the particular syntax I proposed below may not conform best to CSS best practices, so its details might need revising, but I think it would be well worth having this kind of mechanism.

The low-level access mechanism for OT Layout features should allow the document designer to:

1. Explicitly turn OFF certain features (e.g. "-kern").
2. Explicitly call variant numbers in one-to-one-out-of-many substitutions (e.g. "salt/3").
3. Explicitly specify the writing system of the text by specifying a script tag (e.g. "latn/liga" vs. "arab/liga").
4. Explicitly specify the language system of the text by specifying a language tag (e.g. "latn/liga" vs. "latn/TRK/liga").

I believe that all of the above would be useful.
The parser should give higher priority to explicitly specified OTL script and language tags and, only in their absence, infer the language from the HTML "lang" attribute and the script from the Unicode properties of the current text.

I believe the simple syntax that I proposed above would actually be enough, given these specifics:

* all parts of the tagging are separated by slashes;
* there are up to four parts (script/language/feature/variant);
* if only one part is specified, it is the feature tag; it may be prefixed with a "-" sign, which signifies turning off a feature that might be turned on by default;
* if there is more than one part, check the last part; if it is an integer, then it is the variant number; disregard it in the remaining analysis;
* if one part remains, it is the feature tag;
* if two parts remain, the first is the script tag (which can be "DFLT", all in uppercase, or otherwise a lowercase-only string of four letters) and the second is the feature tag;
* if three parts remain, the first is the script tag, the second is the language tag, and the third is the feature tag.

Examples:

text-otl:liga
text-otl:-ccmp
text-otl:latn/salt/4
text-otl:latn/TRK/liga
text-otl:cyrl/SRB/locl
text-otl:DFLT/ornm/6

==
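The "text-otl" rules in [2] are compact enough to sketch as a small parser. The Python below is one possible reading of them, not GlyphGate's actual implementation. The OtlSetting class, the parse_text_otl and resolve_context names, and the exact tag patterns in the regular expressions are assumptions made for illustration. resolve_context shows the stated priority: explicit script and language tags override values inferred from the Unicode text and the HTML "lang" attribute.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed tag shapes (not spelled out in the notes): a script tag is "DFLT"
# or four lowercase letters, a language tag is two to four uppercase letters
# (e.g. "TRK"), and a feature tag is four lowercase letters or digits.
SCRIPT_RE = re.compile(r"DFLT|[a-z]{4}")
LANG_RE = re.compile(r"[A-Z]{2,4}")
FEATURE_RE = re.compile(r"[a-z0-9]{4}")
VARIANT_RE = re.compile(r"[0-9]+")


@dataclass
class OtlSetting:
    feature: str
    enabled: bool = True
    script: Optional[str] = None    # None: infer from the text's Unicode script
    language: Optional[str] = None  # None: infer from the HTML lang attribute
    variant: Optional[int] = None   # alternate number for one-of-many lookups


def parse_text_otl(value: str) -> OtlSetting:
    """Parse one hypothetical text-otl value such as 'latn/salt/4' or '-kern'."""
    parts = value.strip().split("/")
    if len(parts) > 4:
        raise ValueError(f"too many parts: {value!r}")

    if len(parts) == 1:
        # A lone feature tag; a leading '-' turns the feature off.
        tag = parts[0]
        enabled = not tag.startswith("-")
        if not enabled:
            tag = tag[1:]
        if not FEATURE_RE.fullmatch(tag):
            raise ValueError(f"bad feature tag: {tag!r}")
        return OtlSetting(feature=tag, enabled=enabled)

    # With more than one part, a trailing integer is the variant number;
    # drop it before interpreting the remaining parts.
    variant = None
    if VARIANT_RE.fullmatch(parts[-1]):
        variant = int(parts.pop())

    script = language = None
    if len(parts) == 1:
        feature = parts[0]
    elif len(parts) == 2:
        script, feature = parts
    elif len(parts) == 3:
        script, language, feature = parts
    else:
        raise ValueError(f"cannot interpret {value!r}")

    if script is not None and not SCRIPT_RE.fullmatch(script):
        raise ValueError(f"bad script tag: {script!r}")
    if language is not None and not LANG_RE.fullmatch(language):
        raise ValueError(f"bad language tag: {language!r}")
    if not FEATURE_RE.fullmatch(feature):
        raise ValueError(f"bad feature tag: {feature!r}")
    return OtlSetting(feature=feature, script=script,
                      language=language, variant=variant)


def resolve_context(setting: OtlSetting, inferred_script: str,
                    inferred_language: Optional[str]) -> Tuple[str, Optional[str]]:
    """Explicit tags win; otherwise fall back to the inferred script/language."""
    script = setting.script if setting.script is not None else inferred_script
    language = (setting.language if setting.language is not None
                else inferred_language)
    return script, language
```

For instance, "DFLT/ornm/6" parses to feature "ornm" in the DFLT script with variant 6, and "salt/3" to feature "salt" with variant 3 and no explicit script. Read literally, the rules only allow the "-" prefix on a lone feature tag, so this sketch rejects a value like "-latn/kern" instead of guessing what it means.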
Received on Friday, 30 October 2009 07:13:43 UTC