- From: Richard Wordingham <richard.wordingham@ntlworld.com>
- Date: Mon, 31 Aug 2015 07:05:28 +0100
- To: "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
On Sun, 30 Aug 2015 14:44:35 +0000
Greg Eck <greck@postone.net> wrote:

> Richard,
>
> Yes, you were right on the <U+1820><U+180E>. I meant to write
> <U+180E><U+1820>. Thanks.
>
> It will take a few days to get back to this. Could you help me in the
> meantime with a clear distinction between a toggle and an over-ride
> (over-ride as I am using the term). In my mind, they are very
> similar. In terms of implementation - an example using OT
> substitution rulings would be the easiest to understand.

In my examples, I will ignore the complications associated with MVS and
NNBSP.

== Toggling ==

A variation selector acts as a toggle if there are two possible forms
for a character depending on what characters are near it, and the basic
shaping rules select opposite forms depending on whether the variation
selector is present or no valid variation selector is next to it.  If
the system uses a toggle, one needs only one variation selector for the
user to have complete control.  The drawback is that implementations
very definitely have to agree on the rules.

For example, in the TR170 scheme for medial NA, the alternative
character inputs are <U+1828> (context-dependent) and <U+1828, U+180B>
(opposite setting).

At stage B, the cmap is consulted and they are converted to glyphs
gN0id and gN0iu.  (I have spotted that I don't need temporary
intermediate glyphs, and the dotted initial form is the preferred form
for a character picker.)

At stage C, the feature medi converts them to the medial forms.
Undottedness is the default.  The substitution is:

    gN0id > gN0mu   # Undotted medial NA
    gN0iu > gN0md   # Dotted medial NA

At stage D, the first batch of rlig lookups is applied.  Amongst them
is a rule that applies in the contexts:

    _ vowel_NA
    _ gZWJ vowel_NA   # Seemingly needed for Windows!

where vowel_NA is the set of vowel forms that count as vowels.  For
example, the vowel forms with an extra tooth should not count as
vowels.  (That extra tooth functions as a consonant.)
I don't know whether this rule is being implemented in fonts.

In this context, the substitution rule swaps the dotted status.

    gN0mu > gN0md
    gN0md > gN0mu

Now, one could defer the effect of the variation selector to the end,
with an unconditional ligature rule

    gN0mu gFVS1 > gN0md
    gN0md gFVS1 > gN0mu

The rules would then have to include gFVS1 in their contexts, and I
think also gFVS2 and gFVS3.  I think it gives a cleaner set of rules to
incorporate the variation selectors in the glyph as soon as possible.

== Override ==

To me, a variation selector acts as an override if the combination is
unaffected by context rules that affect the character in isolation.

For example, in the GB/T 26226-2010 scheme for medial NA, the
alternative character inputs are <U+1828> (context-dependent),
<U+1828, U+180B> (dotted) and <U+1828, U+180C> (undotted).  With
overrides both ways, the user can be in complete control, and does not
have to rely on implementations being in full agreement on the rules.

At stage B, the cmap is consulted and they are converted to glyphs as
follows:

    1828      > gN0id
    1828 180B > gN1m
    1828 180C > gN2m

(I have spotted that I don't need all the temporary intermediate
glyphs, and the dotted initial form is the preferred form for a
character picker.)

At stage C, the feature medi converts them to the medial forms.
Undottedness is the default.  The substitution is:

    gN0id > gN0mu
    gN1m  > gN1m    # Null action
    gN2m  > gN2m    # Null action

At stage D, the first batch of rlig lookups is applied.  Amongst them
is a rule that applies in the contexts:

    _ vowel_NA
    _ gZWJ vowel_NA   # Seemingly needed for Windows!

where vowel_NA is the set of vowel forms that count as vowels.  For
example, the vowel forms with an extra tooth should not count as
vowels.  (That extra tooth functions as a consonant.)

In this context, the substitution rule sets the dotted status if there
is no valid variation selector:

    gN0mu > gN0md

At the end of stage D, we have to resolve the conflict between Tables A
and B.
If we assume the rendering engine will only look at a sequence of
Mongolian, inherited and common characters, we could have something
like a pair of contexts

    _ mong_shape
    mong_shape _

where mong_shape is the set of glyphs from Mongolian script characters
that are (dual) joining or join-causing, and a consequential
substitution

    gN2m > gN0mu   # Undotted medial form

Finally in Stage D, we need the unconditional changes to remove the
special status:

    gN1m > gN0md   # With rules as above, could have had this at the start.
    gN2m > gN0fd   # Isolated, dotted pre-MVS form (Table B)

== FINAL NA ==

The examples in TR170 and GB/T 26226-2010 give a behaviour of final NA
that I cannot reconcile with the tables.  Both documents seem to use
FVS1 to *toggle* the dotting of NA.  For example, FVS1 suppresses the
final dot before MVS in the example BAGAN-A in row 7 (straddling pp5-6
in TR170, Table 9 in the standard), but adds it in the example of Sibe
HAN in row 12 or 11.  At least the BAGAN-A example is consistent with
Table A of GB/T 26226-2010, where final <NA, FVS1> is recorded as
*undotted*.

Richard.
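
As a rough illustration of the toggle and override schemes for medial NA described in this message, the stage B, C and D substitutions can be walked through in a few lines of Python. This is only a sketch and not font code. The glyph names are those used above, but the function names, the `before_vowel` flag (standing in for the `_ vowel_NA` context) and the assumption that NA is joined on both sides are illustrative simplifications. The gZWJ context, MVS and NNBSP are left out.

```python
FVS1 = "\u180B"  # first free variation selector
FVS2 = "\u180C"  # second free variation selector


def toggle_medial_na(selector, before_vowel):
    """Toggle scheme: FVS1 flips whatever the context rules would pick."""
    # Stage B (cmap): anything other than FVS1 counts as "no valid selector".
    glyph = "gN0iu" if selector == FVS1 else "gN0id"
    # Stage C (medi): undottedness is the default.
    glyph = {"gN0id": "gN0mu", "gN0iu": "gN0md"}[glyph]
    # Stage D (rlig): before a vowel in vowel_NA, swap the dotted status.
    if before_vowel:
        glyph = {"gN0mu": "gN0md", "gN0md": "gN0mu"}[glyph]
    return glyph


def override_medial_na(selector, before_vowel):
    """Override scheme: FVS1 forces dotted, FVS2 forces undotted."""
    # Stage B (cmap): each sequence gets its own glyph.
    glyph = {None: "gN0id", FVS1: "gN1m", FVS2: "gN2m"}[selector]
    # Stage C (medi): only the plain glyph changes; the others are null actions.
    glyph = {"gN0id": "gN0mu"}.get(glyph, glyph)
    # Stage D (rlig): the context rule touches only the plain glyph.
    if before_vowel and glyph == "gN0mu":
        glyph = "gN0md"
    # Stage D: NA is assumed joined to a neighbour here, so gN2m resolves
    # to the undotted medial form rather than the Table B form.
    glyph = {"gN2m": "gN0mu"}.get(glyph, glyph)
    # Stage D: unconditional change that removes the special status.
    glyph = {"gN1m": "gN0md"}.get(glyph, glyph)
    return glyph


if __name__ == "__main__":
    for before_vowel in (True, False):
        for name, sel in (("none", None), ("FVS1", FVS1), ("FVS2", FVS2)):
            print(
                f"before_vowel={before_vowel!s:5} selector={name:4} "
                f"toggle={toggle_medial_na(sel, before_vowel)} "
                f"override={override_medial_na(sel, before_vowel)}"
            )
```

In this sketch the toggle result for <U+1828, U+180B> changes with the context, whereas the override result for the same sequence stays dotted in both contexts. That is the distinction the two sections draw.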
Received on Monday, 31 August 2015 06:06:10 UTC