Re: FVS Assignment Mismatch WrapUp - Definition of Toggling

From: Richard Wordingham <richard.wordingham@ntlworld.com>
Date: Mon, 31 Aug 2015 07:05:28 +0100
To: "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
Message-ID: <20150831070528.2b4e9e8c@JRWUBU2>
On Sun, 30 Aug 2015 14:44:35 +0000
Greg Eck <greck@postone.net> wrote:

> Richard,
> 
> Yes, you were right on the <U+1820><U+180E>. I meant to write
> <U+180E><U+1820>. Thanks.
> 
> It will take a few days to get back to this. Could you help me in the
> meantime with a clear distinction between a toggle and an over-ride
> (over-ride as I am using the term). In my mind, they are very
> similar. In terms of implementation - an example using OT
> substitution rulings would be the easiest to understand.

In my examples, I will ignore the complications associated with MVS and
NNBSP.

== Toggling ==
A variation selector acts as a toggle if there are two possible forms
for a character depending on what characters are near it, and the basic
shaping rules select opposite forms depending on whether the variation
selector is present or no valid variation selector is next to it.

If the system uses a toggle, one needs only one variation selector for
the user to have complete control.  The drawback is that implementations
very definitely have to agree on the rules.
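The toggle can be modelled as an exclusive-or between the form the context rules would choose and the presence of the selector. This is a minimal Python sketch of that logic only, not font code; the function name `toggle_dotted` is hypothetical.

```python
def toggle_dotted(context_wants_dot: bool, has_valid_fvs: bool) -> bool:
    """Decide whether medial NA is dotted under a toggle scheme.

    context_wants_dot: the choice the basic shaping rules make with no selector.
    has_valid_fvs:     True if a valid variation selector follows the NA.
    The selector inverts whatever the context would have chosen.
    """
    return context_wants_dot != has_valid_fvs


# With a single selector, both forms are reachable in every context.
for context in (True, False):
    reachable = {toggle_dotted(context, fvs) for fvs in (False, True)}
    print(f"context dotted={context}: reachable forms {sorted(reachable)}")
```

Because the result still depends on `context_wants_dot`, two implementations that disagree on the context rules will render the same <NA, FVS1> differently.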

For example, in the TR170 scheme for medial NA, the alternative
character inputs are <U+1828> (context-dependent) and <U+1828, U+180B>
(opposite setting).

At stage B, the cmap is consulted and they are converted to glyphs
gN0id and gN0iu.  (I have spotted that I don't need temporary
intermediate glyphs, and that the dotted initial form is the preferred
form for a character picker.)
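Stage B for the TR170 scheme can be sketched as a longest-match lookup from code point sequences to the glyph names used here. The dictionary and `map_to_glyphs` are hypothetical illustrations; how a real font achieves the mapping (cmap, an early ligature lookup, or similar) is outside this sketch.

```python
# Hypothetical stage-B table for the TR170 medial NA example.
CMAP_TR170 = {
    (0x1828,): "gN0id",         # NA alone: dotted initial (picker form)
    (0x1828, 0x180B): "gN0iu",  # NA + FVS1: undotted initial
}


def map_to_glyphs(codepoints, table):
    """Greedy longest-match conversion of code points to glyph names."""
    longest = max(len(key) for key in table)
    glyphs, i = [], 0
    while i < len(codepoints):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(longest, len(codepoints) - i), 0, -1):
            key = tuple(codepoints[i:i + n])
            if key in table:
                glyphs.append(table[key])
                i += n
                break
        else:
            raise KeyError(f"no glyph for U+{codepoints[i]:04X}")
    return glyphs


print(map_to_glyphs([0x1828, 0x1828, 0x180B], CMAP_TR170))
```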

At stage C, the feature medi converts them to the medial forms.
Undottedness is the default.  The substitution is:

gN0id > gN0mu # Undotted medial NA
gN0iu > gN0md # Dotted medial NA
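The medi step is a single substitution applied only where the shaper has classed the glyph as medial. In this Python sketch the medial positions are passed in explicitly, and the neighbouring glyph names gB0i and gA0f are hypothetical placeholders.

```python
# Stage C (medi) for the TR170 toggle scheme, as a plain mapping.
MEDI_TR170 = {
    "gN0id": "gN0mu",  # plain NA: undotted medial is the default
    "gN0iu": "gN0md",  # NA + FVS1: the opposite, dotted medial
}


def apply_medi(glyphs, medial_positions):
    """Apply the medi mapping only at indices the shaper classed as medial."""
    out = list(glyphs)
    for i in medial_positions:
        out[i] = MEDI_TR170.get(out[i], out[i])
    return out


print(apply_medi(["gB0i", "gN0id", "gA0f"], [1]))
```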

At stage D, the first batch of rlig lookups are applied.  Amongst them
is a rule that applies in the contexts:

_ vowel_NA
_ gZWJ vowel_NA # Seemingly needed for Windows!

where vowel_NA is the set of vowel forms that count as vowels.  For
example, the vowel forms with an extra tooth should not count as
vowels.  (That extra tooth functions as a consonant.)  I don't know
whether this rule is being implemented in fonts.  In this context, the
substitution rule swaps the dotted status.

gN0mu > gN0md
gN0md > gN0mu
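The stage D swap can be sketched as a contextual rule that looks one glyph ahead, optionally skipping a single gZWJ. The membership of `VOWEL_NA` below is an assumption for illustration; the real set is font-specific and must exclude the vowel forms whose extra tooth acts as a consonant.

```python
# Hypothetical contents of vowel_NA (illustrative glyph names only).
VOWEL_NA = {"gA0m", "gE0m", "gI0m"}

SWAP = {"gN0mu": "gN0md", "gN0md": "gN0mu"}


def followed_by_vowel(glyphs, i):
    """True if glyphs[i] is followed by a vowel_NA glyph, allowing one gZWJ."""
    rest = glyphs[i + 1:]
    if rest and rest[0] == "gZWJ":
        rest = rest[1:]
    return bool(rest) and rest[0] in VOWEL_NA


def swap_before_vowel(glyphs):
    """Stage D toggle rule: flip the dotting of medial NA before a vowel."""
    return [
        SWAP[g] if g in SWAP and followed_by_vowel(glyphs, i) else g
        for i, g in enumerate(glyphs)
    ]
```

Combined with the medi mapping, plain NA ends up dotted before a vowel and undotted elsewhere, while <NA, FVS1> gives the reverse.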

Now, one could defer the effect of the variation selector to the end,
with an unconditional ligature rule

gN0mu gMVS1 > gN0md
gN0md gMVS1 > gN0mu

The rules would then have to include gMVS1 in their contexts, and I
think also gMVS2 and gMVS3.  I think it gives a cleaner set of rules to
incorporate the variation selectors in the glyph as soon as possible.
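To see what deferring costs, here is a Python sketch of the deferred variant: the selector survives stage B as its own glyph, the contextual swap must step over it, and an unconditional ligature folds it in at the end. The selector glyph names gMVS1, gMVS2 and gMVS3 are taken from the rules above and appear to denote the glyphs of the free variation selectors; the vowel set is hypothetical.

```python
VOWEL_NA = {"gA0m", "gE0m"}                 # hypothetical vowel glyphs
SELECTORS = {"gMVS1", "gMVS2", "gMVS3"}     # selector glyphs, names as above
SWAP = {"gN0mu": "gN0md", "gN0md": "gN0mu"}


def vowel_follows(glyphs, i):
    """Context test that must now step over a selector glyph (and one gZWJ)."""
    j = i + 1
    if j < len(glyphs) and glyphs[j] in SELECTORS:
        j += 1
    if j < len(glyphs) and glyphs[j] == "gZWJ":
        j += 1
    return j < len(glyphs) and glyphs[j] in VOWEL_NA


def stage_d_deferred(glyphs):
    # Contextual swap, with selector glyphs allowed inside the context.
    swapped = [
        SWAP[g] if g in SWAP and vowel_follows(glyphs, i) else g
        for i, g in enumerate(glyphs)
    ]
    # Unconditional ligature at the end: NA + gMVS1 flips the dotting.
    result = []
    for g in swapped:
        if g == "gMVS1" and result and result[-1] in SWAP:
            result[-1] = SWAP[result[-1]]
        else:
            result.append(g)
    return result
```

The outputs agree with folding the selector into the glyph at stage B, but every contextual rule now has to allow for the selector glyphs.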

== Override ==
To me, a variation selector acts as an override if the combination is
unaffected by context rules that affect the character in isolation.
For example, in the GB/T 26226-2010 scheme for medial NA, the
alternative character inputs are <U+1828> (context-dependent), <U+1828,
U+180B> (dotted) and <U+1828, U+180C> (undotted).

With overrides both ways, the user can be in complete control, and does
not have to rely on implementations being in full agreement on the
rules.
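For contrast with the toggle, an override consults the context only when no explicit choice was made. A minimal Python sketch; the function name and the encoding of the selector as `True`/`False`/`None` are hypothetical.

```python
from typing import Optional


def override_dotted(context_wants_dot: bool, explicit: Optional[bool]) -> bool:
    """Decide whether medial NA is dotted under an override scheme.

    explicit: True for <NA, FVS1> (dotted), False for <NA, FVS2> (undotted),
              None when no valid variation selector is present.
    The context rules are consulted only when there is no explicit choice.
    """
    if explicit is None:
        return context_wants_dot
    return explicit
```

Here the result for an explicit selection does not depend on `context_wants_dot`, so implementations that disagree about the context rules still agree on <NA, FVS1> and <NA, FVS2>.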

At stage B, the cmap is consulted and they are converted to glyphs as
follows:

1828      > gN0id
1828 180B > gN1m
1828 180C > gN2m

(I have spotted that I don't need all the temporary intermediate
glyphs, and that the dotted initial form is the preferred form for a
character picker.)
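The three-way stage B mapping for the GB/T 26226-2010 scheme can be sketched per NA cluster. The fallback for an unlisted selector (treating it as invalid and ignoring it) is an assumption of this sketch, as is the helper name `na_cluster_glyph`.

```python
# Hypothetical stage-B table for the GB/T 26226-2010 medial NA example.
CMAP_GB = {
    (0x1828,): "gN0id",        # NA alone: context-dependent
    (0x1828, 0x180B): "gN1m",  # NA + FVS1: explicitly dotted
    (0x1828, 0x180C): "gN2m",  # NA + FVS2: explicitly undotted
}


def na_cluster_glyph(cluster):
    """Map NA, optionally followed by one selector, to its stage-B glyph."""
    key = tuple(cluster)
    if key in CMAP_GB:
        return CMAP_GB[key]
    # Assumed policy: an unlisted selector is invalid and is ignored.
    return CMAP_GB[(0x1828,)]
```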

At stage C, the feature medi converts them to the medial forms.
Undottedness is the default.  The substitution is:

gN0id > gN0mu
gN1m  > gN1m # Null action
gN2m  > gN2m # Null action
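The null-action rows need no entries at all: leaving gN1m and gN2m out of the mapping has the same effect. A Python sketch with a hypothetical helper name:

```python
# Stage C (medi) for the GB/T 26226-2010 scheme. Only gN0id is listed;
# gN1m and gN2m are absent, which is equivalent to the null-action rows.
MEDI_GB = {"gN0id": "gN0mu"}


def medi_gb(glyph: str) -> str:
    """Return the medial glyph; explicitly selected glyphs pass through."""
    return MEDI_GB.get(glyph, glyph)
```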

At stage D, the first batch of rlig lookups are applied.  Amongst them
is a rule that applies in the contexts:

 _ vowel_NA
 _ gZWJ vowel_NA # Seemingly needed for Windows!

where vowel_NA is the set of vowel forms that count as vowels.  For
example, the vowel forms with an extra tooth should not count as
vowels.  (That extra tooth functions as a consonant.)  In this context,
the substitution rule sets the dotted status if there is no valid
variation selector:

gN0mu > gN0md
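Unlike the toggle rule, this contextual rule covers only gN0mu, so glyphs carrying an explicit selection are never touched. A Python sketch, again with a hypothetical `VOWEL_NA` set:

```python
VOWEL_NA = {"gA0m", "gE0m", "gI0m"}  # hypothetical vowel glyph names


def dot_before_vowel(glyphs):
    """Stage D override rule: dot medial NA before a vowel, but only when
    no valid selector was given (gN1m and gN2m are left untouched)."""
    out = list(glyphs)
    for i, g in enumerate(glyphs):
        if g != "gN0mu":
            continue
        rest = glyphs[i + 1:]
        if rest and rest[0] == "gZWJ":  # allow one intervening ZWJ glyph
            rest = rest[1:]
        if rest and rest[0] in VOWEL_NA:
            out[i] = "gN0md"
    return out
```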

At the end of stage D, we have to resolve the conflict between Tables A
and B. If we assume the rendering engine will only look at a sequence of
Mongolian, inherited and common characters, we could have something
like a pair of contexts

 _ mong_shape
 mong_shape _

where mong_shape is the set of glyphs from Mongolian script characters
that are (dual) joining or join-causing, and a consequential substitution

gN2m > gN0mu # Undotted medial form
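The conflict resolution amounts to checking whether gN2m joins on either side. In this Python sketch the contents of `MONG_SHAPE` are a hypothetical, illustrative subset.

```python
# Hypothetical subset of mong_shape: glyphs of (dual) joining or
# join-causing Mongolian characters. The names are illustrative only.
MONG_SHAPE = {"gA0m", "gB0m", "gN0mu", "gN0md", "gN1m", "gN2m"}


def resolve_fvs2(glyphs):
    """Turn gN2m into undotted medial NA when it joins on either side."""
    out = list(glyphs)
    for i, g in enumerate(glyphs):
        if g != "gN2m":
            continue
        joins_before = i > 0 and glyphs[i - 1] in MONG_SHAPE
        joins_after = i + 1 < len(glyphs) and glyphs[i + 1] in MONG_SHAPE
        if joins_before or joins_after:
            out[i] = "gN0mu"
    return out
```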

Finally, in stage D, we need the unconditional changes that remove the
special status:

gN1m > gN0md # With rules as above, could have had this at the start.
gN2m > gN0fd # Isolated, dotted pre-MVS form (Table B)
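The closing unconditional step is a plain mapping. It has to run after the joining-context rule, otherwise a joined gN2m would wrongly become gN0fd. A Python sketch with a hypothetical helper name:

```python
# Final unconditional substitutions for the override scheme.
FINAL_CLEANUP = {
    "gN1m": "gN0md",  # explicit dotted -> ordinary dotted medial
    "gN2m": "gN0fd",  # leftover (unjoined) FVS2 -> dotted pre-MVS form, Table B
}


def cleanup(glyphs):
    """Replace any remaining special-status glyphs; others pass through."""
    return [FINAL_CLEANUP.get(g, g) for g in glyphs]
```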

== FINAL NA ==
The examples in TR170 and GB/T 26226-2010 give a behaviour of final NA
that I cannot reconcile with the tables.  Both documents seem to use
FVS1 to *toggle* the dotting of NA.  For example, FVS1 suppresses the
final dot before MVS in the example BAGAN-A in row 7 (straddling pp. 5-6
in TR170, Table 9 in the standard), but adds it in the example of Sibe
HAN in row 12 or 11.  At least the BAGAN-A example is consistent with
Table A of GB/T 26226-2010, where final <NA, FVS1> is recorded as
*undotted*.

Richard.
Received on Monday, 31 August 2015 06:06:10 UTC
