RE: FVS Assignment Mismatch WrapUp - GA


Yes, you were right on the <U+1820><U+180E>. I meant to write <U+180E><U+1820>. Thanks.

It will take a few days to get back to this. Could you help me in the meantime with a clear distinction between a toggle and an over-ride (over-ride as I am using the term)? In my mind, they are very similar. In terms of implementation, an example using OT substitution rulings would be the easiest to understand.


-----Original Message-----
From: Richard Wordingham
Sent: Sunday, August 30, 2015 4:53 AM
Subject: Re: FVS Assignment Mismatch WrapUp - GA

On Sat, 29 Aug 2015 08:57:48 +0000
Greg Eck wrote:
> 1.)    Final GA Examples
> Yes, I think the SIG example is the best known example also. I have 
> attached a snip from the Chinese Standard. This is the line item that 
> you were referring to?


> My best attempt to explain the phenomenon would be to say that the 
> U+1822 I, usually considered to be neutral in gender and therefore 
> favoring the feminine, is in the case of 
> [inline image omitted] favoring the masculine, and 
> therefore the final masculine sweep to the right.

My explanation would be that the gender also resides in the back consonants (uvular v. velar according to some), or at least, those that end syllables.

> Personally, I
> consider them to be the masculine/feminine forms of the same word.
> Maybe the lemma is the feminine form and the variant is the masculine 
> form.

> 2.)    Final GA Specification & Toggle
> Can we have a bit of discussion about your statement below ...

> Does that mean that the following description
<picture omitted - see my answer below.>
> Should actually look more like this ...
<picture omitted - see my answer following.>

Yes, it means that we wouldn't list a separate undotted masculine final form with a comment such as '(needed to over-ride default context) (SIG+FVS1) (TOTEMLIG+FVS1)'.  We'd list the feminine final form with a comment such as '(context-driven; FVS1 toggles with first final form.)'.

A bit of explanation is certainly in order.

For TR170, we have the example SIG+FVS1 with a masculine final form, but Appendix A, which lists the stand-alone forms, shows two codes for final forms (by joining rules) - the undotted masculine form as <ZWJ, GA> and the feminine final form as <ZWJ, GA, FVS1>.  However, <SA, I, GA, FVS1> has undotted final masculine GA. 

I fear I may have misread GB/T 26226-2010.  I hadn't appreciated the difference between Table A, which shows the variation sequences needed in connected text, and Table B, which shows the sequences for displaying glyphs in isolation.

I think I must have looked at Table A, and seen just one final form listed under GA.  However, I probably just looked at the glyph pictures and names - I wouldn't have looked at the coding sequence.  If I had, I would have seen the anomalous encoding of the non-final dotted feminine GA being encoded as ".... GA FVS2", with the FVS2 apparently necessary.  I can now guess at what should have been in that row.

When I look at Table B, I see undotted final masculine GA encoded <ZWJ, GA, FVS1> and final feminine GA encoded <ZWJ, GA, FVS2>.  Therefore it seems that if the Chinese standard has a consistent encoding in mind, it does *not* include a gender toggle for GA.  The mislabelled form in Table A must be a corruption of the entry for feminine final GA.

In Table A for NA, we find for medial NA that <NA, FVS1> gives dotted NA (second medial form) and <NA, FVS3> gives undotted NA (first medial form).  Clearly the idea of toggles has been abandoned!  

Now, the problem with toggles is that one needs a well-defined set of rules.  With overrides instead, one can arrange that the text generated is far less vulnerable to the vagaries of a renderer's glyph selection rules.  The results look good.  They also have the possible cynical bonus of playing havoc with text-processing tools.
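
To make the contrast concrete, here is a minimal Python sketch, not anyone's actual implementation.  The glyph names, the function names and the boolean 'masculine context' flag are all hypothetical; the flag stands in for whatever rules a renderer really applies.  The override assignments follow Table B as read above (FVS1 for the undotted masculine final, FVS2 for the feminine final).

```python
FVS1 = "\u180B"  # MONGOLIAN FREE VARIATION SELECTOR ONE
FVS2 = "\u180C"  # MONGOLIAN FREE VARIATION SELECTOR TWO

# Hypothetical glyph names for the two final forms of GA.
MASC_FINAL = "ga.fina.masc"
FEM_FINAL = "ga.fina.fem"


def context_default(masculine_context):
    """Glyph the renderer picks when no selector follows GA.

    masculine_context is an assumed stand-in for the renderer's own
    context rules, which may differ from one renderer to another.
    """
    return MASC_FINAL if masculine_context else FEM_FINAL


def select_toggle(masculine_context, selector=None):
    """Toggle: FVS1 flips whatever the context rules chose."""
    default = context_default(masculine_context)
    if selector == FVS1:
        return FEM_FINAL if default == MASC_FINAL else MASC_FINAL
    return default


def select_override(masculine_context, selector=None):
    """Override: each selector names one fixed glyph, whatever the context."""
    if selector == FVS1:
        return MASC_FINAL
    if selector == FVS2:
        return FEM_FINAL
    return context_default(masculine_context)
```

With the toggle, <GA, FVS1> yields different glyphs whenever two renderers disagree about the context; with the override, it yields the same glyph in both.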

> If so, then we are not communicating to font developers or end-users 
> that there is actually a U+182D+FVS1 specification. Font developers 
> will need to implement it. Typists will want to use it. Maybe I am 
> misunderstanding your statement?

You understood.  However, the idea of toggles seems to have been abandoned.

> 3.)    Input Methods
> I take another read on the following TR170 statement ...

>> "The mechanism of inputting characters is not specified by the 
>> standard, so any keyboard driver capable of generating the 
>> appropriate 16-bit character encodings can be used.  However, the 
>> input mechanism should ideally generate the correct positional forms, 
>> variants and ligatures on input by analysis of the context of each 
>> letter, at least where possible."

> My read is that the emitted text string is the same. And what may be 
> different is the input method. For example, we are experimenting with 
> a keyboard that has an alternative mapping at SHFT-A for emitting the 
> sequence <U+1820><U+180E>. Likewise for SHFT-E. This eliminates one 
> keystroke. It eliminates dealing with the MVS directly. This 
> experimental keyboard will emit identical text with the person who 
> manually types the two keystrokes <U+1820><U+180E>.

The mechanism you describe generates a character sequence <U+180E, U+1820>.  (At least, I hope that is what you meant.)  It has no conception of positional forms, variants or ligatures.  The processing as you describe it is no different to an English keyboard being given a key assignment for the sequence 'ing', or replacing '&' by '&#x' in an HTML editor.
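
A rough Python sketch of such a keyboard, under assumptions of my own: the key labels and the layout are hypothetical, with '-' standing in for whichever key types MVS directly.  It shows that the shifted keys amount to plain string substitution, with no analysis of context.

```python
MVS = "\u180E"       # MONGOLIAN VOWEL SEPARATOR
LETTER_A = "\u1820"  # MONGOLIAN LETTER A
LETTER_E = "\u1821"  # MONGOLIAN LETTER E

# Hypothetical layout: each key label maps to the string it emits.
KEYMAP = {
    "a": LETTER_A,
    "e": LETTER_E,
    "-": MVS,             # assumed key for typing MVS by hand
    "A": MVS + LETTER_A,  # SHFT-A emits the two-character sequence
    "E": MVS + LETTER_E,  # SHFT-E likewise
}


def type_keys(keys):
    """Return the text emitted for a sequence of key labels.

    Unmapped keys pass through unchanged. Nothing here looks at
    neighbouring characters: it is a table lookup per keystroke.
    """
    return "".join(KEYMAP.get(key, key) for key in keys)
```

Typing SHFT-A and typing the MVS key followed by 'a' produce identical text, which is all the shortcut does.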

The text might possibly refer to something like resolving medial NA to a dotted or undotted form and generating the appropriate variation selector, but I think that is anachronistic.

> While I like the
> idea of a standard keyboard, I know that it is not possible until an 
> army imposes it on the masses! If we have the time at the end of our 
> discussion, we might talk about keyboards.

A sensible topic is what capabilities a keyboard should have.  Keyboard layout is a completely different topic, and is subject to user profiles.  For someone who overwhelmingly typed in Mongolian script, ergonomic considerations would matter.  For someone who normally types in Cyrillic, the layout might best be basically a Cyrillic phonetic layout.  For someone who regularly uses pinyin, a Latin-based phonetic layout would make sense.  

Layouts should probably come in sets of 7:

Traditional/Todo/Sibe/Manchu/Traditional Tibetan/Todo Tibetan/Manchu Tibetan.

Mixing writing systems will really make text processing difficult! I find it quite easy to accidentally pepper my own writings with U+0261 LATIN SMALL LETTER SCRIPT G.


Received on Sunday, 30 August 2015 14:45:07 UTC