Re: FVS Assignment Mismatch WrapUp - GA from Richard Wordingham on 2015-08-29 (public-i18n-mongolian@w3.org from July to September 2015)

From: Richard Wordingham <richard.wordingham@ntlworld.com>
Date: Sat, 29 Aug 2015 21:53:15 +0100
To: "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
Message-ID: <20150829215315.6da1bfac@JRWUBU2>
On Sat, 29 Aug 2015 08:57:48 +0000
Greg Eck <greck@postone.net> wrote:
 
> 1.)    Final GA Examples
 
> Yes, I think the SIG example is the best known example also. I have
> attached a snip from the Chinese Standard. This is the line item that
> you were referring to?

Yes.

> My best attempt to explain the phenomenon would
> be to say that the U+1822 I, usually considered to be neutral in
> gender and therefore favoring the feminine, is in the case of
> [cid:image001.png@01D0E26C.0C9E2320]  favoring the masculine, and
> therefore the final masculine sweep to the right.

My explanation would be that the gender also resides in the back
consonants (uvular v. velar according to some), or at least, those that
end syllables.

> Personally, I
> consider them to be the masculine/feminine forms of the same word.
> Maybe the lemma is the feminine form and the variant is the masculine
> form.

> 2.)    Final GA Specification & Toggle
> 
> Can we have a bit of discussion about your statement below ...

> Does that mean that the following description
<picture omitted - see my answer below.>
> Should actually look more like this ...
<picture omitted - see my answer following.>

Yes, it means that we wouldn't list a separate undotted masculine
final form with a comment such as '(needed to over-ride default
context) (SIG+FVS1) (TOTEMLIG+FVS1)'.  We'd list the feminine final
form with a comment such as '(context-driven; FVS1 toggles with first
final form.)'.

A bit of explanation is certainly in order.

For TR170, we have the example SIG+FVS1 with a masculine final form,
but Appendix A, which lists the stand-alone forms, shows two codes
for final forms (by joining rules) - the undotted masculine form as
<ZWJ, GA> and the feminine final form as <ZWJ, GA, FVS1>.  However,
<SA, I, GA, FVS1> has undotted final masculine GA. 
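For readers who want to see the sequences above as code points, here is a minimal sketch.  GA (U+182D), I (U+1822) and FVS1 are as named in this thread; SA as U+1830 and FVS1 as U+180B are taken from the Unicode Mongolian block, and the dictionary labels are only illustrative.

```python
import unicodedata

# Code points from the Unicode Mongolian block; the labels below are
# illustrative descriptions, not text from TR170 itself.
ZWJ, FVS1 = "\u200D", "\u180B"
SA, I, GA = "\u1830", "\u1822", "\u182D"

sequences = {
    "TR170 example SIG+FVS1": SA + I + GA + FVS1,
    "TR170 Appendix A, undotted masculine final": ZWJ + GA,
    "TR170 Appendix A, feminine final": ZWJ + GA + FVS1,
}

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in text]

if __name__ == "__main__":
    for label, seq in sequences.items():
        print(label)
        for line in describe(seq):
            print("   ", line)
```

The point of the listing is that the first and third entries both end in GA followed by FVS1, yet they are described with different final forms.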

I fear I may have misread GB/T 26226-2010.  I hadn't appreciated the
difference between Table A, which shows the variation sequences needed
in connected text, and Table B, which shows the sequences for
displaying glyphs in isolation.

I think I must have looked at Table A, and seen just one final form
listed under GA.  However, I probably just looked at the glyph pictures
and names - I wouldn't have looked at the coding sequence.  If I had, I
would have seen that the non-final dotted feminine GA is anomalously
encoded as ".... GA FVS2", with the FVS2 apparently necessary.  I can
now guess at what should have been in that row.

When I look at Table B, I see undotted final masculine GA encoded <ZWJ,
GA, FVS1> and final feminine GA encoded <ZWJ, GA, FVS2>.  Therefore it
seems that if the Chinese standard has a consistent encoding in
mind, it does *not* include a gender toggle for GA.  The mislabelled
form in Table A must be a corruption of the entry for feminine final GA.
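The disagreement between the two isolated-form listings can be written down as data.  This is only a sketch of the readings given in this message (TR170 Appendix A and GB/T 26226-2010 Table B); the dictionary names and the helper `mismatches` are hypothetical.

```python
# Isolated final forms of GA as described in this message.
# Keys are coding sequences, values are the form each source assigns.
TR170_APPENDIX_A = {
    ("ZWJ", "GA"): "undotted masculine final",
    ("ZWJ", "GA", "FVS1"): "feminine final",
}
GBT_TABLE_B = {
    ("ZWJ", "GA", "FVS1"): "undotted masculine final",
    ("ZWJ", "GA", "FVS2"): "feminine final",
}

def mismatches(a, b):
    """Sequences listed by both sources but assigned different forms."""
    return {seq: (a[seq], b[seq]) for seq in a.keys() & b.keys() if a[seq] != b[seq]}

if __name__ == "__main__":
    for seq, (left, right) in mismatches(TR170_APPENDIX_A, GBT_TABLE_B).items():
        print("<" + ", ".join(seq) + ">:", left, "vs", right)
```

Only <ZWJ, GA, FVS1> appears in both listings, and the two sources give it opposite genders.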

In Table A for NA, we find for medial NA that <NA, FVS1> gives dotted
NA (second medial form) and <NA, FVS3> gives undotted NA (first
medial form).  Clearly the idea of toggles has been abandoned!  

Now, the problem with toggles is that one needs a well-defined set of
rules.  With overrides instead, one can arrange that the text
generated is far less vulnerable to the vagaries of a renderer's
glyph selection rules.  The results look good.  They also have the
possible cynical bonus of playing havoc with text-processing tools.
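The difference between a toggle and an override can be shown with a toy model.  Everything here is an assumption made for illustration: `context_gender` stands for whatever a given renderer infers from the word, and the selector assignments in `OVERRIDE` simply borrow the Table B reading above.  Real glyph selection is far more involved.

```python
# Toy model only; not any renderer's actual rules.
OTHER = {"masculine": "feminine", "feminine": "masculine"}

# Assumed assignments, borrowed from the Table B reading in this message.
OVERRIDE = {"FVS1": "masculine", "FVS2": "feminine"}

def final_ga_toggle(context_gender, has_fvs1):
    """Toggle: FVS1 flips whatever form the context rules chose."""
    return OTHER[context_gender] if has_fvs1 else context_gender

def final_ga_override(context_gender, selector=None):
    """Override: a selector names the form outright; context is the fallback."""
    return OVERRIDE.get(selector, context_gender)

if __name__ == "__main__":
    # Two hypothetical renderers that disagree about the context of a word.
    for inferred in ("feminine", "masculine"):
        print(inferred,
              "| toggle:", final_ga_toggle(inferred, True),
              "| override:", final_ga_override(inferred, "FVS1"))
```

With the toggle, the two renderers end up with different glyphs for the same text; with the override they agree, which is the robustness described above.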

> If so, then we are not communicating to font developers or end-users
> that there is actually a U+182D+FVS1 specification. Font developers
> will need to implement it. Typists will want to use it. Maybe I am
> misunderstanding your statement?

You understood.  However, the idea of toggles seems to have been
abandoned.

> 3.)    Input Methods
> 
> I take another read on the following TR170 statement ...

>> "The mechanism of inputting characters is not specified by the
>> standard, so any keyboard driver capable of generating the
>> appropriate 16-bit character encodings can be used.  However, the
>> input mechanism should ideally generate the correct positional forms,
>> variants and ligatures on input by analysis of the context of each
>> letter, at least where possible."

> My read is that the emitted text string is the same. And what may be
> different is the input method. For example, we are experimenting with
> a keyboard that has an alternative mapping at SHFT-A for emitting the
> sequence <U+1820><U+180E>. Likewise for SHFT-E. This eliminates one
> keystroke. It eliminates dealing with the MVS directly. This
> experimental keyboard will emit identical text with the person who
> manually types the two keystrokes <U+1820><U+180E>.

The mechanism you describe generates a character sequence <U+180E,
U+1820>.  (At least, I hope that is what you meant.)  It has no
conception of positional forms, variants or ligatures.  The processing
as you describe it is no different to an English keyboard being given a
key assignment for the sequence 'ing', or replacing '&' by '&#x' in an
HTML editor.
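A sketch of such a key assignment, to make the point concrete.  The key names and the `KEYMAP` table are hypothetical; the only facts used are the code points U+180E (MVS), U+1820 (A) and U+1821 (E).

```python
MVS, A, E = "\u180E", "\u1820", "\u1821"

# Hypothetical key names; each key emits a fixed string, nothing more.
KEYMAP = {
    "a": A,
    "e": E,
    "mvs": MVS,
    "shift+a": MVS + A,
    "shift+e": MVS + E,
}

def emit(keys):
    """Concatenate the fixed output of each key press; no context is examined."""
    return "".join(KEYMAP[k] for k in keys)

if __name__ == "__main__":
    one_stroke = emit(["shift+a"])
    two_strokes = emit(["mvs", "a"])
    print(one_stroke == two_strokes)
```

The lookup never inspects neighbouring letters, so it cannot be said to generate positional forms, variants or ligatures.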

The text might possibly refer to something like resolving medial NA to
a dotted or undotted form and generating the appropriate variation
selector, but I think that is anachronistic.
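By contrast, an input method that really did analyse context might look something like the following sketch.  The rule used (medial NA dotted before a vowel, undotted otherwise) is a simplifying assumption, the function name is hypothetical, and the selector assignments are the Table A readings quoted earlier in this message (<NA, FVS1> dotted, <NA, FVS3> undotted).

```python
NA, FVS1, FVS3 = "\u1828", "\u180B", "\u180D"

# Basic Mongolian vowels A .. EE (U+1820 .. U+1827).
VOWELS = {chr(cp) for cp in range(0x1820, 0x1828)}

def resolve_medial_na(next_char):
    """Pick a variation selector for medial NA from the following letter.

    Simplified assumption: dotted before a vowel, undotted otherwise.
    """
    selector = FVS1 if next_char in VOWELS else FVS3
    return NA + selector

if __name__ == "__main__":
    for following in ("\u1820", "\u1833"):  # A, then DA
        out = resolve_medial_na(following)
        print(" ".join(f"U+{ord(c):04X}" for c in out))
```

Unlike a fixed key assignment, the output here depends on the letter that follows.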

> While I like the
> idea of a standard keyboard, I know that it is not possible until an
> army imposes it on the masses! If we have the time at the end of our
> discussion, we might talk about keyboards.

A sensible topic is what capabilities a keyboard should have.  Keyboard
layout is a completely different topic, and is subject to user
profiles.  For someone who overwhelmingly typed in Mongolian script,
ergonomic considerations would matter.  For someone who normally types
in Cyrillic, the layout might best be basically a Cyrillic phonetic
layout.  For someone who regularly uses pinyin, a Latin-based phonetic
layout would make sense.  

Layouts should probably come in sets of 7:

Traditional/Todo/Sibe/Manchu/Traditional Tibetan/Todo Tibetan/Manchu
Tibetan.

Mixing writing systems will really make text processing difficult! I
find it quite easy to accidentally pepper my own writings with U+0261
LATIN SMALL LETTER SCRIPT G.

Richard.
Received on Saturday, 29 August 2015 20:53:55 UTC