Re: Reference Scheme for Mongolian Rendering

On Fri, 14 Aug 2015 10:10:22 +0000
Greg Eck <> wrote:

> Hi Richard,
> Good notes. Is this usually done for a given script/language?

The Unicode Standard's section for Arabic talks about the mechanism of
joining and how ZWJ and ZWNJ interfere with it.  In part, this is
because Arabic is the first normally cursive script.  Similarly, the
Devanagari section goes into some depth because ZWJ and ZWNJ interfere
with conjunct formation.  It also has to address the fallback situation
where what is encoded as one akshara is rendered as two or more aksharas
because the font can't handle it.

The problem with Mongolian is that the user has to know when to
intervene.  He can't just leave rendering to the font.

> Comments here:
> 1.) I agree in philosophy. The things I assume may actually be the
> things another person is just learning. So, better to lay out
> everything. I think it would be good for us to have a discussion on
> keyboards at the end of our talk. 

> There should be some
> standardization on keyboard mappings. If standardization is not
> possible, then at least a listing of the various mappings used on
> simple keyboards.

> A discussion on smart keyboards could follow from
> there.

A discussion of smart keyboards does not depend on where the keys
are, but at most on what they are conceived as being for.  Which
languages were you thinking of?  The first job is to sort out the
exemplar characters.  Does punctuation depend on the orientation? 

> 4.) I know of 4
> genuine over-ride situations where we need an FVS to actually
> over-ride the default - U+1822, U+1828, U+182D (2x). I can provide
> some simple rules and see if we can map them into the nomenclature
> you suggest.

Note that the overrides themselves are generally achieved by the shapes
selected by variation selectors not being contextually modified.  Thus,
what you will see is the application of contextual rules.

Override #1:

Functionally, U+1822 I is proposed to have seven glyphs.
They are:

gI0s, gI0i, gI0m, gI0f; gI1m; gI2m and gI3m.

gI2m differs from gI0m in that it is not affected by contextual changes
until Stage E, when ligation occurs.

gI1m is the glyph with a prefixed aleph; gI3m is the double yodh.

To simplify the rule, I need notation to define a 'class' as in the OT
sense.  For example:

cMedialVowel = {gA0m, gA1m, gE*m, gI*m, gO*m, gU*m, gOE*m, gUE*m, gEEm,
gTE*m, gTI*m, gTO*m, gTU*m, gTOE*m, gTUE*m, gSE*m, gSI*m, gSIYm,
gSUE*m, gSUm, gMI*m, gAGA*m, gAGI*m, gAGHUm}

(T = Todo, S = Sibe, M = Manchu, AG = Ali Gali, AGH = Ali Gali Half)

Should gA1i be in the list?  Should the medial vowels with a
prefixed aleph (gA1m, gO1m, gU1m, gOE2m, gUE2m, gTE1m, gTI1m, gTO1m,
gTU1m, gTOE1m, gTUE1m, gSI1m, gMI1m)
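To make the class notation concrete, here is a minimal sketch of testing glyph membership in cMedialVowel. It assumes that each '*' stands for exactly one variant digit, so it is translated into the glob range "[0-9]"; that reading keeps gO*m from accidentally swallowing gOE1m. The helper names to_glob and in_class are hypothetical.

```python
from fnmatch import fnmatchcase

# cMedialVowel as written above.  Assumption: each '*' stands for one
# variant digit, so it is translated to the glob range "[0-9]".
C_MEDIAL_VOWEL = [
    "gA0m", "gA1m", "gE*m", "gI*m", "gO*m", "gU*m", "gOE*m", "gUE*m", "gEEm",
    "gTE*m", "gTI*m", "gTO*m", "gTU*m", "gTOE*m", "gTUE*m", "gSE*m", "gSI*m",
    "gSIYm", "gSUE*m", "gSUm", "gMI*m", "gAGA*m", "gAGI*m", "gAGHUm",
]


def to_glob(pattern: str) -> str:
    """Translate the '*' of the class notation into a single-digit glob."""
    return pattern.replace("*", "[0-9]")


def in_class(glyph: str, members: list[str]) -> bool:
    """True if the glyph name matches any member pattern of the class."""
    return any(fnmatchcase(glyph, to_glob(m)) for m in members)


print(in_class("gI3m", C_MEDIAL_VOWEL))   # True: matches gI*m
print(in_class("gOE1m", C_MEDIAL_VOWEL))  # True: matches gOE*m, not gO*m
print(in_class("gA0i", C_MEDIAL_VOWEL))   # False: an initial form
```

A plain '*' glob would be simpler, but it would let gO*m match OE glyphs and gS*i match Sibe vowels, which is probably not what the notation intends.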

I believe ZWJ should be an alternative to 'Mongolian_Letter'.  The rule
then becomes:

gI0m > gI3m / cMedialVowel _

Fonts may subsequently include gI2m > gI0m to simplify the ligation.
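A rough sketch of how a rule like gI0m > gI3m / cMedialVowel _ acts on a glyph string, and why it is also the override mechanism: only the exact glyph gI0m is rewritten, so a gI2m chosen by a variation selector passes through untouched. The class is abridged, the function name rewrite_after is hypothetical, and gB0i and gL0f are made-up consonant glyph names used only for illustration.

```python
from fnmatch import fnmatchcase

# Abridged cMedialVowel; '*' is read as one variant digit (an assumption).
C_MEDIAL_VOWEL = ["gA0m", "gA1m", "gE*m", "gI*m", "gO*m", "gU*m"]


def in_class(glyph, members):
    return any(fnmatchcase(glyph, m.replace("*", "[0-9]")) for m in members)


def rewrite_after(glyphs, target, output, left_class):
    """Apply `target > output / left_class _` to a list of glyph names.

    Left contexts are read from the unmodified input, so every match is
    rewritten as if simultaneously.  Only the exact glyph `target` changes;
    a variant selected by a variation selector (e.g. gI2m) is left alone.
    """
    result = list(glyphs)
    for i in range(1, len(glyphs)):
        if glyphs[i] == target and in_class(glyphs[i - 1], left_class):
            result[i] = output
    return result


# gB0i and gL0f are hypothetical consonant glyph names.
print(rewrite_after(["gB0i", "gA0m", "gI0m", "gL0f"], "gI0m", "gI3m", C_MEDIAL_VOWEL))
# ['gB0i', 'gA0m', 'gI3m', 'gL0f']
print(rewrite_after(["gB0i", "gA0m", "gI2m", "gL0f"], "gI0m", "gI3m", C_MEDIAL_VOWEL))
# ['gB0i', 'gA0m', 'gI2m', 'gL0f']
```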

Override #2:
Functionally, NA is proposed to have 7 glyphs, making 9 names if text
is not to be wilfully corrupted:

gN0s, gN0i, gN0m, gN0f; gN1i, gN1m, gN1f; gN2m; gN3m

gN0s and gN0i should probably be the same glyph.  Mongolian Baiti will
misrender Unicode 8.0.0-compliant text and give gN2m the same
appearance as gN0m, making 8 glyphs functionally.

cFinalVowel = {gA*f, gE*f, gI*f, gO*f, gU*f, gOE*f, gUE*f, gEEf, gTE*f,
gTI*f, gTO*f, gTU*f, gTOE*f, gTUE*f, gSE*f, gSI*f, gSIYf, gSUE*f, gSU*f,
gMI*f, gAGA*f, gAGI*f, gAGHUf}

cVowel = {cMedialVowel, cFinalVowel}

I am assuming that no further rules in Stage D will affect true medial

Using Mongolian Baiti rules, we then have

gN0m > gN1m / _ cVowel

If we instead went with Martin Heijdra's observation about FVS1 being a
toggle, we would have the parallel rules:

gN0m > gN1m / _ cVowel

gN1m > gN0m / _ cVowel

These would also fit nicely with rules for the final NA:

gN0f > gN1f / _ gMVS

gN1f > gN0f / _ gMVS

Again, there will be no further change in Stage D.
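The toggle pairs only work if the two rules are applied in parallel: run one after the other over the whole string, the second would simply undo the first. A minimal sketch, assuming one pass in which each glyph is tested against both rules at once; toggle_before is a hypothetical name and the vowel class is abridged.

```python
from fnmatch import fnmatchcase

# Abridged cVowel (medial and final A, E, I, O, U); '*' read as one digit.
C_VOWEL = ["gA*m", "gE*m", "gI*m", "gO*m", "gU*m",
           "gA*f", "gE*f", "gI*f", "gO*f", "gU*f"]


def is_vowel(glyph):
    return any(fnmatchcase(glyph, p.replace("*", "[0-9]")) for p in C_VOWEL)


def toggle_before(glyphs, a, b, is_trigger):
    """Apply the parallel rules `a > b / _ X` and `b > a / _ X`.

    Each glyph is tested against both rules in the same step, and the
    result is written to a copy, so a glyph changed by one rule is never
    changed back by the other.
    """
    result = list(glyphs)
    for i in range(len(glyphs) - 1):
        if is_trigger(glyphs[i + 1]):
            if glyphs[i] == a:
                result[i] = b
            elif glyphs[i] == b:
                result[i] = a
    return result


# Medial NA before a vowel: gN0m and gN1m swap.
print(toggle_before(["gA0i", "gN0m", "gA0f"], "gN0m", "gN1m", is_vowel))
# ['gA0i', 'gN1m', 'gA0f']
print(toggle_before(["gA0i", "gN1m", "gA0f"], "gN0m", "gN1m", is_vowel))
# ['gA0i', 'gN0m', 'gA0f']
# Final NA before MVS: gN0f and gN1f swap.
print(toggle_before(["gA0i", "gN0f", "gMVS"], "gN0f", "gN1f", lambda g: g == "gMVS"))
# ['gA0i', 'gN1f', 'gMVS']
```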

We could also make gN2m a synonym of gN1f.

Override #3:

This looks like another attempt to get the Mongolian script
deprecated on stability grounds.

I will assume that the output of this rule will not be changed at Stage
D.  This is a big assumption.

cDS = {gD*i, gD*m, gTD, gSD*i, gSD*m, gAGD, gS*i, gS*m}

Completion of cBackVowel and cCons is an exercise for the reader.
Should cBackVowel include gO1m and gU1m?  The alephs mark the join of
two words.

gG0m > gG0m / cDS _ else # Bleed!

gG0m > gG0m cVowel / cDS _ else # Bleed!

gG0m > gG0m cVowel cVowel / cDS _ else # Bleed!

gG0m > gG1m / _ cBackVowel else

gG0m > gG1m / _ cCons cBackVowel

I'd write it more elegantly for my compiler, but its notation's a bit
close to the metal (though not as close as TTX!).
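The 'else' chains above amount to an ordered, first-match rule list in which an identity rule bleeds (blocks) every rule after it. A minimal sketch of that control flow, modelling only the identity rule after cDS and the two cBackVowel rules; the second and third rules are left out, and since cBackVowel and cCons are left open, the memberships below are abridged and hypothetical. The function name rewrite_medial_ga is likewise made up.

```python
from fnmatch import fnmatchcase

# Hypothetical, abridged classes; "[0-9]" keeps gO.. from matching gOE..
C_DS = ("gD[0-9]i", "gD[0-9]m", "gS[0-9]i", "gS[0-9]m")
C_BACK_VOWEL = ("gA[0-9]m", "gA[0-9]f", "gO[0-9]m", "gO[0-9]f",
                "gU[0-9]m", "gU[0-9]f")
C_CONS = ("gL[0-9]m", "gR[0-9]m", "gM[0-9]m")


def matches(glyph, patterns):
    """True if glyph exists (not a word boundary) and is in the class."""
    return glyph is not None and any(fnmatchcase(glyph, p) for p in patterns)


def rewrite_medial_ga(prev, nxt, nxt2):
    """Choose the glyph for gG0m from its neighbours.

    Rules are tried in order and the first match wins, like 'else'.
    None stands for a word boundary.
    """
    rules = [
        # gG0m > gG0m / cDS _   (identity rule: bleeds the rules below)
        (lambda: matches(prev, C_DS), "gG0m"),
        # gG0m > gG1m / _ cBackVowel
        (lambda: matches(nxt, C_BACK_VOWEL), "gG1m"),
        # gG0m > gG1m / _ cCons cBackVowel
        (lambda: matches(nxt, C_CONS) and matches(nxt2, C_BACK_VOWEL), "gG1m"),
    ]
    for condition, output in rules:
        if condition():
            return output
    return "gG0m"  # no rule applied


print(rewrite_medial_ga("gD0i", "gA0m", None))    # gG0m: blocked after D
print(rewrite_medial_ga("gB0i", "gA0m", None))    # gG1m
print(rewrite_medial_ga("gB0i", "gL0m", "gA0f"))  # gG1m
```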

Override #4:

The logic's a bit creaky.  Isn't the correct logic:

If no preceding vowel
then masculine
else if last sexed vowel is masculine
then masculine
else feminine?
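Spelled out as a function, the logic above reads as follows. This is a sketch over letter names rather than glyphs, and it assumes I is neutral (unsexed), as in Mongolian vowel harmony; under the literal logic, a word whose vowels are all neutral falls through to the final 'else' and comes out feminine. The function and set names are hypothetical.

```python
MASCULINE = {"A", "O", "U"}
FEMININE = {"E", "OE", "UE", "EE"}
# Assumption: I is neutral, so it is neither masculine nor feminine.


def word_gender(vowels):
    """Classify a word from its preceding vowels (letter names, in order)."""
    if not vowels:                  # no preceding vowel
        return "masculine"
    for v in reversed(vowels):      # find the last sexed vowel
        if v in MASCULINE:
            return "masculine"
        if v in FEMININE:
            return "feminine"
    return "feminine"               # only neutral vowels: the final 'else'


print(word_gender([]))          # masculine
print(word_gender(["A", "I"]))  # masculine: I is skipped
print(word_gender(["O", "E"]))  # feminine
print(word_gender(["I"]))       # feminine
```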

Personally, I think sexing the word is unreasonable for rendering.
Still, returning to your rules:

cLetter = {...} # Exercise for reader.  Allowing finals wouldn't hurt.
cFemVowel = {gE**, gI**, gOE**, gUE**, gEE**} # Also allows finals

# I assume the change will not be further modified by any rules at Stage

gG0f > gG2f / cFemVowel 0{cLetter}19 _ else

# This is a rule better handled by a morx table! 
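For comparison, here is a sketch of the windowed rule gG0f > gG2f / cFemVowel 0{cLetter}19 _, assuming "0{cLetter}19" means between 0 and 19 arbitrary letters between the feminine vowel and the final GA. It shows why the rule is fragile: any feminine vowel in the window triggers it, even if a masculine vowel follows. The function name final_ga is hypothetical; the class uses the patterns above as shell globs.

```python
from fnmatch import fnmatchcase

# cFemVowel as written above; '**' matches any variant digit and position.
C_FEM_VOWEL = ("gE**", "gI**", "gOE**", "gUE**", "gEE**")


def final_ga(glyphs, max_gap=19):
    """Replace gG0f by gG2f if a cFemVowel glyph occurs with at most
    `max_gap` glyphs between it and the GA."""
    out = list(glyphs)
    for i, g in enumerate(glyphs):
        if g != "gG0f":
            continue
        # Positions i-1 back to i-max_gap-1: gaps of 0..max_gap letters.
        window = glyphs[max(0, i - max_gap - 1):i]
        if any(fnmatchcase(x, p) for x in window for p in C_FEM_VOWEL):
            out[i] = "gG2f"
    return out


print(final_ga(["gB0i", "gE0m", "gL0m", "gG0f"]))  # ends in 'gG2f'
print(final_ga(["gB0i", "gA0m", "gG0f"]))          # unchanged
```

A table that tracks the last sexed vowel as state, as a morx state machine can, avoids both the arbitrary window size and the masculine-vowel problem.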

>> Do we need to
>> identify suffix rules for every language that might conceivably be
>> written in the Mongolian script with separated suffixes?

> 5.) Yes

Who's handling French?

> Can we add your paper to the NOTES section of Richard Ishida's
> attachments at ?



Received on Friday, 14 August 2015 23:32:04 UTC