- From: Richard Wordingham <richard.wordingham@ntlworld.com>
- Date: Sat, 15 Aug 2015 00:31:29 +0100
- To: "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
On Fri, 14 Aug 2015 10:10:22 +0000 Greg Eck <greck@postone.net> wrote:

> Hi Richard,
>
> Good notes. Is this usually done for a given script/language?

The Unicode Standard's section for Arabic talks about the mechanism of joining and how ZWJ and ZWNJ interfere with it. In part, this is because Arabic is the first normally cursive script. Similarly, the Devanagari section goes into some depth because ZWJ and ZWNJ interfere with conjunct formation. It also has to address the fallback situation where what is encoded as one akshara is rendered as two or more aksharas because the font can't handle it.

The problem with Mongolian is that the user has to know when to intervene. He can't just leave rendering to the font.

>
> Comments here:
> 1.) I agree in philosophy. The things I assume may actually be the
> things another person is just learning. So, better to lay out
> everything. I think it would be good for us to have a discussion on
> keyboards at the end of our talk. There should be some
> standardization on keyboard mappings. If standardization is not
> possible, then at least a listing of the various mappings used on
> simple keyboards. A discussion on smart keyboards could follow from
> there.

A discussion of smart keyboards does not depend on where the keys are, but at most on what they are conceived as being for. Which languages were you thinking of? The first job is to sort out the exemplar characters. Does punctuation depend on the orientation?

> 4.) I know of 4
> genuine over-ride situations where we need an FVS to actually
> over-ride the default - U+1822, U+1828, U+182D (2x). I can provide
> some simple rules and see if we can map them into the nomenclature
> you suggest.

Note that the overrides themselves are generally achieved by the shapes selected by variation selectors not being contextually modified. Thus, what you will see is the application of contextual rules.

Override #1: Functionally, U+1822 I is proposed to have seven glyphs.
They are: gI0s, gI0i, gI0m, gI0f; gI1m; gI2m and gI3m. gI2m differs from gI0m in that it is not affected by contextual changes until Stage E, when ligation occurs. gI1m is the glyph with a prefixed aleph; gI3m is double yodh.

To simplify the rule, I need notation to define a 'class' in the OpenType (OT) sense. For example:

    cMedialVowel = {gA0m, gA1m, gE*m, gI*m, gO*m, gU*m, gOE*m, gUE*m,
                    gEEm, gTE*m, gTI*m, gTO*m, gTU*m, gTOE*m, gTUE*m,
                    gSE*m, gSI*m, gSIYm, gSUE*m, gSUm, gMI*m, gAGA*m,
                    gAGI*m, gAGHUm}

(T = Todo, S = Sibe, M = Manchu, AG = Ali Gali, AGH = Ali Gali Half)

Should gA1i be in the list? Should the medial vowels with a prefixed aleph (gA1m, gO1m, gU1m, gOE2m, gUE2m, gTE1m, gTI1m, gTO1m, gTU1m, gTOE1m, gTUE1m, gSI1m, gMI1m)

I believe ZWJ should be an alternative to 'Mongolian_Letter'. The rule then becomes:

    gI0m > gI3m / cMedialVowel _

Fonts may subsequently include gI2m > gI0m to simplify the ligation rules.

Override #2: Functionally, NA is proposed to have 7 glyphs, making 9 names if text is not to be wilfully corrupted:

    gN0s, gN0i, gN0m, gN0f; gN1i, gN1m, gN1f; gN2m; gN3m

gN0s and gN0i should probably be the same glyph. Mongolian Baiti will misrender Unicode 8.0.0-compliant text and has gN2m with the same appearance as gN0m, making 8 glyphs functionally.

    cFinalVowel = {gA*f, gE*f, gI*f, gO*f, gU*f, gOE*f, gUE*f, gEEf,
                   gTE*f, gTI*f, gTO*f, gTU*f, gTOE*f, gTUE*f, gSE*f,
                   gSI*f, gSIYf, gSUE*f, gSU*f, gMI*f, gAGA*f, gAGI*f,
                   gAGHUf}
    cVowel = {cMedialVowel, cFinalVowel}

I am assuming that no further rules in Stage D will affect true medial NA. Using Mongolian Baiti rules, we then have

    gN0m > gN1m / _ cVowel

If we instead went with Martin Heijdra's observation about FVS1 being a toggle, we would have the parallel rules:

    gN0m > gN1m / _ cVowel
    gN1m > gN0m / _ cVowel

These would also fit nicely with rules for the final NA:

    gN0f > gN1f / _ gMVS
    gN1f > gN0f / _ gMVS

Again, there will be no further change in Stage D. We could also make gN2m a synonym of gN1f.
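To make the rule notation concrete, here is a minimal Python sketch of how class-based contextual rules like those for overrides #1 and #2 might be applied to a sequence of glyph names. It is not any real shaping engine. The glyph names and the rules come from the message; the helper names (glyph_class, Rule, apply_parallel, apply_sequential), the reading of "*" as "any run of digits or lowercase letters", and the modelling of an FVS1-selected medial NA as an already-substituted gN1m are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


def glyph_class(*patterns: str) -> Callable[[str], bool]:
    """Membership test for a glyph class. Assumption: "*" matches any run of
    digits or lowercase letters, so gE*m matches gE0m but not gEEm."""
    regexes = [re.compile("[0-9a-z]*".join(map(re.escape, p.split("*"))))
               for p in patterns]
    return lambda glyph: any(r.fullmatch(glyph) for r in regexes)


MEDIAL_VOWELS = ["gA0m", "gA1m", "gE*m", "gI*m", "gO*m", "gU*m", "gOE*m",
                 "gUE*m", "gEEm", "gTE*m", "gTI*m", "gTO*m", "gTU*m",
                 "gTOE*m", "gTUE*m", "gSE*m", "gSI*m", "gSIYm", "gSUE*m",
                 "gSUm", "gMI*m", "gAGA*m", "gAGI*m", "gAGHUm"]
FINAL_VOWELS = ["gA*f", "gE*f", "gI*f", "gO*f", "gU*f", "gOE*f", "gUE*f",
                "gEEf", "gTE*f", "gTI*f", "gTO*f", "gTU*f", "gTOE*f",
                "gTUE*f", "gSE*f", "gSI*f", "gSIYf", "gSUE*f", "gSU*f",
                "gMI*f", "gAGA*f", "gAGI*f", "gAGHUf"]
C_MEDIAL_VOWEL = glyph_class(*MEDIAL_VOWELS)
C_VOWEL = glyph_class(*MEDIAL_VOWELS, *FINAL_VOWELS)


@dataclass
class Rule:
    """target > replacement / before _ after, with single-glyph contexts."""
    target: str
    replacement: str
    before: Optional[Callable[[str], bool]] = None
    after: Optional[Callable[[str], bool]] = None

    def matches(self, glyphs: List[str], i: int) -> bool:
        if glyphs[i] != self.target:
            return False
        if self.before and (i == 0 or not self.before(glyphs[i - 1])):
            return False
        if self.after and (i + 1 == len(glyphs) or not self.after(glyphs[i + 1])):
            return False
        return True


def apply_parallel(glyphs: List[str], rules: List[Rule]) -> List[str]:
    """Every rule reads the unmodified input; the first matching rule wins."""
    out = list(glyphs)
    for i in range(len(glyphs)):
        for rule in rules:
            if rule.matches(glyphs, i):
                out[i] = rule.replacement
                break
    return out


def apply_sequential(glyphs: List[str], rules: List[Rule]) -> List[str]:
    """Each rule runs over the whole sequence in turn, seeing earlier output."""
    out = list(glyphs)
    for rule in rules:
        out = apply_parallel(out, [rule])
    return out


# Override #1: gI0m > gI3m / cMedialVowel _
OVERRIDE_1 = [Rule("gI0m", "gI3m", before=C_MEDIAL_VOWEL)]
# Override #2 with FVS1 as a toggle (parallel rules).
OVERRIDE_2 = [Rule("gN0m", "gN1m", after=C_VOWEL),
              Rule("gN1m", "gN0m", after=C_VOWEL)]
```

With these definitions, apply_parallel(["gA0i", "gN0m", "gA0f"], OVERRIDE_2) gives ["gA0i", "gN1m", "gA0f"], and the FVS1-selected input ["gA0i", "gN1m", "gA0f"] gives ["gA0i", "gN0m", "gA0f"]. Running the same two rules one after the other with apply_sequential turns gN0m into gN1m and straight back again, which is why the toggle has to be stated as parallel rules.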
Override #3: This looks like another attempt to get the Mongolian script deprecated on stability grounds. I will assume that the output of this rule will not be changed at Stage D. This is a big assumption.

    cDS = {gD*i, gD*m, gTD, gSD*i, gSD*m, gAGD, gS*i, gS*m}

Completion of cBackVowel and cCons is an exercise for the reader. Should cBackVowel include gO1m and gU1m? The alephs mark the join of two words.

    gG0m > gG0m / cDS _ else                  # Bleed!
    gG0m > gG0m cVowel / cDS _ else           # Bleed!
    gG0m > gG0m cVowel cVowel / cDS _ else    # Bleed!
    gG0m > gG1m / _ cBackVowel else
    gG0m > gG1m / _ cCons cBackVowel

I'd write it more elegantly for my compiler, but its notation's a bit close to the metal (though not as close as TTX!).

Override #4: The logic's a bit creaky. Isn't the correct logic: if there is no preceding vowel then masculine, else if the last sexed vowel is masculine then masculine, else feminine? Personally, I think sexing the word is unreasonable for rendering. Still, returning to your rules:

    cLetter = {...}  # Exercise for reader. Allowing finals wouldn't hurt.
    cFemVowel = {gE**, gI**, gOE**, gUE**, gEE**}  # Also allows finals
    # I assume the change will not be further modified by any rules at Stage D.
    gG0f > gG2f / cFemVowel 0{cLetter}19 _ else  # This is a rule better handled by a morx table!

>> Do we need to
>> identify suffix rules for every language that might conceivably be
>> written in the Mongolian script with separated suffixes?
> 5.) Yes

Who's handling French?

> Can we add your paper to the NOTES section of Richard Ishida's
> attachments at http://r12a.github.io/scripts/mongolian/variants ?

Yes.

Richard.
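The two readings of override #4 (the rule as written, and the "last sexed vowel" logic) can be compared with a similar Python sketch. Only cFemVowel, the 0{cLetter}19 window and the glyph names gG0f/gG2f come from the message. Everything else is an assumption: cLetter was left as an exercise, so every glyph name starting with "g" is treated as a letter; "*" is read as any run of digits or lowercase letters; and for the alternative logic A, O and U are taken as masculine, E, OE, UE and EE as feminine, and I as neutral, following conventional Mongolian vowel harmony rather than anything stated here. The function names are hypothetical.

```python
import re
from typing import Callable, List


def glyph_class(*patterns: str) -> Callable[[str], bool]:
    """Membership test; "*" is assumed to match any run of digits or
    lowercase letters (variant number and/or positional suffix)."""
    regexes = [re.compile("[0-9a-z]*".join(map(re.escape, p.split("*"))))
               for p in patterns]
    return lambda glyph: any(r.fullmatch(glyph) for r in regexes)


# cFemVowel as given in the message ("**" also admits finals).
C_FEM_VOWEL = glyph_class("gE**", "gI**", "gOE**", "gUE**", "gEE**")


def is_letter(glyph: str) -> bool:
    # Hypothetical stand-in for the elided cLetter = {...}.
    return glyph.startswith("g")


def final_ga_as_written(glyphs: List[str], max_gap: int = 19) -> List[str]:
    """gG0f > gG2f / cFemVowel 0{cLetter}19 _"""
    out = list(glyphs)
    for i, glyph in enumerate(glyphs):
        if glyph != "gG0f":
            continue
        # Look left across at most max_gap intervening letters.
        for gap in range(max_gap + 1):
            j = i - 1 - gap
            if j < 0:
                break
            if C_FEM_VOWEL(glyphs[j]):
                out[i] = "gG2f"
                break
            if not is_letter(glyphs[j]):
                break  # the intervening run must consist of letters only
    return out


# Assumed harmony classes for the alternative logic; I is treated as neutral.
C_MASC = glyph_class("gA**", "gO**", "gU**")
C_FEM = glyph_class("gE**", "gOE**", "gUE**", "gEE**")
C_NEUTRAL = glyph_class("gI**")


def final_ga_last_sexed_vowel(glyphs: List[str]) -> List[str]:
    """No preceding vowel: masculine; else if the last sexed vowel is
    masculine: masculine; else feminine (gG2f)."""
    out = list(glyphs)
    for i, glyph in enumerate(glyphs):
        if glyph != "gG0f":
            continue
        vowels = [g for g in glyphs[:i] if C_MASC(g) or C_FEM(g) or C_NEUTRAL(g)]
        sexed = [g for g in vowels if not C_NEUTRAL(g)]
        masculine = not vowels or (bool(sexed) and C_MASC(sexed[-1]))
        if not masculine:
            out[i] = "gG2f"
    return out
```

On the hypothetical sequence gE0i gN0m gA0m gG0f, where a feminine vowel is followed by a masculine one, the rule as written selects gG2f while the last-sexed-vowel logic keeps gG0f. The rule as written also stops seeing a feminine vowel once 20 or more letters intervene, whereas the last-sexed-vowel logic has no such limit.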
Received on Friday, 14 August 2015 23:32:04 UTC