
Re: FVS for NA

From: Badral S. <badral@bolorsoft.com>
Date: Fri, 07 Aug 2015 23:56:55 +0200
Message-ID: <55C529A7.8090504@bolorsoft.com>
To: public-i18n-mongolian@w3.org
Hi,
I have just finished reading all the emails on this list.
Martin: Thanks for the detailed info.
Greg: Could you tell us your implementation strategy and status for 
the Mongolian Baiti font? I understand now that Almas's, FangZhen's and 
Menksoft's fonts have the same mapping and implementation strategy.
Kamal: Can you tell us something about your font implementation? It 
seems very interesting to me.
I will say something about the Mongolianscript font. It is the first 
Unicode font for the Mongolian script, created in 2000 by Erdenechimeg 
et al., and so also the oldest one. Between 2003 and 2005, I contributed 
some OpenType rules to it, and then unfortunately its development 
stalled because we discovered the Mongolian Baiti font and recommended 
that our users use it. In 2012 Bolorsoft reactivated the development of 
the Mongolianscript font, because we had received scores of complaints 
and found that Mongolian Baiti still could not be used flawlessly. I 
would describe our implementation strategy as a top-down approach: we 
try to solve problems first with OT rules, and only where that is not 
possible does an FVS come into play, to simplify text input for users. 
That is why Mongolianscript has few variant selectors.
Our font implementation is incomplete; it is about 85% done.
I think that Mongolian Baiti follows a bottom-up principle, which 
defines the FVSs first and then the OT rules. Both approaches have 
advantages and disadvantages.
I agree that Mongolian is the most difficult script in Unicode and that 
the Mongolian encoding model is the worst one. So can we cover all 
Mongolian issues without OT rules? I don't think so, because of the 
strict grammatical gender and vowel harmony rules, etc.
I am skeptical about folding. It could make users' input more chaotic, 
which in turn reduces the quality of the data and makes linguistic 
processing even more difficult. I hate ambiguity, and I expect only one 
term when searching for a string. For instance, I want to get just 
Mongol when searching for the term Mongol, not Mungul, Mongul or Mungol, 
even though they look identical in a Mongolian font. Web and digital 
data keeps growing; some time later, how can I reach my goal if I need 
only the correctly written term? I think that this problem should be 
solved by a smart keyboard and a spell-checker.
Of course, folding makes a lot of sense for letter case.
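
To make the trade-off concrete, here is a minimal Python sketch. It is 
not taken from any implementation discussed on this list. It assumes a 
hypothetical folding table that maps U+1824 (U) to U+1823 (O) and 
U+1826 (UE) to U+1825 (OE); the names FOLD, fold and the sample corpus 
are invented for illustration only.

```python
# Hypothetical folding table (an assumption, not part of any standard):
# U is folded to O, and UE is folded to OE.
FOLD = str.maketrans({"\u1824": "\u1823", "\u1826": "\u1825"})


def fold(text: str) -> str:
    """Return text with U/UE replaced by O/OE."""
    return text.translate(FOLD)


# Four encodings of the same visible word: MA, vowel, ANG, GA, vowel, LA.
MONGOL = "\u182E\u1823\u1829\u182D\u1823\u182F"  # the intended spelling
MUNGUL = "\u182E\u1824\u1829\u182D\u1824\u182F"
MONGUL = "\u182E\u1823\u1829\u182D\u1824\u182F"
MUNGOL = "\u182E\u1824\u1829\u182D\u1823\u182F"
corpus = [MONGOL, MUNGUL, MONGUL, MUNGOL]

# Exact search returns only the correctly encoded term.
exact_hits = [w for w in corpus if w == MONGOL]

# Folded search returns every variant, correct or not.
folded_hits = [w for w in corpus if fold(w) == fold(MONGOL)]

print(len(exact_hits), len(folded_hits))
```

With exact matching only the intended spelling is found; with folding 
all four encodings match, which is the ambiguity described above.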
As Jirimutu mentioned, it would be very helpful if we could define a 
general rule for FVSs, something like: FVSs are used first to switch 
gender, second to toggle consonant forms (at the end of a closed 
syllable), in our case NA, and third ... etc.
Jirimutu, thanks for your effort and for the discussion list document.

Badral

On 07.08.2015 15:32, Martin Heijdra wrote:
> Richard, Jirimutu:
>
> 1. The original document which specified that was the TR 170 document agreed upon by China and Mongolia in 1999.
>
> I actually started with learning about Mongolian encoding when I asked Unicode how the NA would be treated, as a toggle, or whether a FVS would be needed either before a vowel or consonant, thus whether it would be denoted a specific glyph.
>
> 2. The above document (which I could send or you could find on the net, but the discussion is way beyond that now) is completely insufficient for full processing, even if it was useful as the first step; and there were some slight discrepancies with the table as published in Menggu wen bian ma, although they were supposed to be 100% the same. I found just small mistakes in both versions (not different opinions, most probably just mistakes--but it made it difficult to declare one version the end-all.)
>
> I am not sure what access most people have to the Menggu wen bian ma book. I *think* Andrew only put the tables in that book online, the meat of the book. In addition to the table, and some documents treating a few background issues, there are some documents (in Chinese) about the exchange between Unicode and that combined Chinese-Mongolian group, discussing e.g. whether a Mongolian space or a NNBSP is needed. Of historical interest only (but I have nothing against historical interest.)
>
> 3. The real tables needed to decide behavior in running text were provided to me by Microsoft, and originated with long documents, for each of the 4 scripts, made under the auspices of Prof. Quejingzhabu. I am not sure who provided the input, whether it was him singly or a group effort; certainly largely Chinese. Since I received updates at one point, that document was apparently not final either; and the changes could be on the order of those -i diphthongs, so not minor. I have no doubt that an updated final, widely agreed-upon version of this document is what needs to be published by Unicode (it was not specific to any particular implementation such as OpenType, and thus all the more useful), but whether or not it is proprietary was never clear to me. From the answer by Jirimutu I gather that some version is available to this group (the 7th?), but that there are later versions that are not (yet). I know it is apparently difficult to get that, but I can't see how a shared understanding can be created without it. You don't want to have a situation whereby you on behalf of Unicode publish one version, and everyone in Inner Mongolia (where the major part of users and publications of traditional Mongolian are) follows another. There would be nothing standard about that. On the other hand, China can't declare as standard anything which is not published either.
>
> 4. There are similar documents for Manchu, Todo and Sibe. The latter two are rather trivial, the first is not. The relationship was largely so that the definitions of FVSs were for the script, NOT only for its Mongolian-language implementation (thus, if the same "letter" would follow different rules in different languages, they were separated; but conversely, if their behavior did not conflict (even if certain variants might be specific to one language), letters were unified.) Thus, occasionally behavior in Manchu could/should influence the Mongolian discussion. However, since Manchu is a dead language, I for one would be perfectly happy to favor Mongolian as the guideline to set defaults. But the ai/ay/ayi issue definitely also involves Manchu. (BTW, I am also surprised nobody has pointed out words like naiman yet.) Uyghur, from which Mongolian is derived, is not specified as a language covered by this script. In practice Daur is. The registers for Sanskrit- and Tibetan-derived scripts create additional problems, but also there I for one would treat those only secondarily, not influencing the choices of FVSs.
>
> 5. Richard: you refer to "aleph" and referred once to the Daniels & Bright  book, I think. That is a particularly useless piece of writing for Mongolian: it treats the derivation of the script historically before it actually settled down, while not at all treating current behavior (with that purpose, it may be useful, and the author is famous and competent. But it should in my view never have been published in this book, since all the other entries were about current scripts and how they work.) A comparison is if the Latin script would only be treated as far as it derived from Egyptian, and  only use it as a version of Egyptian. Any other English or German introduction to the language or script is superior.
>
>
> Martin
>
> -----Original Message-----
> From: Richard Wordingham [mailto:richard.wordingham@ntlworld.com]
> Sent: Thursday, August 06, 2015 7:14 PM
> To: public-i18n-mongolian@w3.org
> Subject: Re: FVS for NA
>
> On Thu, 6 Aug 2015 13:15:10 +0000
> Martin Heijdra <mheijdra@Princeton.EDU> wrote:
>
>> Thus, the following message refers to the N. The FVS1 there always was
>> defined as a *toggle*, always meaning "the first exception to the
>> rule": thus, in running text, NA+FVS1 did NOT refer to a particular
>> glyph, and any such assumption so is wrong (unless you completely
>> change the rules). The NA has different default versions, with or
>> without dot, before consonants and vowels; the FVS1 chooses the
>> opposite. Thus, in running text, unlike metatext, there is no ONE
>> definition of NA+FVS1: it depends on context. At least, that was the
>> model chosen. Thus it is not even true to say, what is the case in
>> most cases, that the FVS defines a glyph, but that whether the FVS is
>> needed in running text depends on the context, and I think that is the
>> assumption of many: the very shape of NA+FVS1 depends on the context.
> Where is this toggling behaviour by FVS1 explicitly specified?  I have found no trace of such a specification.
>
> Additionally, where have we recorded the rules for dotting NA?  For example, it is not clear from what Martin said that an aleph as the initial but not the only part of a vowel symbol counts as a consonant.
>
> Richard.
>
>

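The toggle model in the quoted message above (FVS1 as "the first 
exception to the rule", so that NA+FVS1 has no fixed glyph) can be 
sketched roughly as follows. This is only an illustration under stated 
assumptions: it assumes that a medial NA is dotted by default before a 
vowel and undotted before a consonant, it ignores initial and final NA, 
and the function name na_dotting is invented here. It is not the 
behaviour of any particular font.

```python
NA = "\u1828"    # MONGOLIAN LETTER NA
FVS1 = "\u180B"  # MONGOLIAN FREE VARIATION SELECTOR ONE
# Basic vowels A, E, I, O, U, OE, UE, EE (U+1820..U+1827).
VOWELS = frozenset("\u1820\u1821\u1822\u1823\u1824\u1825\u1826\u1827")


def na_dotting(word: str) -> list[tuple[int, bool]]:
    """Return (index, is_dotted) for every medial NA in word.

    Assumed default: dotted before a vowel, undotted before a consonant.
    FVS1 after NA selects the opposite of that default (a toggle).
    """
    result = []
    for i, ch in enumerate(word):
        if ch != NA:
            continue
        j = i + 1
        toggled = j < len(word) and word[j] == FVS1
        if toggled:
            j += 1  # skip the selector to reach the following letter
        if i == 0 or j >= len(word):
            continue  # initial and final NA are outside this sketch
        default_dotted = word[j] in VOWELS
        # The same NA+FVS1 sequence gives different shapes by context.
        result.append((i, default_dotted != toggled))
    return result


A, DA = "\u1820", "\u1833"
before_vowel = na_dotting(A + NA + A)
before_vowel_fvs1 = na_dotting(A + NA + FVS1 + A)
before_consonant = na_dotting(A + NA + DA + A)
before_consonant_fvs1 = na_dotting(A + NA + FVS1 + DA + A)
```

Under these assumptions NA+FVS1 is undotted before a vowel but dotted 
before a consonant, which is why the sequence cannot be tied to one 
glyph in running text.
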

-- 
Badral Sanlig, Software architect
www.bolorsoft.com | www.badral.net
Bolorsoft LLC, Selbe Khotkhon 40/4 D2, District 11, Ulaanbaatar
Received on Friday, 7 August 2015 21:57:28 UTC
