RE: MVS Deficiency & Proposed Solution

Hi Jargal,

Thanks for the comments.
I am in Beijing now with Ou Orlog, an Inner Mongolian colleague who will be attending the WG2 meetings in San Jose.
We have completed our presentation on the MVS issue.

A few things to note – I made several mistakes in my notes on the MVS A/E article earlier …

1.)    I called the A/E form an orkitz earlier. Orlog has corrected me: this separate glyph following the MVS is actually called the “tsatslag”. Further discussions will refer to the glyph as Tsatslag_A and Tsatslag_E.

2.)    I had stated that the meaning of BAG-MVS-A was “team” but it is actually “small”. The meaning of BAG-MVS-A-CHUD is “small ones” using the adjective “small” in a substantive sense. The extended meaning then is “children”.

Having looked at the matter in greater detail over the past several days, we are convinced that new code-points are probably the best solution. If so, the MVS becomes superfluous. The MVS has only one task in life - to separate the Tsatslag_A / Tsatslag_E from the stem with a small gap of space. In doing so, it also gives the OT rules the context needed to “mark” the place where glyphs need to transform on either side of the space.
These are the options we have considered so far …

1.)    We stay with the current MVS design and try to fix the problem as described earlier. Let’s say that we are working only with the Tsatslag_A and specifically the BAG-MVS-A stem. We add OT rules that cause BAG-MVS-A-CHUD to shape correctly, with the Tsatslag_A transforming to the standard medial A when the CHUD suffix is added. When the CHUD suffix is deleted, the Tsatslag_A reappears as desired. The problem is that the MVS carries its space internally by definition, so the MVS space is there all through the above process. This solution is therefore not viable when a suffix is attached to the Tsatslag.

2.)    Suppose the MVS is not tenable for the current display problem, so we drop it and instead use a tsatslag glyph with the space included, together with new OT rules. We type in the BAG fine with no display problems. We have a new keystroke to type in the new variant (with space included in the glyph). But the new keystroke is still emitting only the U+1820. It has lost its Tsatslag-marker, the MVS. There is no way to communicate to the Shaping Engines that this new keystroke is any different from the old keystroke. Both the keystroke for the regular final A AND the keystroke for the Tsatslag_A emit the same U+1820.
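
A minimal Python sketch of the point in option 2 may help. The SAR stem spelling and the three keystroke functions are assumptions made up for illustration and do not describe any real keyboard or font; the sketch only shows that without a marker the two keystrokes produce identical code-point sequences.

```python
# Code points from the Mongolian block.
SA, A, RA = "\u1830", "\u1820", "\u1837"
MVS = "\u180e"  # Mongolian Vowel Separator

STEM = SA + A + RA  # the stem SAR (assumed spelling, for illustration)


def type_final_a(stem):
    """Hypothetical keystroke for the regular final A."""
    return stem + A


def type_tsatslag_a_without_mvs(stem):
    """Hypothetical option-2 keystroke: a tsatslag is intended,
    but only U+1820 is emitted."""
    return stem + A


def type_tsatslag_a_with_mvs(stem):
    """Current design: the MVS marks the following A as a tsatslag."""
    return stem + MVS + A


def code_points(text):
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(type_final_a(STEM)))                 # U+1830 U+1820 U+1837 U+1820
print(code_points(type_tsatslag_a_without_mvs(STEM)))  # U+1830 U+1820 U+1837 U+1820
print(code_points(type_tsatslag_a_with_mvs(STEM)))     # U+1830 U+1820 U+1837 U+180E U+1820
```

The first two lines of output are identical, so nothing downstream of the keyboard can tell which form the user meant.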

3.)    If indeed we need some sort of Tsatslag-marker like the MVS, then we need to either redefine the MVS to have no space _OR_ we need to build space into two new code-points (e.g. U+181E/U+181F). We have built a proof-of-concept font (implementing the Tsatslag_A only) which seems to work fine without the MVS. The U+181E glyph includes the space that the MVS provides in our current implementation. The unique code-point is in itself the Tsatslag-marker needed to trigger the OT substitution rules. Of course, the question may be asked as to whether we are creating a new vowel by assigning an entire code-point to a new glyph. The answer is no. The history of the tsatslag shows that it was part of the “A” phoneme from the beginning. It is just that there are exactly identical contexts where the user must determine which form he/she wants to type – the rightward-sweeping final A or the Tsatslag_A, sweeping to the left and disconnected from the stem. There is precedent for this same situation in other languages. The English letter “A”, for example, is assigned two code-points. One is for the upper-case and the other is for the lower-case form. Both are necessary for the user to have complete control of which form he/she wishes to type. There is no way that an automated system can choose for the user whether he/she wants a lower-case “a” or an upper-case “A”. In our case, the user typing in the Mongolian words “month” and “moon” needs to be able to determine which word he/she wants. “Moon” is SARA (rightward-sweeping final A). “Month” is SARA (leftward-sweeping A with space between the SAR and the final A).
My recommendation is that we deprecate the MVS (U+180E) and add two new code-points: U+181E for the Tsatslag_A and U+181F for the Tsatslag_E. It is a sweeping change, I agree, but we have thousands of words that do not form correctly without a solution to this problem. When we get to heavy implementation of corpus analysis and tagging, sorting and searching, etc., this issue will only become more relevant.
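
As a rough illustration of what the recommendation would mean at the code-point level, here is a short Python sketch. U+181E and U+181F are only proposed in this message and are not assigned characters; the `to_proposed` helper and the SAR spelling are hypothetical and exist only to show the before and after sequences.

```python
A, E = "\u1820", "\u1821"
MVS = "\u180e"  # current marker, which the recommendation would deprecate

# Proposed in this message only; these code points are not assigned.
TSATSLAG_A, TSATSLAG_E = "\u181e", "\u181f"

SAR = "\u1830\u1820\u1837"  # assumed spelling of the stem SAR


def to_proposed(text):
    """Hypothetical conversion: MVS + A/E becomes one proposed code point."""
    return text.replace(MVS + A, TSATSLAG_A).replace(MVS + E, TSATSLAG_E)


moon = SAR + A                # rightward-sweeping final A
month_current = SAR + MVS + A  # tsatslag marked by the MVS today
month_proposed = to_proposed(month_current)

print([f"U+{ord(ch):04X}" for ch in month_proposed])
# ['U+1830', 'U+1820', 'U+1837', 'U+181E']
```

Under this sketch "moon" is left unchanged, and "month" differs from it by a single code point instead of by the presence of the MVS.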

Comments and other solutions are very welcome and needed. If we make this recommendation, we should have some rigorous discussion on it.

Greg

>>>>>
Sent: Monday, September 19, 2016 3:46 PM
Subject: Re: MVS Deficiency & Proposed Solution

Hi all,


I think the problem Greg has formulated is really important and cannot be seen as just a matter of typing the MVS in one case and omitting it in another. We are dealing here with the base or root wordform, which should be the same in all cases, for example to optimize searching.
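
The searching concern can be sketched in a few lines of Python. The letter-by-letter spellings of BAG and CHUD and the `shares_base` helper are assumptions for illustration only; the sketch compares a suffixed form typed without the MVS against one that keeps it, as in the BAG-MVS-A-CHUD example from Greg's message.

```python
BA, A, GA = "\u182a", "\u1820", "\u182d"
CHA, U, DA = "\u1834", "\u1824", "\u1833"
MVS = "\u180e"  # Mongolian Vowel Separator

BAG = BA + A + GA    # assumed spelling of the stem
CHUD = CHA + U + DA  # assumed spelling of the suffix

baga = BAG + MVS + A                      # "small", typed with the MVS
bagachud_without_mvs = BAG + A + CHUD     # suffixed form, MVS omitted
bagachud_with_mvs = BAG + MVS + A + CHUD  # suffixed form, MVS kept


def shares_base(word, base):
    """Hypothetical search test: does the word start with the base form?"""
    return word.startswith(base)


print(shares_base(bagachud_without_mvs, baga))  # False
print(shares_base(bagachud_with_mvs, baga))     # True
```

A plain string comparison finds the base form only when it is encoded the same way in both words.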

Is MVS really necessary?

Best regards,
Jargal
>>>>>

Received on Tuesday, 20 September 2016 14:24:55 UTC