Re: MVS Deficiency & Proposed Solution

From: Andrew West <andrewcwest@gmail.com>
Date: Tue, 20 Sep 2016 16:44:42 +0100
Message-ID: <CALgEMhxUg45npGKbNeCAdh_7+3jJeDxmM9dawR29P9eWf_nBEQ@mail.gmail.com>
To: Greg Eck <greck@postone.net>
Cc: Jargal Badagarov <bjargal@mail.ru>, "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>

Hi Greg,

This is certainly a worrying situation which needs to be fixed, but I
am not convinced that the answer is to deprecate the MVS and encode
two new characters.

You state that "The problem is that the MVS carries its space
internally by its definition. Therefore all through the above process,
the MVS space was there."  In fact, by definition MVS is a format
character (general category = Cf) not a space character (general
category = Zs), so it does not have an inherent space.  Andrew Glass
will know the answer better than me, but I do not see any reason in
principle why the OpenType feature cannot be implemented in the font
so that MVS only results in a space when followed by a/e alone, and
behaves as if it were not there when followed by a/e + suffix.
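
The distinction between Cf and Zs, and the contextual behaviour suggested above, can be illustrated with a small sketch. It is a plain Python model, not OpenType feature code. The helper names `is_mongolian_letter` and `mvs_shows_gap` are hypothetical, the U+1820 to U+18AA range is only an approximation of "a following letter", the spelling of the BAG-MVS-A stem is assumed, and U+1834 merely stands in for the first letter of a suffix rather than being a spelling taken from this thread. The reported category depends on the Unicode data bundled with the Python build (Cf is expected from Unicode 6.3 onward).

```python
import unicodedata

MVS = "\u180e"   # MONGOLIAN VOWEL SEPARATOR
A = "\u1820"     # MONGOLIAN LETTER A
E = "\u1821"     # MONGOLIAN LETTER E


def is_mongolian_letter(ch):
    """Rough test (assumption): one character in the Mongolian letter range."""
    return len(ch) == 1 and "\u1820" <= ch <= "\u18aa"


def mvs_shows_gap(text, i):
    """Toy model of the proposed rule: the MVS at index i yields a visible
    gap only when it is followed by A or E and nothing else in the word."""
    if text[i] != MVS:
        return False
    if i + 1 >= len(text) or text[i + 1] not in (A, E):
        return False
    # Character after the A/E, or an empty string at the end of the text.
    following = text[i + 2] if i + 2 < len(text) else ""
    return not is_mongolian_letter(following)


# General categories as reported by this Python build's Unicode data.
print(unicodedata.category(MVS))   # format character, not a space
print(unicodedata.category(" "))   # an ordinary space, for comparison

stem = "\u182a\u1820\u182d" + MVS + A    # assumed spelling of BAG-MVS-A
with_suffix = stem + "\u1834"            # stand-in for a suffix such as CHUD

print(mvs_shows_gap(stem, 3))            # A alone after the MVS
print(mvs_shows_gap(with_suffix, 3))     # A followed by a further letter
```

A font would have to express the same condition as contextual lookups rather than as a function like this; the sketch only shows that the rule can be decided from the character sequence alone.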

I believe that fixing the problem in the fonts will be a better, and
far less disruptive, solution than encoding two new characters.

Andrew

On 20 September 2016 at 15:24, Greg Eck <greck@postone.net> wrote:
> Hi Jargal,
>
>
>
> Thanks for the comments.
>
> I am in Beijing now with Ou Orlog, an Inner Mongolian colleague who will be
> attending the WG2 meetings in San Jose.
>
> We have completed our presentation on the MVS issue.
>
>
>
> A few things to note – I made several mistakes in my notes on the MVS A/E
> article earlier …
>
> 1.)    I called the A/E form an orkitz earlier. Orlog has corrected me to
> say that this separate glyph following the MVS is actually called the
> “tsatslag”. Further discussions will refer to the glyph as Tsatslag_A and
> Tsatslag_E.
>
> 2.)    I had stated that the meaning of BAG-MVS-A was “team” but it is
> actually “small”. The meaning of BAG-MVS-A-CHUD is “small ones” using the
> adjective “small” in a substantive sense. The extended meaning then is
> “children”.
>
>
>
> In looking at the matter in greater detail over the past several days, we
> are convinced that new code-points are probably the best solution. Given
> that this is the case, then it makes the MVS superfluous. The MVS has only
> one task in life - to separate the Tsatslag_A / Tsatslag_E from the stem
> with a small gap of space. The MVS itself also gives the OT rulings the
> context to “mark” the place where glyphs need to transform on either side of
> the space.
>
> These are the options we have considered so far …
>
> 1.)    We stay with the current MVS design and try to fix the problem as
> described earlier. Let’s say that we are working only with the Tsatslag_A
> and specifically the BAG-MVS-A stem. Now, we add OT rulings that will cause
> BAG-MVS-A-CHUD to shape correctly with the Tsatslag_A transforming to the
> standard medial A when the CHUD suffix is added. Then when the CHUD suffix
> is deleted, the Tsatslag_A reappears as desired. The problem is that the MVS
> carries its space internally by its definition. Therefore all through the
> above process, the MVS space was there. This solution is not viable given
> the situation that there is a suffix attached to the Tsatslag.
>
> 2.)    Let’s say that the MVS is not tenable for the current display
> problem. Let’s say that we do not use the MVS and try to use a tsatslag with
> space included in the glyph and the new OT rulings. We type in the BAG fine
> with no display problems. We have a new keystroke to type in the new variant
> (with space included in the glyph). But the new keystroke is still emitting
> only the U+1820. It has lost its Tsatslag-marker, the MVS. There is no way
> to communicate to the Shaping Engines that this new keystroke is any
> different from the old keystroke. Both the keystroke for the regular final A
> AND the keystroke for the Tsatslag_A emit the same U+1820 sequence.
>
> 3.)    If indeed we need some sort of Tsatslag-marker like the MVS, then we
> need to either redefine the MVS to have no space _OR_ we need to build space
> into two new code-points (eg. U+181E/U+181F). We have built a
> proof-of-concept font (implementing the Tsatslag_A only) which seems to work
> fine without the MVS. The U+181E glyph includes the space that the MVS
> provides in our current implementation. The unique code-point is in itself
> the MVS-marker needed to trigger the OT substitution rules. Of course, the
> question may be asked as to whether we are creating a new vowel in the
> assignment of an entire code-point to a new glyph. The answer is no. The
> history of the tsatslag shows that it was part of the “A” phoneme from the
> beginning. It is just that there are exactly identical contexts where the
> user must determine which form he/she wants to type – the rightward-sweeping
> final A or the Tsatslag_A sweeping_and_disconnected to the left. There is
> precedent for this same situation in other languages. The English letter
> “A”, for example, is assigned two code-points. One is for the upper-case and
> the other is for the lower-case form. Both are necessary for the user to
> have complete control of which form he/she wishes to type. There is no way
> that an automated system can choose for the user whether he/she wants a
> lower-case “a” or an upper-case “A”. In our case, the user typing in the
> Mongolian words “month” and “moon” needs to be able to determine which word
> he/she wants. “Moon” is SARA (rightward-sweeping final A). “Month” is SARA
> (leftward-sweeping A with space between the SAR and the final A).
>
> My recommendation is that we deprecate the MVS (U+180E) and add two new
> code-points U+181E for the Tsatslag_A and U+181F for the Tsatslag_E. It is a
> sweeping change, I agree, but we have thousands of words that do not form
> correctly without a solution to this problem. When we get to heavy
> implementation of corpus analysis and tagging, sorting and searching, etc.
> this issue will only be more relevant.
>
>
>
> Comments and other solutions are very welcome and needed. If we make this
> recommendation, we should have some rigorous discussion on it.
>
>
>
> Greg
>
>
>
>>>>>>
>
> Sent: Monday, September 19, 2016 3:46 PM
> Subject: Re: MVS Deficiency & Proposed Solution
>
>
>
> Hi all,
>
>
>
>
>
> I think the problem Greg has formulated is really important and cannot be
> seen just as a matter of typing MVS in one case and omitting it in another.
> We are dealing here with the base or root wordform which should be the same
> in all cases to optimize searching for example.
>
>
>
> Is MVS really necessary?
>
>
>
> Best regards,
>
> Jargal
>
>>>>>>
>
>
Received on Tuesday, 20 September 2016 15:45:34 UTC
