
RE: New Thread - FVS Assignment MisMatch

From: <jrmt@almas.co.jp>
Date: Thu, 6 Aug 2015 19:48:52 +0900
To: "'Greg Eck'" <greck@postone.net>, "'Richard Wordingham'" <richard.wordingham@ntlworld.com>, <public-i18n-mongolian@w3.org>
Message-ID: <000701d0d035$7d3abd30$77b03790$@almas.co.jp>

Hi Greg,

> This is a very interesting discussion.
> I am behind in my reading of the discussion, so forgive me if you have already dealt with this.
> I am concerned with such a high figure of 80% having multiple spelling possibilities.
> My dictionary+grammar does not show this.
> I wonder if you could pull 100 words from your dictionary and mark the ones with the possibility of multiple spellings.
> It would be good if you could include the text also, not just images, so that we can see the actual code-points behind the displayed forms.
> That would help clarify the exact issue.
> Are you talking about stems only OR inflected forms OR the both of them?
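
One way to see the code points behind a displayed form is to print them together with their Unicode character names. The following is only a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration, and the two sample strings are the first two candidate encodings given for "father" in the reply below.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Two candidate encodings of the same displayed word: A + BA + U and A + BA + O.
for word in ("\u1820\u182A\u1824", "\u1820\u182A\u1823"):
    for code, name in describe(word):
        print(code, name)
    print()
```

Running this on text pasted from a mail, rather than on an image, shows which of the candidate sequences was actually stored.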

I am talking about the possible mappings from a printed word to code points.
That is, a word displayed on the screen or printed on paper
may be stored as different code point sequences in Unicode (I am calling each of these a spelling).
Maybe you already know what I mean. Let me list 10 everyday words here with their possible encodings.
If you really need 100 words, I will prepare them for you from a particular page of the dictionary.

ᠠᠪᠤ ( father) - (U1820+U182A+U1824), (U1820+U182A+U1823); maybe also (U1820+U182A+U1825), (U1820+U182A+U1826)
ᠡᠵᠢ ( mother ) - (U1821+U1835+U1822), (U1821+U1835+U1836)
ᠠᠬ᠎ᠠ ( brother) - (U1820+U182C+U180E+U1820), (U1820+U182C+U180D+U180E+U1820), even (U1820+U182C+U180E+U1821), (U1820+U182C+U180D+U180E+U1821)
ᠳᠡᠭᠦᠦ (sister) - (U1833+U1821+U182D+U1826+U1826), (U1833+U1821+U182D+U1826+U1825), (U1833+U1821+U182D+U1825+U1825), (U1833+U1821+U182D+U1825+U1826),
(U1832+U1821+U182D+U1826+U1826), (U1832+U1821+U182D+U1826+U1825), (U1832+U1821+U182D+U1825+U1825), (U1832+U1821+U182D+U1825+U1826).
I have not included the possibilities with a final U1823 or U1824,
or the U1820 possibility, for this word.
ᠬᠦᠦ (son) - (U182C+U1826+U1826), (U182C+U1825+U1826), (U182C+U1826+U1825), (U182C+U1825+U1825), (U182D+U1826+U1826), (U182D+U1825+U1826), (U182D+U1826+U1825), (U182D+U1825+U1825).
ᠦᠬᠢᠨ (daughter) - (U1826+U182C+U1822+U1828), (U1825+U182C+U1822+U1828), (U1826+U182D+U1822+U1828), (U1825+U182D+U1822+U1828). 
I have not included the U1820 and U1821 possibilities for the final N.
ᠮᠢᠨᠤ ( my ) - (U182E+U1822+U1828+U1824), (U182E+U1822+U1828+U1823), (U182E+U1822+U1828+U1825), (U182E+U1822+U1828+U1826)
ᠲᠠᠨᠠᠢ ( his ) - (U1832+U1820+U1828+U1820+U1822), (U1832+U1821+U1828+U1821+U1822), (U1833+U1820+U1828+U1820+U1822), (U1833+U1821+U1828+U1821+U1822).
I have not included the misspelled possibilities. The first two are both correct spellings, with different meanings.
ᠭᠡᠷ ( home ) - (U182D+U1821+U1837), (U182C+U1821+U1837)
ᠨᠤᠳᠤᠭ ( hometown ) - (U1828+U1824+U1833+U1824+U182D), (U1828+U1824+U1833+U1823+U182D), (U1828+U1823+U1833+U1824+U182D), (U1828+U1823+U1833+U1823+U182D), (U1828+U1824+U1832+U1824+U182D), (U1828+U1824+U1832+U1823+U182D), (U1828+U1823+U1832+U1824+U182D), (U1828+U1823+U1832+U1823+U182D)
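
The lists above are, in effect, Cartesian products: each letter position has one or more code points that yield the same displayed shape. The sketch below reproduces the eight candidates for "hometown" under that reading. The per-position alternatives are copied from the list above, not taken from any normative table, and the names `candidate_encodings`, `notation` and `HOMETOWN_SLOTS` are hypothetical.

```python
from itertools import product

def candidate_encodings(slots):
    """Expand per-position alternatives into every possible stored string.

    slots holds one tuple per letter position; each tuple lists the code
    points assumed to be indistinguishable in that position.
    """
    return ["".join(chr(cp) for cp in combo) for combo in product(*slots)]

def notation(word):
    """Format a string in the (U1828+U1824+...) style used in this mail."""
    return "(" + "+".join(f"U{ord(ch):04X}" for ch in word) + ")"

# "hometown": N, then U or O, then D or T, then U or O, then G.
HOMETOWN_SLOTS = [
    (0x1828,),
    (0x1824, 0x1823),
    (0x1833, 0x1832),
    (0x1824, 0x1823),
    (0x182D,),
]

candidates = candidate_encodings(HOMETOWN_SLOTS)
print(len(candidates))  # 2 * 2 * 2 = 8 candidates
for word in candidates:
    print(notation(word))
```

The alternatives are given per position rather than per letter because, in the examples above, the same pair of letters is interchangeable in some positions and not in others.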

I think all the members who cannot read Mongolian have been puzzled by this.

Within this list, even Mongolian speakers will be unsure of the correct spelling (encoding) of the following words:
ᠳᠡᠭᠦᠦ 
ᠦᠬᠢᠨ
ᠮᠢᠨᠤ
ᠲᠠᠨᠠᠢ
ᠨᠤᠳᠤᠭ

We need a dictionary to confirm which letter is OE or UE, U or UE, etc.,
when we need to encode or spell a word correctly.
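
A dictionary lookup of this kind could be sketched as follows: fold each letter to one representative of its look-alike group, and index the correct spellings by the folded key. This is an illustration only. The folding table is an assumption based on the pairs mentioned in this mail (real confusability depends on the position in the word), the names `FOLD`, `lookup_key` and `build_index` are hypothetical, and the two dictionary entries are the two spellings described above as correct with different meanings.

```python
from collections import defaultdict

# Assumed look-alike pairs, folded to the first member of each pair.
FOLD = str.maketrans({
    "\u1821": "\u1820",  # E  -> A
    "\u1824": "\u1823",  # U  -> O
    "\u1826": "\u1825",  # UE -> OE
    "\u1833": "\u1832",  # DA -> TA
    "\u182D": "\u182C",  # GA -> QA
})

def lookup_key(word):
    """Fold look-alike letters so that all candidate encodings share one key."""
    return word.translate(FOLD)

def build_index(correct_spellings):
    """Map each folded key to the list of correct spellings that produce it."""
    index = defaultdict(list)
    for spelling in correct_spellings:
        index[lookup_key(spelling)].append(spelling)
    return index

index = build_index([
    "\u1832\u1820\u1828\u1820\u1822",  # TA A NA A I
    "\u1832\u1821\u1828\u1821\u1822",  # TA E NA E I
])

typed = "\u1833\u1820\u1828\u1820\u1822"  # typed with DA instead of TA
candidates = index.get(lookup_key(typed), [])
for word in candidates:
    print("+".join(f"U{ord(ch):04X}" for ch in word))
```

Because both correct spellings fold to the same key, the lookup returns two candidates, and the writer still has to choose between them by meaning.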

Regards,

Jirimutu
==========================================================
Almas Inc.
101-0021 601 Nitto-Bldg, 6-15-11, Soto-Kanda, Chiyoda-ku, Tokyo
E-Mail: jrmt@almas.co.jp   Mobile : 090-6174-6115
Phone : 03-5688-2081,   Fax : 03-5688-2082
http://www.almas.co.jp/   http://www.compiere-japan.com/
==========================================================




-----Original Message-----
From: Greg Eck [mailto:greck@postone.net] 
Sent: Thursday, August 6, 2015 3:57 PM
To: jrmt@almas.co.jp; 'Richard Wordingham'; public-i18n-mongolian@w3.org
Subject: RE: New Thread - FVS Assignment MisMatch

Jirimutu,

This is a very interesting discussion.
I am behind in my reading of the discussion, so forgive me if you have already dealt with this.
I am concerned with such a high figure of 80% having multiple spelling possibilities.
My dictionary+grammar does not show this.
I wonder if you could pull 100 words from your dictionary and mark the ones with the possibility of multiple spellings.
It would be good if you could include the text also, not just images, so that we can see the actual code-points behind the displayed forms.
That would help clarify the exact issue.
Are you talking about stems only OR inflected forms OR the both of them?

Thanks much,
Greg


-----Original Message-----
From: jrmt@almas.co.jp [mailto:jrmt@almas.co.jp]
Sent: Monday, August 3, 2015 4:52 AM
To: 'Richard Wordingham' <richard.wordingham@ntlworld.com>; public-i18n-mongolian@w3.org
Subject: RE: New Thread - FVS Assignment MisMatch

Dear Mr. Richard,

> No!  The Unicode editing committee tried to choose a form that was unique to a particular character.
> That does not make it the appropriate default isolate.
That is what was done in the Unicode code chart: U1800.pdf shows a distinct display form for each character.
So what is the problem? What I am saying is that we should follow the Unicode code chart U1800.pdf when selecting the default isolated variant form.

> Remember, the basic character charts are not normative; they merely serve to tell the reader which character has a particular code.
> This can fail spectacularly when characters are distinguished by their sound rather than their shapes.
> (There are also a few Korean Chinese compatibility characters that are principally distinguished by sound.)
I raised this problem in 1999, while the Mongolian proposal was being prepared.
But WG2 led us to the current version of the Mongolian Unicode chart.
As I remember, each Mongolian character has a different shape in the Unicode basic character chart.
But do you know how many indistinguishable words there actually are in Mongolian?
According to our rough statistics, almost 80% of words have more than two spellings in the current Mongolian Unicode encoding.
We have no other choice; we have to use the current version of Unicode Mongolian.

> A similar example is the pairs U+0061 LATIN SMALL LETTER A and
> U+0251 LATIN SMALL LETTER ALPHA and U+0067 LATIN SMALL LETTER G and
> U+0261 LATIN SMALL LETTER SCRIPT G.  A Unicode-compliant font for a
> children's book may render U+0061 and U+0067 like the reference glyphs for
> U+0251 and U+0261; it may even render each pair identically.
That does not apply to Mongolian, so it is not a valid objection to my point.
In Mongolian, first-year pupils' textbooks only use different fonts; they do not contain different characters.

> I trust the following (points 2 to 5) are guiding principles for dealing with overlooked or definitely unclear combinations.
> Unicode might not take kindly to changing the existing assignments.
Thank you for your understanding. Other people may have different opinions on points 2-5.
I would like to hear from all members.
It is OK with me if the principles for the Mongolian variant form mapping end up quite different from my list.
But I hope that there will be one such set of principles.
Did you know that we are facing a big problem in Inner Mongolia? We have to change parts of the existing Mongolian grammar taught in primary and secondary schools because of some of the Unicode Mongolian variant mapping definitions.
Do you agree that, because of the Unicode Mongolian encoding rules, users should have to change the grammar they have learned to fit the Unicode rules?
Or should the Unicode rules fit the majority of people's existing grammar knowledge?
If you need detailed information on this, I can prepare it for the following discussion.

> How much of the problem is due to unclear determination of whether the starting point is the isolated, initial, medial or final form?
> There may conceivably be an error in the 'joining type' of MVS and NNBSP.

> As far as the variation selectors are concerned, the Unicode standard rules that the preceding letter is final or isolated,
> and the following letter is initial or isolated.  Apart from any issues there, the definitions should be clear.
> I looked and saw no difference for the Mongolian script between StandardizedVariants.html in Versions 4.00 and 8.00 of Unicode.
StandardizedVariants.html is only a small part of the mapping rules.
And the NP at https://r12a.github.io/scripts/mongolian/variants does not yet cover all the possibilities.
This is why we are having a discussion here.
MVS and NNBSP are only the starting point, but they have been the most problematic points in Mongolian so far.

For example, I have amendments to propose for the first letter, U1820-A. I am not sure that all members will agree with me, but these are real requirements from users.
I will send the U1800-A related input in a separate mail.

Thanks and Regards,

Jirimutu




-----Original Message-----
From: Richard Wordingham [mailto:richard.wordingham@ntlworld.com]
Sent: Monday, August 3, 2015 3:14 AM
To: public-i18n-mongolian@w3.org
Subject: Re: New Thread - FVS Assignment MisMatch

On Mon, 3 Aug 2015 00:56:29 +0900
<jrmt@almas.co.jp> wrote:


> For example, following is my personal consideration. 
> 
> 1. We select the most commonly used isolate, initial, medial, final 
> form of the character as the default Variant form (No need FVS1-3).
> 
>    The variant form listed on the primary school first year pupil's 
> text book comes first (is the default form).

>    The default isolate form have to be same with the Unicode encoding 
> chart.

No!  The Unicode editing committee tried to choose a form that was unique to a particular character.  That does not make it the appropriate default isolate.  Remember, the basic character charts are not normative; they merely serve to tell the reader which character has a particular code.  This can fail spectacularly when characters are distinguished by their sound rather than their shapes.  (There are also a few Korean Chinese compatibility characters that are principally distinguished by sound.)

A similar example is the pairs U+0061 LATIN SMALL LETTER A and
U+0251 LATIN SMALL LETTER ALPHA and U+0067 LATIN SMALL LETTER G and
U+0261 LATIN SMALL LETTER SCRIPT G.  A Unicode-compliant font for a
children's book may render U+0061 and U+0067 like the reference glyphs for
U+0251 and U+0261; it may even render each pair identically.

I trust the following (points 2 to 5) are guiding principles for dealing with overlooked or definitely unclear combinations.  Unicode might not take kindly to changing the existing assignments. 

> 2. To exactly specify the second regularly used variant form, we will 
> use FVS1.
<snip>

> Because of the previous existing Mongolian Variant formatting rule 
> have not clearly, uniquely defined the form selection.

How much of the problem is due to unclear determination of whether the starting point is the isolated, initial, medial or final form?
There may conceivably be an error in the 'joining type' of MVS and NNBSP.
As far as the variation selectors are concerned, the Unicode standard rules that the preceding letter is final or isolated, and the following letter is initial or isolated.  Apart from any issues there, the definitions should be clear. I looked and saw no difference for the Mongolian script between StandardizedVariants.html in Versions 4.00 and 8.00 of Unicode.

Richard.
Received on Thursday, 6 August 2015 10:49:18 UTC
