RE: NNBSP Impact from Greg Eck on 2015-07-21 (public-i18n-mongolian@w3.org from July to September 2015)

From: Greg Eck <greck@postone.net>
Date: Tue, 21 Jul 2015 14:59:58 +0000
To: "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
Message-ID: <BN3PR10MB0321A448B4A1534E26A36714AF840@BN3PR10MB0321.namprd10.prod.outlook.com>
Richard has suggested that I read through TR44 regarding this question of modifying NNBSP so that it does not break a Mongolian word.
Comments on TR29/TR44 are welcome.
I am going to be out for a few days studying this further.
Greg
>>>>>
the basic link you need is
http://www.unicode.org/reports/tr44/


this contains descriptions of most of the properties
http://www.unicode.org/reports/tr44/#Word_Break

>>>>>


-----Original Message-----
From: Greg Eck 
Sent: Saturday, July 18, 2015 7:31 PM
To: 'public-i18n-mongolian@w3.org' <public-i18n-mongolian@w3.org>
Subject: RE: NNBSP Impact

Hi Andrew (West), 

Assuming that we are not moving away from NNBSP, but instead refining it, could we do something like this ... ?

Going from TR#29 http://unicode.org/reports/tr29/#AnyWB

1.) We define a new Word_Break Property Value "Mongolian" similar to the value "Katakana" (see attached).
2.) The Property Value of Mongolian is defined as 
 (Script=Mongolian<MNG> OR Script=Todo<TOD> OR Script=Sibe<SIB> OR Script=Manchu<MCH>) AND (U+202F-NNBSP)
3.) ALetter is further refined to carry the additional logic of "and Word_Break is not equal to Mongolian" (as attached)
4.) The word boundary rules would now include the following:
 WB13d   Mongolian × Mongolian
5.) Reassign the Word_Break value of U+202F from XX to Mongolian

There are probably holes in the above logic, but is this roughly the idea you were referring to when you mentioned changing the Word_Break value of U+202F?
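As a rough illustration of steps 1-5, the following Python toy compares the current behaviour (Mongolian letters are ALetter, U+202F is XX) with the proposed one. It is a sketch under stated assumptions and not UAX #29 itself. The Mongolian-letter test is a simplified code-point range (U+1820..U+1878) standing in for the script conditions in step 2. The union of those letters and U+202F is taken as the intended "Mongolian" set. Only two no-break pairs are modelled. The names `classify` and `segments` are hypothetical.

```python
from __future__ import annotations

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def is_mongolian_letter(ch: str) -> bool:
    # Simplified stand-in (assumption) for the script test in step 2;
    # a real implementation would consult the Unicode Character Database.
    return "\u1820" <= ch <= "\u1878"


def classify(ch: str, proposed: bool) -> str:
    """Return a toy Word_Break class for one character."""
    if proposed and (is_mongolian_letter(ch) or ch == NNBSP):
        return "Mongolian"  # hypothetical new value (steps 1, 2 and 5)
    if is_mongolian_letter(ch) or ch.isalpha():
        return "ALetter"    # with proposed=True, Mongolian letters never get here (step 3)
    return "Other"          # XX, the current value of U+202F


# Pairs that must not be split; every other adjacent pair gets a break.
NO_BREAK = {
    ("ALetter", "ALetter"),      # like WB5
    ("Mongolian", "Mongolian"),  # proposed WB13d
}


def segments(text: str, proposed: bool) -> list[str]:
    """Split text at every position the toy rules allow a break."""
    if not text:
        return []
    out: list[str] = []
    start = 0
    for i in range(1, len(text)):
        pair = (classify(text[i - 1], proposed), classify(text[i], proposed))
        if pair not in NO_BREAK:
            out.append(text[start:i])
            start = i
    out.append(text[start:])
    return out


stem = "\u182e\u1823\u1829\u182d\u1823\u182f"  # six Mongolian letters
suffix = "\u1824\u1828"                         # two-letter suffix
word = stem + NNBSP + suffix
print(len(segments(word, proposed=False)))  # 3: stem | NNBSP | suffix
print(len(segments(word, proposed=True)))   # 1: one word
```

Under the current model the stem, U+202F and the suffix come out as three segments; under the proposal they form one. As with Katakana, one consequence of step 3 in this toy is that a Latin letter directly adjacent to a Mongolian letter now produces a break.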

Would this be carried out in the individual rendering engines, such as the Microsoft Universal Shaping Engine or HarfBuzz?

Greg



-----Original Message-----
From: Andrew West [mailto:andrewcwest@gmail.com]
Sent: Friday, July 17, 2015 7:14 PM
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc: jrmt@almas.co.jp; Greg Eck <greck@postone.net>; public-i18n-mongolian@w3.org
Subject: Re: NNBSP Impact

On 17 July 2015 at 10:08, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
>
> On 2015/07/17 16:05, jrmt@almas.co.jp wrote:
>
> As the text in the Unicode standard explains, ZWNJ and ZWJ are used 
> for this in Arabic and Persian as well. In Persian they are also used for 
> separating suffixes (Arabic doesn't have separable suffixes).
>
>> For this reason, we should not use ZWNJ as the Mongolian Suffix Separator.
>
> There may be good reasons for not using ZWNJ as a Mongolian Suffix 
> Separator, but the reasons above alone are not convincing (not to me, and 
> most probably not to the Unicode Technical Committee either).

Using ZWNJ in place of NNBSP is definitely not an option in my opinion, as ZWNJ affects the joining behaviour of the preceding and following Mongolian letters in one particular way (it selects non-joining forms for both), but NNBSP affects the joining behaviour of the preceding and following Mongolian letters in a different way (it selects the non-joining form for the preceding letter, but may select an initial, medial or final form of the following letter depending on the suffix).  It is impossible for one character to inform two different shaping behaviours for the same following letter.
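To make the ZWNJ half of this argument concrete, here is a toy positional-form assigner in Python. It makes two simplifying assumptions. Every character other than ZWNJ is treated as a dual-joining letter, whereas real shapers read Joining_Type from the Unicode Character Database and apply Mongolian-specific rules. The function name `positional_forms` is hypothetical. The toy shows that a letter right after ZWNJ can only come out as init or isol. The letter after NNBSP, by contrast, may need a medial or final form depending on the suffix, so one character cannot serve both roles.

```python
from __future__ import annotations

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def positional_forms(text: str) -> list[tuple[str, str]]:
    """Toy shaper: assign init/medi/fina/isol to each letter.

    ZWNJ is treated as a joining breaker and is not itself emitted.
    Simplification (assumption): every other character is a dual-joining
    letter; real Joining_Type data and Mongolian-specific rules are ignored.
    """
    forms: list[tuple[str, str]] = []
    for run in text.split(ZWNJ):  # letters join only within a run
        n = len(run)
        for i, ch in enumerate(run):
            if n == 1:
                form = "isol"
            elif i == 0:
                form = "init"
            elif i == n - 1:
                form = "fina"
            else:
                form = "medi"
            forms.append((ch, form))
    return forms


stem = "\u182e\u1823\u1829\u182d\u1823\u182f"  # six Mongolian letters
suffix = "\u1824\u1828"                         # two-letter suffix
for ch, form in positional_forms(stem + ZWNJ + suffix):
    print(f"U+{ord(ch):04X} {form}")
```

Here the last stem letter comes out fina and the first suffix letter init. No input can make the letter after ZWNJ medial or final, which is exactly the behaviour NNBSP sometimes has to allow.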

Personally I am not in favour of replacing NNBSP with a new character at this late stage in the game, and I think that it would be a very hard sell to the UTC, who I suspect would be very concerned about destabilizing existing Mongolian data if a new character were introduced.  The issues raised in this discussion about NNBSP do not involve shaping at the rendering level; they concern how software that processes Mongolian data determines word boundaries.  In my opinion, therefore, the best solution would be to modify the Word_Break property value of NNBSP (which is currently "XX", i.e. Other).

Andrew
Received on Tuesday, 21 July 2015 15:00:38 UTC