NNBSP Impact from Greg Eck on 2015-07-29 (public-i18n-mongolian@w3.org from July to September 2015)

From: Greg Eck <greck@postone.net>
Date: Wed, 29 Jul 2015 03:26:45 +0000
To: "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>
Message-ID: <BN3PR10MB03218719BB3EB2DEA85BCCBDAF8C0@BN3PR10MB0321.namprd10.prod.outlook.com>

Here are my current thoughts on the NNBSP issue ...

Implementations (both fonts and utilities) dealing with the Mongolian script, specifically U+1800-18AA, have used the following control characters in the following ways:

1.) NNBSP - sole purpose is to separate the Stem+Suffix and the Suffix+Suffix context (with space) while at the same time keeping the given contexts connected as a word.

2.) MVS - sole purpose is to separate the Stem+Orkhitz_A/E context (with space) while at the same time keeping the given contexts connected as a word.

3.) FVS1/FVS2/FVS3 - designed to tag the previous character in such a way that the OT rulings can modify the preceding character.

4.) ZWJ/ZWNJ - provide simulated environments for stand-alone isolate/initial/medial/final contexts.

5.) ZWJ - prevent/allow OT ligaturing; break otherwise expected OT rulings.

- Additions/amendments are welcome
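
For reference, the control characters listed above can be inspected with a short Python sketch. It only prints what the Unicode data bundled with the running Python build reports, so the categories shown may differ between versions. The `CONTROLS` table and the `describe` helper are names made up for this illustration.

```python
import unicodedata

# Code points of the control characters discussed in the list above.
CONTROLS = {
    "NNBSP": "\u202f",  # NARROW NO-BREAK SPACE
    "MVS": "\u180e",    # MONGOLIAN VOWEL SEPARATOR
    "FVS1": "\u180b",   # MONGOLIAN FREE VARIATION SELECTOR ONE
    "FVS2": "\u180c",   # MONGOLIAN FREE VARIATION SELECTOR TWO
    "FVS3": "\u180d",   # MONGOLIAN FREE VARIATION SELECTOR THREE
    "ZWJ": "\u200d",    # ZERO WIDTH JOINER
    "ZWNJ": "\u200c",   # ZERO WIDTH NON-JOINER
}


def describe(ch):
    """Return (code point, general category, isspace) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isspace())


for label, ch in CONTROLS.items():
    code, category, is_space = describe(ch)
    print(f"{label:6} {code} {category} isspace={is_space}")
```

Of these, the NNBSP is the one reported with general category Zs (space separator), which is the root of the parsing problems discussed in the observations.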

Observations from our current discussions:

1.) NNBSP causes the following problems for current Mongolian script utilities:

- It is treated as a space by most programming languages and embedded routines, and therefore gives undesired results in parsing

- It breaks the word, as seen in word counting, word jumping, sorting and parsing

2.) The character properties of the MVS are probably identical in all ways to the "desired_NNBSP". However, the idea of adding NNBSP functionality into the MVS is infeasible as there are identical contexts that the MVS and the "desired_NNBSP" need to distinguish.

3.) The current functionality of the ZWNJ is not compatible with the desired functionality of the NNBSP in Mongolian. The ZWNJ affects the joining behaviour of the preceding and following Mongolian letters in one particular way (it selects non-joining forms), whereas the NNBSP affects it in a different way (it selects the non-joining form for the preceding letter but may select an initial, medial or final form of the following letter, depending on the suffix) - Andrew West.

4.) The NNBSP problem of breaking words might be fixed by defining a new Word_Break property value "Mongolian", similar to the value "Katakana".

5.) The NNBSP problem of being classified as a space/white_space cannot be solved. Because the NNBSP is used in other languages as a bona fide space, it is unreasonable to request that it be reclassified as a non_space.

6.) The MVS went through several iterations of design/re-design before the current set of character properties was stabilized. As there were unknowns in the stabilization of the MVS, it is not known now whether more problems will creep in with future upper-level processing using the "a_modified_NNBSP". With the history of the MVS refinement in mind, the idea of modifying the NNBSP over a possibly lengthy period of testing and refinement is problematic.

7.) The number of documents now in circulation that use the NNBSP is unknown. My guess is that it is a low figure.

8.) Both Badral and Jirimutu have mentioned a code base dealing with the current problematic NNBSP implementation. I have a small code base dealing with the NNBSP now myself. To update my code base to a new character is not a problem. To leave my code base as it is with the NNBSP does not seem to be a present problem. But then, my current set of utilities is also small. Could I ask Jirimutu and Badral to give more detail as to how specifically the new_NNBSP character would help in their upper-level processing, and how specifically the old_NNBSP has broken their systems? Others?

9.) Even considering the possible implementation of a new Word_Break property value as mentioned above - that it would indeed keep a suffix-attached word intact - there is still the unsolved problem of the NNBSP being a bona fide space - a property that is unchangeable.

- Additions/amendments are welcome
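
Observations 1, 4 and 5 can be illustrated with a small Python sketch. The sample words are only illustrative strings of Mongolian letters, and `mongolian_words` is a hypothetical application-level helper, not the Word_Break property change suggested in observation 4. It assumes that an NNBSP with Mongolian letters on both sides should be kept inside the word.

```python
import re

NNBSP = "\u202f"

# Illustrative sample: stem + NNBSP + suffix, an ordinary space, then a second word.
STEM = "\u182e\u1823\u1829\u182d\u1823\u182f"
SUFFIX = "\u1824\u1828"
OTHER = "\u182e\u1823\u1837\u1822"
text = STEM + NNBSP + SUFFIX + " " + OTHER

# Default whitespace splitting treats NNBSP as a separator,
# so the suffix is detached from its stem.
naive_words = text.split()

# Assumed word characters: FVS1-3, MVS and the Mongolian letter range.
LETTERS = "\u180b-\u180e\u1820-\u18aa"
# A word is a run of word characters, optionally continued by
# NNBSP-separated runs (the suffixes).
WORD = re.compile(f"[{LETTERS}]+(?:{NNBSP}[{LETTERS}]+)*")


def mongolian_words(s):
    """Split text into words, keeping NNBSP-attached suffixes with the stem."""
    return WORD.findall(s)


print(len(naive_words))            # stem and suffix counted separately
print(len(mongolian_words(text)))  # stem+suffix counted as one word
```

A helper of this kind has to be repeated in every tool that splits, counts or sorts words, and generic routines such as `str.split()` and `str.strip()` still see the NNBSP as white space, which is the limitation noted in observations 5 and 9.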

I do not see a resolution to the problems addressed above with the implementation of the U+202F-NNBSP being used as a space-separator-connector for the Mongolian contexts of STEM+Suffix and Suffix+Suffix. Therefore, I am personally resigned to the call for a new Unicode character designation of Mongolian Suffix Separator at U+180F.

Greg

Received on Wednesday, 29 July 2015 03:27:16 UTC