Re: NNBSP Impact from Badral S. on 2015-07-29 (public-i18n-mongolian@w3.org from July to September 2015)

From: Badral S. <badral@bolorsoft.com>
Date: Wed, 29 Jul 2015 12:04:52 +0200
To: public-i18n-mongolian@w3.org
Message-ID: <55B8A544.3080809@bolorsoft.com>

Hi Andrew,
Any incorrectly classified break characters (for us NNBSP, and MVS
between Unicode 5.0 and 6.3) corrupt the OpenType GSUB rules for
Mongolian. A suffix has to start with the "medial" variant after NNBSP.
This rule is already implemented in every Mongolian font. But every
suffix starts with the "initial" variant if NNBSP belongs to the space
or word boundary class. Mongolian is a highly agglutinative language, so
almost all words are displayed incorrectly if NNBSP belongs to the word
boundary class. In my tests the problem exists in OpenOffice,
LibreOffice, Google Chrome, Safari and Opera.
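
The parsing side of the same misclassification can be sketched in a few
lines of Python. This is only an illustration, not taken from any of the
applications named above: the stem and suffix code points are an assumed
sample word, and the two helper functions are hypothetical names. It
shows that a default whitespace split treats NNBSP (U+202F) as a
separator, so a stem and its suffix are counted as two words.

```python
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Assumed sample word for illustration: a Mongolian stem followed by a
# suffix, joined by NNBSP as Mongolian orthography requires.
stem = "\u182e\u1823\u1829\u182d\u1823\u182f"
suffix = "\u1824\u1828"
word = stem + NNBSP + suffix


def naive_word_count(text):
    # str.split() with no argument splits on every Unicode whitespace
    # character, and NNBSP has general category Zs, so it splits here.
    return len(text.split())


def nnbsp_aware_word_count(text):
    # Hypothetical workaround: drop NNBSP before splitting so that
    # stem and suffix stay together as one word.
    return len(text.replace(NNBSP, "").split())


print(unicodedata.category(NNBSP))   # general category of U+202F
print(naive_word_count(word))        # stem and suffix counted apart
print(nnbsp_aware_word_count(word))  # stem and suffix counted together
```

The workaround only addresses counting; it does nothing for shaping,
which depends on how the rendering engine classifies NNBSP.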

Badral

On 29.07.2015 10:41, Andrew West wrote:
> Hi Greg,
>
> On 29 July 2015 at 04:26, Greg Eck <greck@postone.net> wrote:
>> 1.)     NNBSP gives the following problems in the current Mongolian script
>> Utilities functionality
>>
>> -          Considered to be a space in the case of most programming
>> languages and embedded routines and therefore gives undesired results in
>> parsing processes
> Could you explain (with concrete examples if possible) exactly what
> undesirable results result from NNBSP being a space character?
>
>> -          Breaks the word as seen in word counting, word jumping, sorting,
>> parsing
> I can understand the issues with word selection, word counting and
> word navigation, which I have verified exist in some software, notably
> Word (but not all software -- Notepad and BabelPad both behave as
> desired), but I am not sure what specific issue "parsing" refers to,
> and I would like to see an example of incorrect sorting behaviour that
> I can test using the Unicode Collation Algorithm (UCA).
>
> If you are going to make a proposal for a new character you will need
> to give specific examples of incorrect behaviour, and explain why this
> incorrect behaviour cannot be remedied by tweaking Unicode properties
> or the UCA.  On the Unicode internal (Unicore) mailing list Asmus
> Freytag suggested that the word break property of NNBSP could be
> changed so that by default there would be no word break when the
> character before and after it belonged to the same category (e.g. both
> letters, as is the case for Mongolian).  Making this change should
> solve the word boundary issue, as early as Unicode 9.0 next June if
> someone makes a proposal to the UTC soon, but encoding a new character
> will take at least two years, possibly much longer if there is
> opposition from ISO national bodies.
>
> It may take a while before Word catches up with changes to the word
> break property, but it would take even longer for Word to support a
> new character.  In my opinion, the main advantage of property change
> over encoding a new character is that the property change will fix
> existing Mongolian text, whereas the new character will have no effect
> on existing Mongolian text, and users will still complain that word
> selection etc. does not work for pre-new-character Mongolian text (and
> users will not even start to use the new character until it is not
> displayed as an empty box on their system, and it produces the
> expected shaping behaviour, which will probably be several years after
> the several years to get it encoded).
>
> A further problem with encoding a new character is that when it is
> eventually supported by fonts and rendering systems, Mongolian text
> with NNBSP and Mongolian text with the new character will look the
> same to end users, with the result that users will start to complain
> that internet searches and local find/replace functions do not work
> correctly for Mongolian because searching for a Mongolian word with
> the new character will not match the same word with NNBSP and vice
> versa.  And this problem will never go away, because no-one is going
> to magically change existing Mongolian data, and input methods and
> users will continue to use NNBSP in place of the new character for
> years to come -- why not? they both look the same and produce the same
> visual result.
>
> All in all, I firmly believe that encoding a new character will create
> more and worse problems than it solves.
>
> Andrew
>


-- 
Badral Sanlig, Software architect
www.bolorsoft.com | www.badral.net
Bolorsoft LLC, Selbe Khotkhon 40/4 D2, District 11, Ulaanbaatar
Received on Wednesday, 29 July 2015 10:05:23 UTC