RE: NNBSP Impact

Hi Andrew,

Thanks for your good thoughts on the NNBSP situation.

I don't have any personal examples of problems found in the use of utilities with the NNBSP as a space character. I was just representing Badral and Jirimutu there.

If Badral feels that we can tweak the NNBSP so that the word does not break around it, then I am happy to proceed with modifying the current NNBSP. Of course, we still want to hear from Jirimutu about the problems caused by the NNBSP behaving as a space. I would be very happy if we could get by without creating a new character. I had simply thought the existing problems were too big to solve that way. With the new information coming in, maybe the issues are not so insurmountable.


-----Original Message-----
From: Andrew West
Sent: Wednesday, July 29, 2015 4:42 PM
To: Greg Eck
Cc: Asmus Freytag
Subject: Re: NNBSP Impact

Hi Greg,

On 29 July 2015 at 04:26, Greg Eck wrote:
> 1.)     NNBSP gives the following problems in the current Mongolian script
> Utilities functionality
> -          Considered to be a space in the case of most programming
> languages and embedded routines and therefore gives undesired results
> in parsing processes

Could you explain (with concrete examples if possible) exactly what undesirable results arise from NNBSP being a space character?
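
A minimal sketch of the kind of behaviour in question, assuming Python 3 as the example language. It is an illustration only, not one of the cases reported by Badral or Jirimutu, and the sample word and the helper name naive_tokens are made up for the purpose. Python treats U+202F as whitespace, so a default whitespace split separates a Mongolian stem from its suffix:

```python
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Illustrative word only: stem + NNBSP + suffix, written with plain
# Mongolian letters (no variation selectors) to keep the example simple.
STEM = "\u182e\u1823\u1829\u182d\u1823\u182f"
SUFFIX = "\u1824\u1828"
WORD = STEM + NNBSP + SUFFIX


def naive_tokens(text):
    """Tokenise the way many scripts do: split on any whitespace."""
    return text.split()


print(unicodedata.category(NNBSP))  # Zs (space separator)
print(NNBSP.isspace())              # True
print(len(naive_tokens(WORD)))      # 2: the suffix is split from its stem
```

Any routine built on the same whitespace test (field splitting, trimming, simple word counts) would be expected to behave the same way.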

> -          Breaks the word as seen in word counting, word jumping, sorting,
> parsing

I can understand the issues with word selection, word counting and word navigation, which I have verified exist in some software, notably Word (but not all software -- Notepad and BabelPad both behave as desired), but I am not sure what specific issue "parsing" refers to, and I would like to see an example of incorrect sorting behaviour that I can test using the Unicode Collation Algorithm (UCA).

If you are going to make a proposal for a new character you will need to give specific examples of incorrect behaviour, and explain why this incorrect behaviour cannot be remedied by tweaking Unicode properties or the UCA.  On the Unicode internal (Unicore) mailing list Asmus Freytag suggested that the word break property of NNBSP could be changed so that by default there would be no word break when the character before and after it belonged to the same category (e.g. both letters, as is the case for Mongolian).  Making this change should solve the word boundary issue, as early as Unicode 9.0 next June if someone makes a proposal to the UTC soon, but encoding a new character will take at least two years, possibly much longer if there is opposition from ISO national bodies.
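
The suggested rule can be sketched roughly as follows, again in Python. This is a simplified illustration under stated assumptions, not the Unicode text segmentation algorithm and not the wording of any actual proposal: "same category" is reduced to "both neighbours are letters" via str.isalpha(), and the function name split_words is hypothetical.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def split_words(text):
    """Split on whitespace, but keep NNBSP inside a word when the
    characters on both sides of it are letters (simplified rule)."""
    words, current = [], []
    for i, ch in enumerate(text):
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        # "".isalpha() is False, so NNBSP at either edge still breaks.
        joins = ch == NNBSP and prev_ch.isalpha() and next_ch.isalpha()
        if ch.isspace() and not joins:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


# Illustrative stem and suffix made of plain Mongolian letters.
STEM = "\u182e\u1823\u1829\u182d\u1823\u182f"
SUFFIX = "\u1824\u1828"

text = STEM + NNBSP + SUFFIX + " " + STEM
print(len(split_words(text)))  # 2: NNBSP no longer splits the first word
```

Because the rule looks only at the neighbouring characters, it applies to existing text containing NNBSP without any change to the data.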

It may take a while before Word catches up with changes to the word break property, but it would take even longer for Word to support a new character.  In my opinion, the main advantage of property change over encoding a new character is that the property change will fix existing Mongolian text, whereas the new character will have no effect on existing Mongolian text, and users will still complain that word selection etc. does not work for pre-new-character Mongolian text (and users will not even start to use the new character until it is not displayed as an empty box on their system, and it produces the expected shaping behaviour, which will probably be several years after the several years to get it encoded).

A further problem with encoding a new character is that when it is eventually supported by fonts and rendering systems, Mongolian text with NNBSP and Mongolian text with the new character will look the same to end users, with the result that users will start to complain that internet searches and local find/replace functions do not work correctly for Mongolian because searching for a Mongolian word with the new character will not match the same word with NNBSP and vice versa.  And this problem will never go away, because no-one is going to magically change existing Mongolian data, and input methods and users will continue to use NNBSP in place of the new character for years to come -- why not? they both look the same and produce the same visual result.
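
The search mismatch can be illustrated with a short Python sketch. No new character exists, so a Private Use Area code point (U+E000) stands in for it here purely as a placeholder; that code point and the helper fold_separators are assumptions made for the example.

```python
NNBSP = "\u202f"          # NARROW NO-BREAK SPACE
NEW_SEPARATOR = "\ue000"  # hypothetical stand-in; no such character is encoded

# Illustrative stem and suffix made of plain Mongolian letters.
STEM = "\u182e\u1823\u1829\u182d\u1823\u182f"
SUFFIX = "\u1824\u1828"

old_text = STEM + NNBSP + SUFFIX          # existing data
new_text = STEM + NEW_SEPARATOR + SUFFIX  # same word typed with the new character


def fold_separators(text):
    """Map the hypothetical new separator back to NNBSP before comparing."""
    return text.replace(NEW_SEPARATOR, NNBSP)


print(new_text in old_text)                                    # False
print(fold_separators(new_text) in fold_separators(old_text))  # True
```

A plain comparison fails even though both strings would look identical on screen. Every search engine and find/replace function would need some equivalent of fold_separators to match the two spellings.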

All in all, I firmly believe that encoding a new character will create more and worse problems than it solves.


Received on Thursday, 30 July 2015 07:55:36 UTC