Re: NNBSP Impact from Andrew West on 2015-07-29 (public-i18n-mongolian@w3.org from July to September 2015)

From: Andrew West <andrewcwest@gmail.com>
Date: Wed, 29 Jul 2015 09:41:55 +0100
To: Greg Eck <greck@postone.net>
Cc: "public-i18n-mongolian@w3.org" <public-i18n-mongolian@w3.org>, Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <CALgEMhyEGyypPDBjZP_JWJsn4_EM5z0dkwNaFGSs9_eQ=CSkPg@mail.gmail.com>

Hi Greg,

On 29 July 2015 at 04:26, Greg Eck <greck@postone.net> wrote:
>
> 1.) NNBSP gives the following problems in the current Mongolian script
> Utilities functionality
>
> - Considered to be a space in the case of most programming
> languages and embedded routines and therefore gives undesired results in
> parsing processes

Could you explain (with concrete examples if possible) exactly what
undesirable results arise from NNBSP being a space character?

> - Breaks the word as seen in word counting, word jumping, sorting,
> parsing

I can understand the issues with word selection, word counting and
word navigation, which I have verified exist in some software, notably
Word (but not all software -- Notepad and BabelPad both behave as
desired), but I am not sure what specific issue "parsing" refers to,
and I would like to see an example of incorrect sorting behaviour that
I can test using the Unicode Collation Algorithm (UCA).
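
As a guess at the kind of thing "parsing" might mean, here is a
minimal Python 3 sketch of whitespace tokenisation splitting a
stem+NNBSP+suffix sequence.  The stem and suffix are only an
illustrative assumption ("mongol" plus genitive "-un"), not taken from
any reported failure, and other languages may behave differently.

```python
import unicodedata

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE

# Assumed example word: stem + NNBSP + suffix, written with escapes.
word = "\u182e\u1823\u1829\u182d\u1823\u182f" + NNBSP + "\u1824\u1828"

print(unicodedata.category(NNBSP))  # general category: Zs (space separator)
print(NNBSP.isspace())              # True: Python treats NNBSP as whitespace
print(len(word.split()))            # 2: one Mongolian word becomes two tokens
```

If this is the problem, it would help to know which languages and
routines are affected in practice.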

If you are going to make a proposal for a new character you will need
to give specific examples of incorrect behaviour, and explain why this
incorrect behaviour cannot be remedied by tweaking Unicode properties
or the UCA.  On the Unicode internal (Unicore) mailing list Asmus
Freytag suggested that the word break property of NNBSP could be
changed so that by default there would be no word break when the
character before and after it belonged to the same category (e.g. both
letters, as is the case for Mongolian).  Making this change should
solve the word boundary issue, as early as Unicode 9.0 next June if
someone makes a proposal to the UTC soon, but encoding a new character
will take at least two years, possibly much longer if there is
opposition from ISO national bodies.
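
To make the idea concrete, here is a toy Python 3 sketch of such a
rule.  It is not the UAX #29 algorithm and not the wording of any
actual proposal; the function name and the "both neighbours are
letters" test are my own simplifying assumptions.

```python
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE

def words(text: str) -> list:
    """Toy splitter: whitespace ends a word, except NNBSP between two letters."""
    out, cur = [], []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Hypothetical rule: NNBSP flanked by letters does not break the word.
        joins = ch == NNBSP and prev.isalpha() and nxt.isalpha()
        if ch.isspace() and not joins:
            if cur:
                out.append("".join(cur))
                cur = []
        else:
            cur.append(ch)
    if cur:
        out.append("".join(cur))
    return out

# Assumed example word: stem + NNBSP + suffix.
word = "\u182e\u1823\u1829\u182d\u1823\u182f" + NNBSP + "\u1824\u1828"
print(len(words(word + " " + word)))  # 2: each stem+suffix stays one word
print(words("12" + NNBSP + "kg"))     # ['12', 'kg']: digit before, so it breaks
```

The point of the sketch is that existing text containing NNBSP starts
behaving correctly without any change to the data.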

It may take a while before Word catches up with changes to the word
break property, but it would take even longer for Word to support a
new character.  In my opinion, the main advantage of a property change
over encoding a new character is that the property change will fix
existing Mongolian text, whereas a new character will have no effect
on it.  Users will still complain that word selection etc. does not
work for text written before the new character existed.  Nor will they
even start to use the new character until it is no longer displayed as
an empty box on their systems and it produces the expected shaping
behaviour, which will probably be several years after the several
years needed to get it encoded.

A further problem with encoding a new character is that when it is
eventually supported by fonts and rendering systems, Mongolian text
with NNBSP and Mongolian text with the new character will look the
same to end users, with the result that users will start to complain
that internet searches and local find/replace functions do not work
correctly for Mongolian because searching for a Mongolian word with
the new character will not match the same word with NNBSP and vice
versa.  And this problem will never go away, because no-one is going
to magically change existing Mongolian data, and input methods and
users will continue to use NNBSP in place of the new character for
years to come.  Why not?  They both look the same and produce the same
visual result.
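
A small Python 3 sketch of the mismatch.  NEW_CHAR is a purely
hypothetical stand-in (a private-use code point), since no such
character is encoded, and the word is an assumed example; fold() is
likewise just an illustration of the extra step every search tool
would need.

```python
NNBSP = "\u202f"     # U+202F NARROW NO-BREAK SPACE
NEW_CHAR = "\ue000"  # HYPOTHETICAL placeholder for the proposed character

stem, suffix = "\u182e\u1823\u1829\u182d\u1823\u182f", "\u1824\u1828"
old_text = stem + NNBSP + suffix      # existing data
new_query = stem + NEW_CHAR + suffix  # same word typed with the new character

print(new_query in old_text)  # False: looks identical, but code points differ

def fold(s: str) -> str:
    """Map the hypothetical new character back to NNBSP before comparing."""
    return s.replace(NEW_CHAR, NNBSP)

print(fold(new_query) in fold(old_text))  # True, but only with the extra folding
```

Unless every search engine and find/replace dialog adds such folding,
the two spellings of the same word will not match each other.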

All in all, I firmly believe that encoding a new character will create
more and worse problems than it solves.

Andrew

Received on Wednesday, 29 July 2015 08:42:24 UTC