Re: Searching from Andrew West on 2015-08-04 (public-i18n-mongolian@w3.org from July to September 2015)

From: Andrew West <andrewcwest@gmail.com>
Date: Tue, 4 Aug 2015 09:15:26 +0100
To: Richard Wordingham <richard.wordingham@ntlworld.com>
Cc: public-i18n-mongolian@w3.org
Message-ID: <CALgEMhy6yhoDVvtayVBv9shS15YidVzDrDcONRcAyszgX3sP+A@mail.gmail.com>

On 3 August 2015 at 23:31, Richard Wordingham
<richard.wordingham@ntlworld.com> wrote:
>
> Does the Mongolian alphabet as claimed by the Unicode
> codepoints work away from computers?  What I get from that list is
> the idea that something much closer to the Semitic original, largely
> based on shape, should have been encoded.

Well, yes, that would have been the expected encoding model based on
Unicode encoding principles, and it is clear now (and has been for
many years) that the current encoding model is deeply flawed and
problematic.  I hasten to add that this encoding model was not foisted
on an unwilling user community by the Unicode Consortium, but was
pushed for by experts from China and Mongolia (with the support of the
Chinese and Mongolian national bodies).  With hindsight the UTC and
other ISO national bodies should have rejected this encoding model,
but perhaps the implications were not fully understood at the time.

I agree with Jirimutu that the Mongolian encoding model is the worst
encoding model in Unicode, but I also agree that we are stuck with it,
and that it is not possible to radically revise it at this stage.  I
think that the best we can do is mitigate the problems of multiple
representation by defining fuzzy matching rules for Mongolian along
the lines of Jirimutu's folding list, for use by search engines and
text processing applications.  This could be informally written up as
a Unicode Technical Note, or formally defined somewhere in the Unicode
Character Database.
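
As a rough illustration of what folding-based fuzzy matching could look
like, here is a minimal Python sketch. The folding table is hypothetical
and is not Jirimutu's actual list: it assumes, purely for illustration,
that U/O, UE/OE and DA/TA are treated as equivalent for searching, and
that the free variation selectors are ignored. The names FOLD, IGNORABLE,
fold and fuzzy_find are made up for this example.

```python
# Hypothetical folding table, for illustration only.
# It is NOT Jirimutu's folding list; the pairs are assumptions.
FOLD = {
    "\u1824": "\u1823",  # MONGOLIAN LETTER U  -> MONGOLIAN LETTER O
    "\u1826": "\u1825",  # MONGOLIAN LETTER UE -> MONGOLIAN LETTER OE
    "\u1833": "\u1832",  # MONGOLIAN LETTER DA -> MONGOLIAN LETTER TA
}

# Characters dropped before comparison (assumption: the three
# Mongolian free variation selectors, FVS1..FVS3).
IGNORABLE = {"\u180B", "\u180C", "\u180D"}


def fold(text: str) -> str:
    """Map each character to its folded representative and drop ignorables."""
    return "".join(FOLD.get(ch, ch) for ch in text if ch not in IGNORABLE)


def fuzzy_find(needle: str, haystack: str) -> bool:
    """Return True if the folded needle occurs in the folded haystack."""
    return fold(needle) in fold(haystack)
```

With this table, a search for BA + U would also match text encoded as
BA + O + FVS1, because both sides fold to the same sequence. A real
implementation would need the agreed folding list, and would probably
have to handle context-dependent cases that a simple per-character map
cannot express.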

Andrew

Received on Tuesday, 4 August 2015 08:15:57 UTC