
Re: Searching

From: Badral S. <badral@bolorsoft.com>
Date: Fri, 07 Aug 2015 00:54:10 +0200
Message-ID: <55C3E592.8020402@bolorsoft.com>
To: public-i18n-mongolian@w3.org
Hi Richard,
I don't understand what folding is. Is it a recommendation for the Unicode
standard, or is it relevant to the W3C?
What exactly is it?

Badral

On 03.08.2015 20:06, Richard Wordingham wrote:
> On Mon, 3 Aug 2015 08:16:26 +0900
> <jrmt@almas.co.jp> wrote:
>
>> Dear Mr. Richard
>>
>>> Is that true?  There may be more than two spellings that look the
>>> same, but do they *sound* the same? As I understand it, the
>>> Mongolian encoding represents sounds as well as appearance.  Are
>>> Mongolian dictionaries sorted according to sound or according to
>>> visual form?
>> Yes, you are right. They sound different, and dictionaries list the
>> words by their *sound*. But most Mongolian people cannot reliably
>> distinguish which word is which. Even linguistic experts make
>> mistakes without a dictionary. And sometimes dictionaries list them
>> in different positions, according to the author's point of view.
>> For this reason, publicly available text contains many misspelled
>> words. When people read them, it is no problem, but when we search
>> on Google, we have to search for each possible spelling. For
>> example, we have to search for the Mongolian word ᠮᠣᠩᠭᠤᠯ at least
>> four times.
> What is needed is some sort of folding, in the same way as Google
> ignores the difference between upper and lower cases and often ignores
> diacritics.  As a first approximation one should ignore the differences
> A v. E, O v. U, and OE v. UE.  Possibly O and OE should also be folded;
> that is where it becomes complicated.  Several consonant pairs should
> also be folded, though a proper design may be complicated.
>
> Richard.
>
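
To make the idea of folding concrete, here is a minimal sketch in Python of the "first approximation" Richard describes: ignoring the differences A v. E, O v. U, and OE v. UE before comparing strings. The names FOLD_TABLE, fold, and matches are hypothetical, chosen only for illustration. The choice of which letter in each pair acts as the representative is arbitrary, and the consonant pairs and the possible O/OE folding are left out because the thread does not specify them. This is not a description of how Google or any other search engine works.

```python
# Hypothetical folding table for the first-approximation vowel pairs.
# Each pair is mapped to one representative; which one is chosen is
# arbitrary, as long as both members end up as the same code point.
FOLD_TABLE = str.maketrans({
    "\u1821": "\u1820",  # MONGOLIAN LETTER E  -> MONGOLIAN LETTER A
    "\u1824": "\u1823",  # MONGOLIAN LETTER U  -> MONGOLIAN LETTER O
    "\u1826": "\u1825",  # MONGOLIAN LETTER UE -> MONGOLIAN LETTER OE
})


def fold(text: str) -> str:
    """Return text with the vowel pairs above collapsed to one letter each."""
    return text.translate(FOLD_TABLE)


def matches(query: str, candidate: str) -> bool:
    """Compare two strings after folding, so paired vowels are not distinguished."""
    return fold(query) == fold(candidate)


# The example word from the thread: MA, O, ANG, GA, U, LA.
word = "\u182E\u1823\u1829\u182D\u1824\u182F"
# An assumed alternative spelling with O and U swapped.
variant = "\u182E\u1824\u1829\u182D\u1823\u182F"

print(matches(word, variant))
```

With a table like this, a search index would store fold(text) and fold the query the same way, so one search would cover every O/U spelling of the word instead of one search per spelling.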


-- 
Badral Sanlig, Software architect
www.bolorsoft.com | www.badral.net
Bolorsoft LLC, Selbe Khotkhon 40/4 D2, District 11, Ulaanbaatar
Received on Thursday, 6 August 2015 22:54:41 UTC
