RE: Searching (was: FVS Assignment MisMatch) from jrmt@almas.co.jp on 2015-08-03 (public-i18n-mongolian@w3.org from July to September 2015)

From: <jrmt@almas.co.jp>
Date: Tue, 4 Aug 2015 05:04:13 +0900
To: "'Richard Wordingham'" <richard.wordingham@ntlworld.com>, <public-i18n-mongolian@w3.org>
Cc: <public-i18n-mongolian@w3.org>
Message-ID: <000901d0ce27$91eab970$b5c02c50$@almas.co.jp>

Dear Mr. Richard,

> What is needed is some sort of folding, in the same way as Google ignores the difference between 
> upper and lower cases and often ignores diacritics.  As a first approximation one should ignore the 
> differences A v. E, O v. U, and OE v. UE.  Possibly O and OE should also be folded; 
> that is where it becomes complicated.  
> Several consonant pairs should also be folded, though a proper design may be complicated.
Yes, folding is the solution for searching. We use this method in our local search and 
in our dictionary-based input method to help correct the user's input. 
But the public search engines have not reached this level yet; asking them to handle 
this search problem is a task for the future.

The following is our folding list.

1. A v. E
2. O v. U
3. OE v. UE (first syllable)
4. O v. U v. OE v. UE (medial and final forms)
5. I v. Y - (the medial form is the UCS rule's new addition)
6. J v. Y (initial)
7. O, U, OE, UE v. W (medial, final) - (the UCS's new addition)
8. HE v. GE, HI v. GI, HOE v. GOE, HUE v. GUE
9. TA v. DA
10. EE v. WA - (the UCS's new addition)
11. KA v. KHA - (the UCS's new addition)
12. HAA v. ZHI (medial and final)


-----Original Message-----
From: Richard Wordingham [mailto:richard.wordingham@ntlworld.com] 
Sent: Tuesday, August 4, 2015 3:07 AM
To: public-i18n-mongolian@w3.org
Cc: public-i18n-mongolian@w3.org
Subject: Searching (was: FVS Assignment MisMatch)

On Mon, 3 Aug 2015 08:16:26 +0900
<jrmt@almas.co.jp> wrote:

> Dear Mr. Richard
> 
> > Is that true?  There may be more than two spellings that look the 
> > same, but do they *sound* the same? As I understand it, the 
> > Mongolian encoding represents sounds as well as appearance.  Are 
> > Mongolian dictionaries sorted according to sound or according to 
> > visual form?
> Yes, you are right. They do sound different, and dictionaries list 
> words by their *sound*. But most Mongolian people cannot tell 
> exactly which word is which. Even linguistic experts make mistakes 
> without a dictionary. And dictionaries sometimes list the words in 
> different positions, according to each author's point of view.
> For this reason, publicly available text still contains many 
> misspelled words. When people read them, it is no problem, but when we 
> search on Google, we have to search for each possible spelling. For 
> example, we have to search for the word "Mongolian", ᠮᠣᠩᠭᠤᠯ, at least 
> four times.

What is needed is some sort of folding, in the same way as Google ignores the difference between upper and lower cases and often ignores diacritics.  As a first approximation one should ignore the differences A v. E, O v. U, and OE v. UE.  Possibly O and OE should also be folded; that is where it becomes complicated.  Several consonant pairs should also be folded, though a proper design may be complicated.

Richard.

Received on Monday, 3 August 2015 20:04:38 UTC