Re: [string-search] Requirements for Indian languages (#10)

@aphillips 

Please find below the feedback received from a Bengali expert.

> I was looking over your document today in preparation for taking up some of the text into string-search. I intend to add questions to this thread as I go through your document. For starters, I found this paragraph:
> 
> > Bengali, is one of the notorious languages with regard to spelling variation. The different spellings of word having same meaning are accepted in the Bengali language, it should be treated as different word although have same meaning. It has 5000+ words which record spelling variations. Typically, spelling variation ranges from 2 to 8 words. Majority of words have 2 variations; some have 3, 4, 8 and more variations. At least there is one word that records 16 spelling variations. Nearly 80% words show two spelling, 7% words show three variations, 7% words show four spellings, and 6% words show more than four variations.

> Can you clarify that "different spellings of word... should be treated as different word" means that spelling variations should be treated as if they were different words (_not_ matching)? I know that's what the sentence means, but want to be sure that this was your intention.

Yes! That is the argument, because in a text you never know which spelling the author will use, and if your system does not have all possible variants built in, predicting the right spelling matches will be quite problematic.

> Do Bengali users expect document searches not to provide spelling variation matches at all? Or are there features in some programs for users to find such matches?

If a document search system could capture all possible variations of every word that has spelling variants, there would be no problem. The reality is that, to date, we have not come across any system that can predict all possible spelling variations. I have not even come across a database that records all possible spelling variations of Bengali words.



-- 
GitHub Notification of comment by vermaprashant1
Please view or discuss this issue at https://github.com/w3c/string-search/issues/10#issuecomment-1183088329 using your GitHub account


Received on Wednesday, 13 July 2022 11:10:46 UTC