Re: Question on text search

On Jun 2, 2010, at 17:41 , Jerry Carter wrote:
> For this domain, the key language is that of the owner of the device.  The behavior of text search should match the expectations and customs of the owner.  We should not be surprised to find that the same string of characters matches differently depending on the language/country and preferences of the user.

I thought of this as a heuristic as well, but it's not at all clear to me how it would work. Its first issue is that it assumes that a user has a single language, which I find to be a very common mistake in I18N architecture. I don't have numbers, but I would be surprised if a large share of the world's users weren't multilingual. The fact that my phone's OS is in English won't help you much in matching the vast amount of French data that I have, not to mention all those contact entries I've received from people all around the world. It provides some minimal amount of help in understanding what I enter (and even then, not really) but doesn't help with the data. I don't see how you can use this to get a match.

> Take nicknames for instance.  An US English speaker may find 'Manny' to be an appropriate match for 'Manuel' whereas a Mexican Spanish speaker may find this inappropriate.

That's certainly true, but I don't think that we're looking at nickname stemming :) And it would still have the issue of how to interpret a query from a user who speaks English, Spanish, and Spanglish, which is not uncommon in the US. More simply: even if I were monolingual and the system supported nickname stemming, I would want "Manny" to match an American friend called Manuel, but "Manu" to match the French "Emmanuel". The information for that would have to be attached to their names, not come from me (and using their country of residence doesn't help).
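
To make the "attached to their names" point concrete, here is a rough sketch in Python, purely illustrative and not a proposal. Everything in it is an assumption: the `name_lang` field on each contact, the `NICKNAMES` table, and the `matches`/`search` helpers are all hypothetical. The only point is that the nickname lookup is driven by a tag on the entry, never by the user's own locale.

```python
from dataclasses import dataclass

# Hypothetical nickname tables, keyed by the language of the *name*,
# not by the language of the person doing the search.
NICKNAMES = {
    "en-US": {"manny": {"manuel"}},
    "fr": {"manu": {"emmanuel"}},
}


@dataclass
class Contact:
    given_name: str
    name_lang: str  # hypothetical per-entry language tag


def matches(query: str, contact: Contact) -> bool:
    """True if the query is the name itself or a nickname for it
    in the language attached to this particular entry."""
    q = query.casefold()
    name = contact.given_name.casefold()
    if q == name:
        return True
    table = NICKNAMES.get(contact.name_lang, {})
    return name in table.get(q, set())


def search(query: str, contacts: list[Contact]) -> list[Contact]:
    return [c for c in contacts if matches(query, c)]


contacts = [
    Contact("Manuel", "en-US"),
    Contact("Emmanuel", "fr"),
    Contact("Manuel", "es-MX"),
]
```

With this data, searching for "Manny" finds only the en-US Manuel and searching for "Manu" finds only the French Emmanuel, whatever the user's locale happens to be. It says nothing about where such a tag would come from, which is the hard part.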

--
Robin Berjon
  robineko — hired gun, higher standards
  http://robineko.com/

Received on Wednesday, 2 June 2010 16:02:12 UTC