Re: Question on text search

Hi Jerry,

On Jun 2, 2010, at 16:57, Jerry Carter wrote:
> Although not a member of the I18N group, I can share my experiences from working with the W3C grammar, speech synthesis, and lexicon specifications (SRGS, SSML, PLS).  These suggest that trying to mandate specific behavior across all languages is inappropriate.  That stated, I would not want to leave matching entirely up to implementations as it makes the behavior untestable.

Agreed, in principle at least.

> I recommend instead requiring specific behavior in specific languages (i.e. MUST level) and then leaving other choices to the implementations as in "other implementations MAY apply other matching logic as appropriate for meeting the expectations of specific languages and countries."

The problem here is that in 99% of cases (and I'm being conservative) we simply won't know what language is being processed. We're dealing with address book data, not a properly language-contextualised corpus. We have to handle existing contacts databases that won't have that information, and I don't think that we can hope to mandate that UIs expose a language identifier on fields for which it makes sense (and if they did, users would still enter it wrong most of the time).

I'm trying to think of heuristics but can't seem to find any. You could try using the country to guess the language of the address fields, but some countries have several languages, and even for an address in the UK I might still have entered "Londres". Guessing what language to apply to names on that basis would be little better than random.

So while in theory I agree that search needs to be language specific, we don't have that information. We don't even have enough data to guess. This leaves us with a language-agnostic approach unless there's a smart trick I haven't thought of.
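
To make "language-agnostic" a bit more concrete, here is a rough sketch of one possible matching rule, not a proposal for spec text. It assumes matching means a substring test on a folded key (Unicode NFKD decomposition, combining marks dropped, case folded). The function names fold and matches are made up for illustration.

```python
import unicodedata


def fold(text: str) -> str:
    """Build a language-agnostic search key (illustrative helper, not from any spec)."""
    # Compatibility-decompose so that accented letters split into base + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold for caseless comparison (this also maps "ß" to "ss").
    return stripped.casefold()


def matches(query: str, field: str) -> bool:
    """Return True if the folded query occurs anywhere in the folded field."""
    return fold(query) in fold(field)


if __name__ == "__main__":
    print(matches("jose", "José García"))     # accent- and case-insensitive hit
    print(matches("strasse", "Hauptstraße"))  # case folding expands the sharp s
    print(matches("londres", "London"))       # no hit: folding cannot translate
```

The trade-off is the one discussed above: dropping diacritics is wrong for languages in which the accented letter is a distinct letter, but without knowing the language we cannot make that distinction, and at least this behaviour is testable.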

--
Robin Berjon
  robineko — hired gun, higher standards
  http://robineko.com/

Received on Wednesday, 2 June 2010 15:27:01 UTC