Re: Question on text search from Mark Davis ☕ on 2010-06-02 (public-i18n-core@w3.org from April to June 2010)

From: Mark Davis ☕ <mark@macchiato.com>
Date: Wed, 2 Jun 2010 09:35:58 -0700
To: Robin Berjon <robin@robineko.com>
Cc: "Phillips, Addison" <addison@lab126.com>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "public-device-apis@w3.org" <public-device-apis@w3.org>
Message-ID: <AANLkTik4VFsO51RPp4EVeS6DTRTV8fb5EOIr37CbYWbI@mail.gmail.com>

As a default, I'd suggest the Unicode algorithm documented in
http://unicode.org/reports/tr10/#Searching.

Commercial systems (such as what we do at Google) will go beyond this to
include more sophisticated processing such as synonyms, language-sensitive
deaccenting, stemming, etc., but that is beyond what can be required in a
general specification.
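
For illustration only, here is a minimal Python sketch of the kind of
accent- and case-insensitive matching that a primary-strength comparison
is meant to provide. It is an approximation and not the UTS #10 algorithm
itself: it uses no collation element table and handles no contractions,
ignorable characters, or language tailoring. The function names
primary_key and matches are hypothetical, chosen for this example.

```python
import unicodedata


def primary_key(text: str) -> str:
    """Rough stand-in for a primary-strength key (assumption: NOT real UCA).

    Decomposes the text, drops combining marks (accents), and case-folds
    what is left, so that accent and case differences are ignored.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def matches(field: str, query: str) -> bool:
    """Return True if query occurs in field, ignoring accents and case."""
    return primary_key(query) in primary_key(field)


if __name__ == "__main__":
    contacts = ["José Álvarez", "Zoë Müller", "Robin Berjon"]
    for name in contacts:
        print(name, matches(name, "jose"), matches(name, "MULLER"))
```

A real implementation would more likely rely on a collation library that
implements UTS #10 with locale tailoring, since stripping accents
uniformly is wrong for languages where an accented letter counts as a
distinct base letter.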

Mark

— Il meglio è l’inimico del bene —


On Wed, Jun 2, 2010 at 08:14, Robin Berjon <robin@robineko.com> wrote:

> Hi Addison,
>
> On Jun 2, 2010, at 16:52 , Phillips, Addison wrote:
> > Hi, I've added this to our agenda to discuss.
>
> Excellent, thanks a lot!
>
> > Full text search is a somewhat complex topic and varies by language.
>
> I like your use of the word "somewhat".
>
> > Looking at the Contacts API draft quickly I notice many interesting
> > internationalization issues that may not be fully addressed (handling of
> > personal names; handling of postal addresses; enumerated types which need to
> > consider the needs of other cultures; etc.)
>
> Yes, we're aware of these issues. The schema that describes people is
> caught between two contradictory tensions: do the right thing, and do
> something that can be layered atop existing implementations (some of which
> are dreadfully daft). Personally I'd rather we did things right even if it
> limits the target platforms somehow, but that's not necessarily a consensual
> view.
>
> That being said, we've already received feedback that's likely to cause us
> to rethink the current schema (yet again...). As a result I think that in
> the interest of not taking up your precious resources it might be best to
> wait. We will definitely ask for your review once we've stabilised this.
>
> Thanks a lot!
>
> --
> Robin Berjon
>  robineko — hired gun, higher standards
>  http://robineko.com/
>

Received on Wednesday, 2 June 2010 16:36:35 UTC