Re: Question on text search

From: Felix Sasaki <felix.sasaki@fh-potsdam.de>
Date: Thu, 3 Jun 2010 08:21:09 +0200
Message-ID: <AANLkTikXTSxG5TEkss5NyD3wUv2L63vLTwryQMEC4KKc@mail.gmail.com>
To: Mark Davis ☕ <mark@macchiato.com>
Cc: Robin Berjon <robin@robineko.com>, "Phillips, Addison" <addison@lab126.com>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "public-device-apis@w3.org" <public-device-apis@w3.org>
2010/6/2 Mark Davis ☕ <mark@macchiato.com>

> As a default, I'd suggest the Unicode algorithm documented in
> http://unicode.org/reports/tr10/#Searching.
>
> Commercial systems (such as what we do at Google) will go beyond this to
> include more sophisticated processing such as synonyms, language-sensitive
> deaccenting, stemming, etc., but that is beyond what can be required in a
> general specification.
>


"XQuery full text search" takes the path of not requiring specific behavior,
but instead provides "slots", so-called matching options, where implementers
can integrate language-specific behavior. See the matching options at
http://www.w3.org/TR/2010/CR-xpath-full-text-10-20100128/#ftmatchoptions for
details. They cover things like stemming and using thesauri to resolve
synonyms.
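
To make the "slots" idea concrete, here is a rough sketch in Python. It is
not the XQuery Full Text API: the names MatchOptions, fold and search are
hypothetical, and the default normalization (case folding plus dropping
combining marks) is only a crude stand-in for the collation-based matching
in UTS #10. The point is just that the search itself stays generic while
stemming and synonym lookup are plugged in by the implementer.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


def fold(text: str) -> str:
    """Case-fold and drop combining marks (a crude stand-in, not UTS #10)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


@dataclass
class MatchOptions:
    # Each field is a "slot"; the defaults do no language-specific work.
    normalize: Callable[[str], str] = fold
    stem: Callable[[str], str] = lambda token: token
    thesaurus: Dict[str, Set[str]] = field(default_factory=dict)


def search(query: str, documents: List[str], options: MatchOptions) -> List[str]:
    """Return the documents containing the query term or one of its synonyms."""
    def key(token: str) -> str:
        return options.stem(options.normalize(token))

    # The query itself plus any synonyms the thesaurus slot supplies.
    wanted = {key(query)}
    for synonym in options.thesaurus.get(options.normalize(query), set()):
        wanted.add(key(synonym))

    # Whitespace tokenization is an assumption; it does not suit every language.
    return [doc for doc in documents
            if wanted & {key(token) for token in doc.split()}]
```

With the defaults, search("cafe", docs, MatchOptions()) would match a
document containing "Café" but searching for "car" would not match "cars";
an implementer who wants that passes a stemmer and a thesaurus, for example
MatchOptions(stem=my_stemmer, thesaurus={"car": {"automobile"}}), where
my_stemmer is whatever language-specific function they have available.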

Felix



>
> Mark
>
> — Il meglio è l’inimico del bene —
>
>
>
> On Wed, Jun 2, 2010 at 08:14, Robin Berjon <robin@robineko.com> wrote:
>
>> Hi Addison,
>>
>> On Jun 2, 2010, at 16:52 , Phillips, Addison wrote:
>> > Hi, I've added this to our agenda to discuss.
>>
>> Excellent, thanks a lot!
>>
>> > Full text search is a somewhat complex topic and varies by language.
>>
>> I like your use of the word "somewhat".
>>
>> > Looking at the Contacts API draft quickly I notice many interesting
>> > internationalization issues that may not be fully addressed (handling of
>> > personal names; handling of postal addresses; enumerated types which need
>> > to consider the needs of other cultures; etc.)
>>
>> Yes, we're aware of these issues. The schema that describes people is
>> caught between two contradictory tensions: do the right thing, and do
>> something that can be layered atop existing implementations (some of which
>> are dreadfully daft). Personally I'd rather we did things right even if it
>> limits the target platforms somehow, but that's not necessarily a consensual
>> view.
>>
>> That being said, we've already received feedback that's likely to cause us
>> to rethink the current schema (yet again...). As a result I think that in
>> the interest of not taking up your precious resources it might be best to
>> wait. We will definitely ask for your review once we've stabilised this.
>>
>> Thanks a lot!
>>
>> --
>> Robin Berjon
>>  robineko — hired gun, higher standards
>>  http://robineko.com/
>>
>
Received on Thursday, 3 June 2010 06:21:46 GMT
