W3C home > Mailing lists > Public > www-international@w3.org > April to June 2014

Re: Java implementations of RFC 4647 extended filtering

From: Jeremy J Carroll <jjc@syapse.com>
Date: Tue, 6 May 2014 09:59:46 -0700
Cc: "www-international@w3.org" <www-international@w3.org>
Message-Id: <FA264744-BE5A-4579-82EC-32F434C6EF63@syapse.com>
To: "Phillips, Addison" <addison@lab126.com>, "Mark Davis ☕ (mark@macchiato.com)" <mark@macchiato.com>
(back on-list)

Hi Mark, Addison

Both of you have hinted that extended filtering is not the right answer, so I suppose I should be more specific about my question.

I have an RDF triple store that provides some full-text search capability via a Lucene-like index, which uses Lucene Analyzers to tokenize (e.g. the Lucene CJKAnalyzer [2] or FrenchAnalyzer).

The interface [1] that I have to implement gives me the language tag part of the RDF literal and asks me to return an analyzer. My initial design is to use extended filtering: i.e. I will return an analyzer whose associated language range matches the language tag. If more than one range matches, I will take the longest.
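A minimal Java sketch of this design, assuming the analyzers are registered under language ranges in a map. The matcher follows the steps of RFC 4647 section 3.3.2. The names AnalyzerChooser, extendedMatch, specificity and chooseAnalyzer, and the analyzer-name strings, are hypothetical placeholders, not the bigdata IAnalyzerFactory or Lucene API; "longest" is assumed to mean the range with the most non-wildcard subtags. (JDK 8 also exposes RFC 4647 filtering via `java.util.Locale.filterTags` with `Locale.FilteringMode.EXTENDED_FILTERING`, which may be worth comparing against.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: analyzers are identified by hypothetical placeholder
// strings, not by Lucene Analyzer instances or the bigdata interface.
class AnalyzerChooser {

    // RFC 4647 section 3.3.2 extended filtering for one range and one tag.
    static boolean extendedMatch(String range, String tag) {
        String[] r = range.split("-");
        String[] t = tag.split("-");
        // Step 2: the first subtags must match ('*' matches anything).
        if (!r[0].equals("*") && !r[0].equalsIgnoreCase(t[0])) {
            return false;
        }
        int i = 1;
        int j = 1;
        // Step 3: consume the remaining range subtags.
        while (i < r.length) {
            if (r[i].equals("*")) {
                i++;                       // 3.A: skip a wildcard in the range
            } else if (j >= t.length) {
                return false;              // 3.B: tag has no subtags left
            } else if (r[i].equalsIgnoreCase(t[j])) {
                i++;                       // 3.C: subtags match, advance both
                j++;
            } else if (t[j].length() == 1) {
                return false;              // 3.D: cannot skip past a singleton
            } else {
                j++;                       // 3.E: skip this tag subtag
            }
        }
        return true;                       // Step 4: range exhausted, match
    }

    // Assumed meaning of "longest": number of non-wildcard subtags.
    static int specificity(String range) {
        int n = 0;
        for (String subtag : range.split("-")) {
            if (!subtag.equals("*")) {
                n++;
            }
        }
        return n;
    }

    // Returns the analyzer of the most specific matching range, or null.
    // On a tie, the first such range in the map's iteration order wins.
    static String chooseAnalyzer(Map<String, String> analyzersByRange, String tag) {
        String best = null;
        int bestSpecificity = -1;
        for (Map.Entry<String, String> e : analyzersByRange.entrySet()) {
            int s = specificity(e.getKey());
            if (s > bestSpecificity && extendedMatch(e.getKey(), tag)) {
                best = e.getValue();
                bestSpecificity = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> analyzers = new LinkedHashMap<>();
        analyzers.put("*", "StandardAnalyzer");
        analyzers.put("fr", "FrenchAnalyzer");
        analyzers.put("zh", "CJKAnalyzer");
        analyzers.put("zh-*-TW", "TraditionalChineseAnalyzer"); // hypothetical
        System.out.println(chooseAnalyzer(analyzers, "fr-CA"));
        System.out.println(chooseAnalyzer(analyzers, "zh-Hant-TW"));
        System.out.println(chooseAnalyzer(analyzers, "en"));
    }
}
```

With these placeholder mappings, fr-CA resolves to FrenchAnalyzer, zh-Hant-TW to the hypothetical TraditionalChineseAnalyzer (its range outranks plain zh), and en falls back to the * entry.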

It has its limitations, true, but it seems likely to be better than basic filtering.
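For comparison, a minimal sketch of RFC 4647 basic filtering (section 3.3.1); BasicFilter and basicMatch are illustrative names, not an existing library API. Under basic filtering a range such as zh-TW cannot match zh-Hant-TW, because the script subtag sits between the language and region subtags, whereas the extended filtering steps in section 3.3.2 skip over it. This is one concrete case where extended filtering may pick a better analyzer.

```java
import java.util.Locale;

// RFC 4647 section 3.3.1 basic filtering, for contrast with extended filtering.
class BasicFilter {

    static boolean basicMatch(String range, String tag) {
        if (range.equals("*")) {
            return true;                   // the wildcard range matches every tag
        }
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        // Match if equal, or if the range is a prefix ending at a hyphen boundary.
        return t.equals(r) || (t.startsWith(r) && t.charAt(r.length()) == '-');
    }

    public static void main(String[] args) {
        System.out.println(basicMatch("zh", "zh-Hant-TW"));    // true
        System.out.println(basicMatch("zh-TW", "zh-Hant-TW")); // false: "Hant" is in the way
        System.out.println(basicMatch("de", "deu"));           // false: not at a hyphen
    }
}
```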

Jeremy

[1]
http://www.bigdata.com/docs/api/com/bigdata/search/IAnalyzerFactory.html#getAnalyzer(java.lang.String,%20boolean)

[2]
http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/analysis/cjk/CJKAnalyzer.html
(I know that is a bit out of date …)


On May 6, 2014, at 6:49 AM, Mark Davis ☕️ <mark@macchiato.com> wrote:

> ICU provides locale matching, but doesn't implement the filtering in 4647, which has limitations.


offlist
On May 6, 2014, at 9:07 AM, "Phillips, Addison" <addison@lab126.com> wrote:
> 
> Extended filtering, to my knowledge, has no public implementations 

…
> 
> You might have a case where implementing extended filtering might be useful,

…


Received on Tuesday, 6 May 2014 17:00:21 UTC
