Re: Java implementations of RFC 4647 extended filtering

(back on-list)

Hi Mark, Addison

Both of you have hinted that extended filtering is not the right answer, so I suppose I should be more specific about my question.

I have an RDF triple store that provides some full-text search capability via a Lucene-like index, which uses Lucene Analyzers to tokenize text (e.g. the Lucene CJKAnalyzer [2] or FrenchAnalyzer).

The interface [1] that I have to implement gives me the language tag part of the RDF literal and asks me to return an analyzer. My initial design is to use extended filtering: each analyzer is associated with a language range, and I return the analyzer whose range matches the language tag. If more than one range matches, I will take the longest.
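A minimal sketch of that design, assuming the matching steps of RFC 4647 Section 3.3.2 and taking "longest" to mean the range with the most characters (an assumption; counting non-wildcard subtags would be another reasonable reading). The names ExtendedFilter, matches and select are hypothetical, this is not an implementation of the IAnalyzerFactory interface in [1], and plain strings stand in for Lucene Analyzer instances so it compiles without Lucene:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of RFC 4647 Section 3.3.2 extended filtering plus a hypothetical
// "longest matching range wins" selector. Strings stand in for analyzers.
final class ExtendedFilter {

    private ExtendedFilter() {}

    // True if the extended language range matches the tag (case-insensitive).
    static boolean matches(String range, String tag) {
        String[] r = range.toLowerCase(Locale.ROOT).split("-");
        String[] t = tag.toLowerCase(Locale.ROOT).split("-");

        // Step 2: first subtags must match; "*" in the range matches anything.
        if (!r[0].equals("*") && !r[0].equals(t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;
        // Step 3: consume the rest of the range.
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                      // 3.A: wildcards in the range are skipped
            } else if (ti >= t.length) {
                return false;              // 3.B: tag ran out of subtags
            } else if (r[ri].equals(t[ti])) {
                ri++;                      // 3.C: subtags match, advance both
                ti++;
            } else if (t[ti].length() == 1) {
                return false;              // 3.D: a singleton (e.g. "x") stops the match
            } else {
                ti++;                      // 3.E: skip this subtag of the tag
            }
        }
        return true;                       // Step 4: range exhausted, match succeeds
    }

    // Hypothetical selector: value for the longest matching range (by
    // character count), or null if nothing matches. Equal-length ties keep
    // the first range in the map's iteration order.
    static <A> A select(Map<String, A> analyzersByRange, String tag) {
        String best = null;
        for (String range : analyzersByRange.keySet()) {
            if (matches(range, tag) && (best == null || range.length() > best.length())) {
                best = range;
            }
        }
        return best == null ? null : analyzersByRange.get(best);
    }

    public static void main(String[] args) {
        // Placeholder strings instead of real Analyzer objects.
        Map<String, String> analyzers = new LinkedHashMap<>();
        analyzers.put("*", "StandardAnalyzer");
        analyzers.put("fr", "FrenchAnalyzer");
        analyzers.put("zh", "CJKAnalyzer");
        analyzers.put("ja", "CJKAnalyzer");

        System.out.println(select(analyzers, "fr-CA"));       // FrenchAnalyzer
        System.out.println(select(analyzers, "zh-Hant-TW"));  // CJKAnalyzer
        System.out.println(select(analyzers, "en-GB"));       // StandardAnalyzer
        System.out.println(matches("de-DE", "de-Latn-DE"));   // true
    }
}
```

Because ties between equally long ranges fall back to iteration order, the example uses a LinkedHashMap so the outcome is predictable; a "*" entry acts as the catch-all default.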

The design has its limitations, true, but it seems to me likely to be better than basic filtering.
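For comparison, a sketch of basic filtering (RFC 4647 Section 3.3.1), which only accepts a range that equals the tag or is a prefix of it ending at a hyphen. The class name BasicFilter is hypothetical. Under basic filtering "de-DE" misses "de-Latn-DE", whereas RFC 4647 lists "de-Latn-DE" among the tags matched by the extended range "de-DE" (a synonym for "de-*-DE"); that inserted-subtag case is where extended filtering seems to help:

```java
import java.util.Locale;

// Sketch of RFC 4647 Section 3.3.1 basic filtering, for comparison only.
final class BasicFilter {

    private BasicFilter() {}

    // A basic range matches if it equals the tag, or is a prefix of the tag
    // followed by '-', compared case-insensitively. "*" matches every tag.
    static boolean matches(String range, String tag) {
        if (range.equals("*")) {
            return true;
        }
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        // startsWith plus inequality guarantees t is longer, so charAt is safe.
        return t.equals(r) || (t.startsWith(r) && t.charAt(r.length()) == '-');
    }

    public static void main(String[] args) {
        System.out.println(matches("de-DE", "de-DE-1996"));  // true
        System.out.println(matches("de-DE", "de-Latn-DE"));  // false: script subtag breaks the prefix
        System.out.println(matches("de", "deu"));            // false: not at a hyphen boundary
    }
}
```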

Jeremy

[1]
http://www.bigdata.com/docs/api/com/bigdata/search/IAnalyzerFactory.html#getAnalyzer(java.lang.String,%20boolean)

[2]
http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/analysis/cjk/CJKAnalyzer.html
(I know that is a bit out of date …)


On May 6, 2014, at 6:49 AM, Mark Davis ☕️ <mark@macchiato.com> wrote:

> ICU provides locale matching, but doesn't implement the filtering in 4647, which has limitations.


offlist
On May 6, 2014, at 9:07 AM, "Phillips, Addison" <addison@lab126.com> wrote:
> 
> Extended filtering, to my knowledge, has no public implementations 

…
> 
> You might have a case where implementing extended filtering might be useful,

…

Received on Tuesday, 6 May 2014 17:00:21 UTC