(back on-list)
Hi Mark, Addison
Both of you have hinted that extended filtering is not the right answer, so I suppose I should make my question more specific.
I have an RDF triple store that provides some full-text search capability via a Lucene-like index, which uses Lucene Analyzers to tokenize text (e.g. the Lucene CJKAnalyzer [2] or FrenchAnalyzer).
The interface [1] that I have to implement gives me the language tag of the RDF literal and asks me to return an analyzer. My initial design is to use extended filtering (RFC 4647, section 3.3.2): I will return the analyzer associated with a language range that matches the language tag, and if more than one range matches I will take the longest.
It has its limitations, true, but it seems likely to be better than basic filtering.
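A rough Java sketch of this design, under some assumptions: analyzers are represented only by name (a real implementation would return Lucene Analyzer instances), and the class ExtendedFilteringSelector, its register/select methods, the "*" fallback to StandardAnalyzer and the HypotheticalTaiwanAnalyzer entry are illustrative inventions, not part of the bigdata IAnalyzerFactory API [1]. "Longest" is taken to mean the range with the most characters; counting non-wildcard subtags would be a reasonable alternative. The matcher follows the steps of RFC 4647 section 3.3.2, with basic filtering (section 3.3.1) included only for comparison.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Sketch: pick an analyzer for a language tag using RFC 4647 extended
 * filtering, preferring the longest matching range. Analyzer names stand
 * in for real Lucene Analyzer instances (assumption for illustration).
 */
class ExtendedFilteringSelector {

    // Hypothetical registry: extended language range -> analyzer name.
    // Insertion order decides ties between equally long ranges.
    private final Map<String, String> analyzersByRange = new LinkedHashMap<>();

    void register(String range, String analyzerName) {
        analyzersByRange.put(range, analyzerName);
    }

    // RFC 4647 3.3.2 step 1: subtags match if equal ignoring case,
    // or if the range subtag is the wildcard "*".
    private static boolean subtagMatches(String rangeSubtag, String tagSubtag) {
        return rangeSubtag.equals("*") || rangeSubtag.equalsIgnoreCase(tagSubtag);
    }

    /** Extended filtering, following RFC 4647 section 3.3.2. */
    static boolean extendedMatch(String range, String tag) {
        String[] r = range.split("-");
        String[] t = tag.split("-");
        // Step 2: the first subtags must match (also guards against empty input).
        if (r.length == 0 || t.length == 0 || !subtagMatches(r[0], t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;
        // Step 3: consume the remaining range subtags.
        while (ri < r.length) {
            if (r[ri].equals("*")) {                  // 3A: wildcard in range, skip it
                ri++;
            } else if (ti >= t.length) {              // 3B: tag exhausted, fail
                return false;
            } else if (subtagMatches(r[ri], t[ti])) { // 3C: match, advance both
                ri++;
                ti++;
            } else if (t[ti].length() == 1) {         // 3D: singleton such as "x", fail
                return false;
            } else {                                  // 3E: skip this tag subtag
                ti++;
            }
        }
        return true;                                  // Step 4: range exhausted, match
    }

    /** Basic filtering (RFC 4647 section 3.3.1), shown only for comparison. */
    static boolean basicMatch(String range, String tag) {
        if (range.equals("*")) {
            return true;
        }
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        return t.equals(r) || t.startsWith(r + "-");
    }

    /** Analyzer for the longest matching range, or null if no range matches. */
    String select(String languageTag) {
        String best = null;
        for (String range : analyzersByRange.keySet()) {
            if (extendedMatch(range, languageTag)
                    && (best == null || range.length() > best.length())) {
                best = range;
            }
        }
        return best == null ? null : analyzersByRange.get(best);
    }

    public static void main(String[] args) {
        ExtendedFilteringSelector selector = new ExtendedFilteringSelector();
        selector.register("*", "StandardAnalyzer");  // hypothetical fallback for any tag
        selector.register("fr", "FrenchAnalyzer");
        selector.register("zh", "CJKAnalyzer");
        selector.register("ja", "CJKAnalyzer");
        selector.register("zh-*-TW", "HypotheticalTaiwanAnalyzer");
        for (String tag : new String[] {"fr-CA", "zh-Hant-TW", "ja-JP", "en-GB"}) {
            System.out.println(tag + " -> " + selector.select(tag));
        }
        // Range "zh-TW" skips the script subtag under extended filtering only.
        System.out.println(extendedMatch("zh-TW", "zh-Hant-TW")); // true
        System.out.println(basicMatch("zh-TW", "zh-Hant-TW"));    // false
    }
}
```

Running main maps fr-CA to FrenchAnalyzer, zh-Hant-TW to HypotheticalTaiwanAnalyzer (the longer "zh-*-TW" beats "zh"), ja-JP to CJKAnalyzer and en-GB to the "*" fallback. The last two lines show the point in favour of extended filtering: a range like "zh-TW" still matches a tag carrying a script subtag, which basic filtering's prefix rule rejects.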
Jeremy
[1] http://www.bigdata.com/docs/api/com/bigdata/search/IAnalyzerFactory.html#getAnalyzer(java.lang.String,%20boolean)
[2] http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/analysis/cjk/CJKAnalyzer.html
(I know that is a bit out of date …)
On May 6, 2014, at 6:49 AM, Mark Davis ☕️ <mark@macchiato.com> wrote:
> ICU provides locale matching, but doesn't implement the filtering in 4647, which has limitations.
offlist
On May 6, 2014, at 9:07 AM, "Phillips, Addison" <addison@lab126.com> wrote:
>
> Extended filtering, to my knowledge, has no public implementations
…
>
> You might have a case where implementing extended filtering might be useful,
…