RE: Java implementations of RFC 4647 extended filtering from Phillips, Addison on 2014-05-06 (www-international@w3.org from April to June 2014)

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 6 May 2014 18:01:17 +0000
To: Jeremy J Carroll <jjc@syapse.com>, "Mark Davis ☕ (mark@macchiato.com)" <mark@macchiato.com>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB517E78621@ex10-mbx-36009.ant.amazon.com>

I would still tend to add something I alluded to in my offlist reply, which is the UTR#35 likely subtags process (although I tailor my implementation). This will help you with the potpourri of Chinese language tags (for example) without interfering with other languages in which the presence or absence of certain subtags conveys important meaning.

Addison

From: Jeremy J Carroll [mailto:jjc@syapse.com]
Sent: Tuesday, May 06, 2014 10:00 AM
To: Phillips, Addison; "Mark Davis ☕ (mark@macchiato.com)"
Cc: www-international@w3.org
Subject: Re: Java implementations of RFC 4647 extended filtering

(back on-list)

Hi Mark, Addison

Both of you have hinted that extended filtering is not the right answer … so I suppose I should specify my question.

I have an RDF triple store that provides some full-text search capability via a Lucene-like index, which uses Lucene Analyzers to tokenize (e.g. the Lucene CJKAnalyzer [2] or FrenchAnalyzer).

The interface [1] that I have to implement gives me the language tag part of the RDF literal and asks me to return an analyzer. My initial design is to use extended filtering: i.e. I will return an analyzer that is associated with a language range that matches the language tag. If more than one range matches, I will take the longest.

It has its limitations, true, but it seems to me that it is likely to be better than basic filtering.
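
For readers following the thread, the design described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual bigdata implementation: the class and method names are hypothetical, analyzers are represented by plain strings so the sketch needs no Lucene dependency, the "StandardAnalyzer" fallback for the "*" range is an addition of this sketch, and "longest" is interpreted here as "most non-wildcard subtags". The matches() method follows the extended filtering steps in RFC 4647, section 3.3.2.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: choose an analyzer name for a language tag using
// RFC 4647 extended filtering (section 3.3.2).
final class ExtendedFilterSketch {

    /** Returns true if the extended language range matches the language tag. */
    static boolean matches(String range, String tag) {
        String[] r = range.split("-");
        String[] t = tag.split("-");
        // Step 1: the first subtags must match, unless the range starts with "*".
        if (!r[0].equals("*") && !r[0].equalsIgnoreCase(t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;
        // Step 3: walk the remaining subtags of the range.
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                       // 3A: a wildcard matches zero or more subtags
            } else if (ti >= t.length) {
                return false;               // 3B: the tag ran out of subtags
            } else if (r[ri].equalsIgnoreCase(t[ti])) {
                ri++;                       // 3C: subtags agree, advance both
                ti++;
            } else if (t[ti].length() == 1) {
                return false;               // 3D: never skip over a singleton (e.g. "x")
            } else {
                ti++;                       // 3E: skip this subtag of the tag
            }
        }
        return true;                        // Step 4: the range is exhausted
    }

    /** Assumption of this sketch: "longest" means most non-wildcard subtags. */
    static int specificity(String range) {
        int count = 0;
        for (String subtag : range.split("-")) {
            if (!subtag.equals("*")) {
                count++;
            }
        }
        return count;
    }

    /** Returns the analyzer name of the most specific matching range, or null. */
    static String pickAnalyzer(Map<String, String> analyzerByRange, String tag) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, String> entry : analyzerByRange.entrySet()) {
            int score = specificity(entry.getKey());
            // Strict ">" means that among equally specific ranges the first one wins.
            if (score > bestScore && matches(entry.getKey(), tag)) {
                best = entry.getValue();
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("*", "StandardAnalyzer"); // fallback range, assumed for this sketch
        table.put("fr", "FrenchAnalyzer");
        table.put("zh", "CJKAnalyzer");
        System.out.println(pickAnalyzer(table, "zh-Hant-TW"));
        System.out.println(pickAnalyzer(table, "fr-CA"));
        System.out.println(pickAnalyzer(table, "en-US"));
    }
}
```

With this table, "zh-Hant-TW" and "fr-CA" resolve to the analyzers registered for "zh" and "fr", and any other tag falls through to the "*" entry. Since Java 8 the JDK also provides Locale.filterTags with Locale.FilteringMode.EXTENDED_FILTERING, which filters a collection of tags against a priority list of ranges and could be adapted to this lookup.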

Jeremy

[1]
http://www.bigdata.com/docs/api/com/bigdata/search/IAnalyzerFactory.html#getAnalyzer(java.lang.String,%20boolean)

[2]
http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/analysis/cjk/CJKAnalyzer.html

(I know that is a bit out of date …)

On May 6, 2014, at 6:49 AM, Mark Davis ☕️ <mark@macchiato.com> wrote:

ICU provides locale matching, but doesn't implement the filtering in 4647, which has limitations.

offlist
On May 6, 2014, at 9:07 AM, "Phillips, Addison" <addison@lab126.com> wrote:

Extended filtering, to my knowledge, has no public implementations

…

You might have a case where implementing extended filtering might be useful,

…

Received on Tuesday, 6 May 2014 18:02:20 UTC