Re: Java implementations of RFC 4647 extended filtering

Jeremy J Carroll scripsit:

> if I have a fr-CN analyzer, and text tagged as fr-LATN-CN then the
> lookup algorithm fails and the filtering algorithm would not.

That's true, which is why we have the admittedly incomplete
Suppress-Script information in the Language Subtag Registry; you can
look for "fr-Latn" and change it to "fr" before matching.
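
A minimal sketch of that preprocessing step in Java, using the
java.util.Locale matching API (Java 8 and later). The class name, the
method names, and the two-entry SUPPRESS_SCRIPT table are hypothetical;
a real table would be built from the registry's Suppress-Script fields.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

final class SuppressScriptLookup {
    // Hypothetical excerpt of Suppress-Script data (language -> script).
    // Assumption: a real implementation loads this from the registry.
    private static final Map<String, String> SUPPRESS_SCRIPT = new HashMap<>();
    static {
        SUPPRESS_SCRIPT.put("fr", "Latn");
        SUPPRESS_SCRIPT.put("en", "Latn");
    }

    // Drops the second subtag when it is the suppressed script for the
    // primary language, e.g. "fr-Latn-CN" -> "fr-CN", "fr-Latn" -> "fr".
    static String stripSuppressedScript(String tag) {
        String[] parts = tag.split("-");
        if (parts.length < 2) {
            return tag;
        }
        String suppressed = SUPPRESS_SCRIPT.get(parts[0].toLowerCase(Locale.ROOT));
        if (suppressed == null || !suppressed.equalsIgnoreCase(parts[1])) {
            return tag;
        }
        StringBuilder sb = new StringBuilder(parts[0]);
        for (int i = 2; i < parts.length; i++) {
            sb.append('-').append(parts[i]);
        }
        return sb.toString();
    }

    // Plain RFC 4647 lookup of the text's tag against the analyzer tags.
    static String lookup(String textTag, List<String> analyzerTags) {
        List<Locale.LanguageRange> ranges =
                Collections.singletonList(new Locale.LanguageRange(textTag));
        return Locale.lookupTag(ranges, analyzerTags);
    }

    // Same lookup, but with the suppressed script removed first.
    static String lookupWithSuppression(String textTag, List<String> analyzerTags) {
        return lookup(stripSuppressedScript(textTag), analyzerTags);
    }

    public static void main(String[] args) {
        List<String> analyzers = Collections.singletonList("fr-CN");
        // Without preprocessing, lookup tries fr-Latn-CN, fr-Latn, fr
        // and never reaches fr-CN.
        System.out.println(lookup("fr-Latn-CN", analyzers));
        System.out.println(lookupWithSuppression("fr-Latn-CN", analyzers));
    }
}
```

The first call finds nothing, because lookup only truncates from the
right; the second finds the fr-CN analyzer once the script is gone.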

With filtering, though, if you have "fr" text and all three analyzers,
you will get "fr-FR" and "fr-CA" returned, with no guidance about which
to use.
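
To make the contrast concrete, here is a small Java sketch, again with
java.util.Locale. The analyzer set is an assumption on my part (fr-FR,
fr-CA and fr-CN), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class FilterVersusLookup {
    // Assumed analyzer tags; the thread does not list all three.
    static final List<String> ANALYZERS = Arrays.asList("fr-FR", "fr-CA", "fr-CN");

    // RFC 4647 filtering: returns every analyzer tag the range matches.
    static List<String> filter(String textTag) {
        List<Locale.LanguageRange> ranges =
                Collections.singletonList(new Locale.LanguageRange(textTag));
        return Locale.filterTags(ranges, ANALYZERS,
                Locale.FilteringMode.AUTOSELECT_FILTERING);
    }

    // RFC 4647 lookup: returns at most one tag, or null if none fits.
    static String lookup(String textTag) {
        List<Locale.LanguageRange> ranges =
                Collections.singletonList(new Locale.LanguageRange(textTag));
        return Locale.lookupTag(ranges, ANALYZERS);
    }

    public static void main(String[] args) {
        // Filtering yields several candidates and no ranking among them.
        System.out.println(filter("fr"));
        // Lookup yields a single answer; here there is no plain "fr"
        // analyzer, so the answer is null and the caller must fall back.
        System.out.println(lookup("fr"));
    }
}
```

Either way the caller still has to decide something: filtering leaves
the choice among several matches open, and lookup needs a default when
nothing matches.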

-- 
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
Mark Twain on Cecil Rhodes: I admire him, I freely admit it,
and when his time comes I shall buy a piece of the rope for a keepsake.

Received on Tuesday, 6 May 2014 17:50:38 UTC