more on extended filtering: bug in step 2?

I have now implemented this, although I still have to do the testing etc.

http://sourceforge.net/p/bigdata/code/HEAD/tree/branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/search/ConfigurableAnalyzerFactory.java
The method is LanguageRange.extendedFilterMatch, at line 178. Note that the license is GPL.

While writing it, I think I found an error in the description in RFC 4647, section 3.3.2, concerning private-use tags starting with "x-":

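if the language range is "*-x-banana", I think it should match "x-banana", but it does not;
and if the language range is "*-DE", I think it should not match "x-banana-DE", but it does.
In both cases the leading '*' is matched against the private-use 'x' in step 2, so step 3 never sees the 'x' and never treats it as a singleton.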

Here is the text of the RFC:

To determine a match:

   1.  Split both the extended language range and the language tag being
       compared into a list of subtags by dividing on the hyphen (%x2D)
       character.  Two subtags match if either they are the same when
       compared case-insensitively or the language range's subtag is the
       wildcard '*'.

   2.  Begin with the first subtag in each list.  If the first subtag in
       the range does not match the first subtag in the tag, the overall
       match fails.  Otherwise, move to the next subtag in both the
       range and the tag.

   3.  While there are more subtags left in the language range's list:

       A.  If the subtag currently being examined in the range is the
           wildcard ('*'), move to the next subtag in the range and
           continue with the loop.

       B.  Else, if there are no more subtags in the language tag's
           list, the match fails.

       C.  Else, if the current subtag in the range's list matches the
           current subtag in the language tag's list, move to the next
           subtag in both lists and continue with the loop.

       D.  Else, if the language tag's subtag is a "singleton" (a single
           letter or digit, which includes the private-use subtag 'x')
           the match fails.

       E.  Else, move to the next subtag in the language tag's list and
           continue with the loop.

   4.  When the language range's list has no more subtags, the match
       succeeds.
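To make the two cases concrete, here is a minimal Java sketch of the steps exactly as quoted. The class name Rfc4647ExtendedFilter is hypothetical and this is not the bigdata code linked earlier; it assumes well-formed ranges and tags with no empty subtags, and uses String.equalsIgnoreCase for the case-insensitive comparison in step 1.

```java
/**
 * Hypothetical sketch of RFC 4647 section 3.3.2 extended filtering,
 * following the quoted steps literally (not the bigdata implementation).
 */
class Rfc4647ExtendedFilter {

    /** Step 1: subtags match if equal ignoring case, or the range subtag is '*'. */
    static boolean subtagMatches(String rangeSubtag, String tagSubtag) {
        return rangeSubtag.equals("*") || rangeSubtag.equalsIgnoreCase(tagSubtag);
    }

    static boolean extendedFilterMatch(String range, String tag) {
        // Step 1: split both lists on the hyphen.
        String[] r = range.split("-");
        String[] t = tag.split("-");
        // Step 2: the first subtags must match; then advance in BOTH lists.
        if (!subtagMatches(r[0], t[0])) {
            return false;
        }
        int i = 1; // position in the range
        int j = 1; // position in the tag
        // Step 3: loop while the range has subtags left.
        while (i < r.length) {
            if (r[i].equals("*")) {                  // 3.A: skip a wildcard
                i++;
            } else if (j >= t.length) {              // 3.B: tag exhausted
                return false;
            } else if (subtagMatches(r[i], t[j])) {  // 3.C: advance both
                i++;
                j++;
            } else if (t[j].length() == 1) {         // 3.D: singleton, e.g. 'x'
                return false;
            } else {                                 // 3.E: skip a tag subtag
                j++;
            }
        }
        return true; // Step 4: range exhausted, so the match succeeds
    }

    public static void main(String[] args) {
        System.out.println(extendedFilterMatch("de-DE", "de-Latn-DE-1996")); // true
        System.out.println(extendedFilterMatch("*-x-banana", "x-banana"));  // false
        System.out.println(extendedFilterMatch("*-DE", "x-banana-DE"));     // true
    }
}
```

The last two calls reproduce the behaviour described at the start of this message: in both, step 2 lets '*' absorb the 'x', so the check in 3.D is never applied to it.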

In some sense I am pointing to a problem in step 2. In my code I fix it at line 186, which corresponds to the following variant of step 2:

   2.  Begin with the first subtag in each list:

       A.  If the first subtag in the range does not match the first 
           subtag in the tag, the overall match fails.

       B.  Else, if the first subtag in the range is '*' and the
           first subtag in the language tag is 'x', move to the
           next subtag in the range only and continue at step 3.

       C.  Otherwise, move to the next subtag in both the
           range and the language tag and continue at step 3.
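A Java sketch of the same matcher with only step 2 replaced by this variant might look like the following. The class name VariantExtendedFilter is hypothetical and this is not the code at line 186. It assumes the 'x' comparison in 2.B is case-insensitive, like every other subtag comparison, and it repeats steps 1, 3 and 4 unchanged so that the block compiles on its own.

```java
/**
 * Hypothetical sketch: RFC 4647 section 3.3.2 extended filtering with the
 * proposed variant of step 2. Steps 1, 3 and 4 are unchanged.
 */
class VariantExtendedFilter {

    /** Step 1: subtags match if equal ignoring case, or the range subtag is '*'. */
    static boolean subtagMatches(String rangeSubtag, String tagSubtag) {
        return rangeSubtag.equals("*") || rangeSubtag.equalsIgnoreCase(tagSubtag);
    }

    static boolean extendedFilterMatch(String range, String tag) {
        String[] r = range.split("-");
        String[] t = tag.split("-");
        // 2.A: the first subtags must match.
        if (!subtagMatches(r[0], t[0])) {
            return false;
        }
        int i = 1; // always advance in the range
        int j;
        if (r[0].equals("*") && t[0].equalsIgnoreCase("x")) {
            // 2.B: a leading '*' does not consume the private-use 'x',
            // so the tag's 'x' is examined again in step 3.
            j = 0;
        } else {
            // 2.C: advance in both lists, as in the original step 2.
            j = 1;
        }
        // Step 3, unchanged from the RFC text.
        while (i < r.length) {
            if (r[i].equals("*")) {
                i++;
            } else if (j >= t.length) {
                return false;
            } else if (subtagMatches(r[i], t[j])) {
                i++;
                j++;
            } else if (t[j].length() == 1) {
                return false;
            } else {
                j++;
            }
        }
        return true; // Step 4
    }

    public static void main(String[] args) {
        System.out.println(extendedFilterMatch("*-x-banana", "x-banana")); // true
        System.out.println(extendedFilterMatch("*-DE", "x-banana-DE"));    // false
    }
}
```

Because 2.B only fires when the tag begins with 'x', ordinary tags such as "de-Latn-DE-1996" are matched exactly as under the original step 2.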

However, I understand that no one is much interested in this functionality anyway!

Jeremy
Syapse, Inc.

Received on Wednesday, 7 May 2014 16:25:02 UTC