more on extended filtering: bug in step 2? from Jeremy J Carroll on 2014-05-07 (www-international@w3.org from April to June 2014)

From: Jeremy J Carroll <jjc@syapse.com>
Date: Wed, 7 May 2014 09:24:31 -0700
To: "www-international@w3.org" <www-international@w3.org>
Message-Id: <F51EB518-6CF0-4A4B-A1F8-96DF88EB2C89@syapse.com>

I have now implemented this, although I still have to do testing etc.

http://sourceforge.net/p/bigdata/code/HEAD/tree/branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/search/ConfigurableAnalyzerFactory.java
The method is LanguageRange.extendedFilterMatch, at line 178. Note that the license is GPL.

While writing it, I think I found an error in the description in RFC 4647 section 3.3.2 concerning private-use tags starting with "x-".

If the language range is "*-x-banana", I think it should match "x-banana", but it does not;
and if the language range is "*-DE", I think it should not match "x-banana-DE", but it does.

Here is the text of the RFC:

To determine a match:

1. Split both the extended language range and the language tag being
compared into a list of subtags by dividing on the hyphen (%x2D)
character. Two subtags match if either they are the same when
compared case-insensitively or the language range's subtag is the
wildcard '*'.

2. Begin with the first subtag in each list. If the first subtag in
the range does not match the first subtag in the tag, the overall
match fails. Otherwise, move to the next subtag in both the
range and the tag.

3. While there are more subtags left in the language range's list:

A. If the subtag currently being examined in the range is the
wildcard ('*'), move to the next subtag in the range and
continue with the loop.

B. Else, if there are no more subtags in the language tag's
list, the match fails.

C. Else, if the current subtag in the range's list matches the
current subtag in the language tag's list, move to the next
subtag in both lists and continue with the loop.

D. Else, if the language tag's subtag is a "singleton" (a single
letter or digit, which includes the private-use subtag 'x')
the match fails.

E. Else, move to the next subtag in the language tag's list and
continue with the loop.

4. When the language range's list has no more subtags, the match
succeeds.
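To make the behaviour concrete, here is a minimal Java sketch that follows the quoted steps literally, including the original step 2. The class Rfc4647ExtendedFilter and its methods are hypothetical names for illustration, not the LanguageRange.extendedFilterMatch code linked above, and treating any one-character tag subtag as a "singleton" is my reading of step 3.D.

```java
/**
 * Sketch of RFC 4647 section 3.3.2 extended filtering, following the
 * quoted steps literally (original step 2).
 * Hypothetical names; NOT the Bigdata LanguageRange code.
 */
final class Rfc4647ExtendedFilter {

    private Rfc4647ExtendedFilter() {}

    /** Step 1: subtags match if equal ignoring case, or the range subtag is '*'. */
    static boolean subtagMatches(String rangeSubtag, String tagSubtag) {
        return rangeSubtag.equals("*") || rangeSubtag.equalsIgnoreCase(tagSubtag);
    }

    static boolean matches(String range, String tag) {
        // Step 1: split both strings on the hyphen.
        String[] r = range.split("-");
        String[] t = tag.split("-");

        // Step 2 (as written in the RFC): first subtags must match,
        // then advance in BOTH lists.
        if (!subtagMatches(r[0], t[0])) {
            return false;
        }
        int i = 1; // position in the range
        int j = 1; // position in the tag

        // Step 3: loop while the range has subtags left.
        while (i < r.length) {
            if (r[i].equals("*")) {
                i++;                          // 3.A: skip a wildcard in the range
            } else if (j >= t.length) {
                return false;                 // 3.B: tag exhausted
            } else if (subtagMatches(r[i], t[j])) {
                i++;                          // 3.C: advance both lists
                j++;
            } else if (t[j].length() == 1) {
                return false;                 // 3.D: singleton (including 'x') in the tag
            } else {
                j++;                          // 3.E: skip this tag subtag
            }
        }
        return true;                          // Step 4: range exhausted
    }

    public static void main(String[] args) {
        System.out.println(matches("*-x-banana", "x-banana")); // false
        System.out.println(matches("*-DE", "x-banana-DE"));    // true
    }
}
```

Running main should print false and then true, i.e. the two results I describe above.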

In some sense, I am pointing to a problem in step 2. In my code I fix it at line 186, which corresponds to the following variant of step 2:

2. Begin with the first subtag in each list:

A. If the first subtag in the range does not match the first
subtag in the tag, the overall match fails.

B. Else, if the first subtag in the range is '*' and the
first subtag in the language tag is 'x', then move to the
next subtag in the range only and continue at step 3.

C. Otherwise, move to the next subtag in both the
range and the language tag and continue at step 3.
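A minimal Java sketch of the full algorithm with step 2 replaced by this variant (steps 1, 3 and 4 unchanged from the RFC text) might look like the following. The class name VariantExtendedFilter is hypothetical and this is not the code at line 186 of the linked file; comparing the 'x' case-insensitively is an assumption, made because subtags compare case-insensitively in step 1.

```java
/**
 * Sketch of RFC 4647 section 3.3.2 extended filtering with the proposed
 * variant of step 2. Hypothetical names; NOT the Bigdata code.
 */
final class VariantExtendedFilter {

    private VariantExtendedFilter() {}

    /** Step 1: subtags match if equal ignoring case, or the range subtag is '*'. */
    static boolean subtagMatches(String rangeSubtag, String tagSubtag) {
        return rangeSubtag.equals("*") || rangeSubtag.equalsIgnoreCase(tagSubtag);
    }

    static boolean matches(String range, String tag) {
        String[] r = range.split("-");
        String[] t = tag.split("-");

        // Variant 2.A: first subtags must match.
        if (!subtagMatches(r[0], t[0])) {
            return false;
        }
        int i = 1; // the range always advances past its first subtag
        int j;
        if (r[0].equals("*") && t[0].equalsIgnoreCase("x")) {
            j = 0; // Variant 2.B: stay on the tag's 'x' (assumed case-insensitive)
        } else {
            j = 1; // Variant 2.C: advance in both lists
        }

        // Step 3, unchanged from the RFC.
        while (i < r.length) {
            if (r[i].equals("*")) {
                i++;                          // 3.A
            } else if (j >= t.length) {
                return false;                 // 3.B
            } else if (subtagMatches(r[i], t[j])) {
                i++;                          // 3.C
                j++;
            } else if (t[j].length() == 1) {
                return false;                 // 3.D: singleton such as 'x'
            } else {
                j++;                          // 3.E
            }
        }
        return true;                          // Step 4
    }

    public static void main(String[] args) {
        System.out.println(matches("*-x-banana", "x-banana")); // true
        System.out.println(matches("*-DE", "x-banana-DE"));    // false
    }
}
```

With this step 2, "*-x-banana" matches "x-banana" and "*-DE" no longer matches "x-banana-DE"; whenever the range does not start with '*' against a tag starting with 'x', the result is the same as with the RFC text.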

However, I understand that no one is much interested in this functionality anyway!

Jeremy
Syapse, Inc.

Received on Wednesday, 7 May 2014 16:25:02 UTC