- From: Phillips, Addison <addison@lab126.com>
- Date: Wed, 7 May 2014 16:55:13 +0000
- To: Jeremy J Carroll <jjc@syapse.com>, "www-international@w3.org" <www-international@w3.org>
- Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB517E7B855@ex10-mbx-36009.ant.amazon.com>
I don’t read the RFC that way.
I don’t think “*-x-banana” should match “x-banana”. The leading “-” in the range is not optional. Another way to say this is that the “*” on the front is non-optional (rule 1, rule 2).
“*-DE” should not match “x-banana-DE” because of rule 3D. The “DE” in the range would match a concrete (non-private) subtag. The “DE”-bearing range that matches “x-banana-DE” is “*-x-DE”. Rule 3D says that when you see a singleton, including ‘x’, in the tag (one that doesn’t have a match in the range), the match fails. This prevents false positives in which you want to select region “DE” and instead find tags with some non-region private/extension gorp containing “DE”.
Addison
From: Jeremy J Carroll [mailto:jjc@syapse.com]
Sent: Wednesday, May 07, 2014 9:25 AM
To: www-international@w3.org
Subject: more on extended filtering: bug in step 2?
I have now implemented this, although I still have to do testing etc.
http://sourceforge.net/p/bigdata/code/HEAD/tree/branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/search/ConfigurableAnalyzerFactory.java
The method is LanguageRange.extendedFilterMatch at line 178. Note the license is GPL.
While writing it, I think I found an error in the description in RFC 4647 3.3.2 concerning private-use tags starting with "x-".
If the language range is "*-x-banana", I think it should match "x-banana", but it does not;
and if the language range is "*-DE", I think it should not match "x-banana-DE", but it does.
Here is the text of the RFC:
To determine a match:

1. Split both the extended language range and the language tag being
   compared into a list of subtags by dividing on the hyphen (%x2D)
   character. Two subtags match if either they are the same when
   compared case-insensitively or the language range's subtag is the
   wildcard '*'.

2. Begin with the first subtag in each list. If the first subtag in
   the range does not match the first subtag in the tag, the overall
   match fails. Otherwise, move to the next subtag in both the
   range and the tag.

3. While there are more subtags left in the language range's list:

   A. If the subtag currently being examined in the range is the
      wildcard ('*'), move to the next subtag in the range and
      continue with the loop.

   B. Else, if there are no more subtags in the language tag's
      list, the match fails.

   C. Else, if the current subtag in the range's list matches the
      current subtag in the language tag's list, move to the next
      subtag in both lists and continue with the loop.

   D. Else, if the language tag's subtag is a "singleton" (a single
      letter or digit, which includes the private-use subtag 'x')
      the match fails.

   E. Else, move to the next subtag in the language tag's list and
      continue with the loop.

4. When the language range's list has no more subtags, the match
   succeeds.
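
The quoted steps can be transcribed almost line for line. The following Java sketch is one literal reading of them, not the code in ConfigurableAnalyzerFactory.java. The class and method names (Rfc4647LiteralSketch, subtagMatches, matches) are made up for illustration, and the sketch assumes well-formed input with no empty subtags.

```java
// Literal transcription of the quoted RFC 4647 section 3.3.2 steps.
// All names here are hypothetical; input is assumed to be well-formed.
final class Rfc4647LiteralSketch {

    // Step 1: two subtags match if they are equal ignoring case,
    // or if the range's subtag is the wildcard '*'.
    static boolean subtagMatches(String rangeSubtag, String tagSubtag) {
        return rangeSubtag.equals("*") || rangeSubtag.equalsIgnoreCase(tagSubtag);
    }

    static boolean matches(String range, String tag) {
        // Step 1: split both strings on the hyphen.
        String[] r = range.split("-");
        String[] t = tag.split("-");

        // Step 2: the first subtags must match, then advance in both lists.
        if (!subtagMatches(r[0], t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;

        // Step 3: loop while the range still has subtags.
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                       // 3A: skip a wildcard in the range
            } else if (ti >= t.length) {
                return false;               // 3B: the tag ran out of subtags
            } else if (subtagMatches(r[ri], t[ti])) {
                ri++;                       // 3C: match, advance both lists
                ti++;
            } else if (t[ti].length() == 1) {
                return false;               // 3D: singleton in the tag, including 'x'
            } else {
                ti++;                       // 3E: skip this subtag of the tag
            }
        }

        // Step 4: the range is exhausted, so the match succeeds.
        return true;
    }
}
```

Under this literal reading, matches("*-x-banana", "x-banana") returns false and matches("*-DE", "x-banana-DE") returns true, because the leading 'x' of the tag is consumed in step 2 and rule 3D never examines it. By contrast, matches("de-*-DE", "de-x-DE") returns false, since there the 'x' is reached inside the loop.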
In some sense I am pointing to a problem in step 2. In my code I fix it at line 186, which corresponds to the following variant of step 2:
2. Begin with the first subtag in each list:

   A. If the first subtag in the range does not match the first
      subtag in the tag, the overall match fails.

   B. Else, if the first subtag in the range is '*' and the
      first subtag in the language is 'x' then move to the
      next subtag in the range and continue at step 3.

   C. Otherwise, move to the next subtag in both the
      range and the language tag and continue at step 3.
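
As a rough illustration of what this variant changes, here is a Java sketch of the whole matcher with the modified step 2. It is not the code at line 186 of the linked file, which is not reproduced here. The names (VariantStep2Sketch, matches) are hypothetical, and comparing the leading 'x' case-insensitively is an assumption.

```java
// Sketch of RFC 4647 extended filtering with the variant step 2 above.
// All names are hypothetical; input is assumed to be well-formed.
final class VariantStep2Sketch {

    static boolean subtagMatches(String rangeSubtag, String tagSubtag) {
        return rangeSubtag.equals("*") || rangeSubtag.equalsIgnoreCase(tagSubtag);
    }

    static boolean matches(String range, String tag) {
        String[] r = range.split("-");
        String[] t = tag.split("-");

        // Variant step 2A: the first subtags must match.
        if (!subtagMatches(r[0], t[0])) {
            return false;
        }
        int ri = 1;
        int ti;
        if (r[0].equals("*") && t[0].equalsIgnoreCase("x")) {
            // Variant step 2B: advance only in the range, so the tag's
            // leading 'x' is still examined by step 3 (assumption: the
            // comparison with 'x' ignores case).
            ti = 0;
        } else {
            // Variant step 2C: advance in both lists.
            ti = 1;
        }

        // Step 3 is unchanged from the RFC text.
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                       // 3A
            } else if (ti >= t.length) {
                return false;               // 3B
            } else if (subtagMatches(r[ri], t[ti])) {
                ri++;                       // 3C
                ti++;
            } else if (t[ti].length() == 1) {
                return false;               // 3D
            } else {
                ti++;                       // 3E
            }
        }
        return true;                        // step 4
    }
}
```

With this change, matches("*-x-banana", "x-banana") returns true and matches("*-DE", "x-banana-DE") returns false, while ranges whose tag does not start with 'x', such as "*-DE" against "de-DE", behave as before.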
However, I understand that no one is much interested in this functionality anyway!
Jeremy
Syapse, Inc.
Received on Wednesday, 7 May 2014 16:56:32 UTC