- From: Jeremy J Carroll <jjc@syapse.com>
- Date: Wed, 7 May 2014 09:24:31 -0700
- To: "www-international@w3.org" <www-international@w3.org>
- Message-Id: <F51EB518-6CF0-4A4B-A1F8-96DF88EB2C89@syapse.com>
I have now implemented this, although I still have to do testing etc.

http://sourceforge.net/p/bigdata/code/HEAD/tree/branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/search/ConfigurableAnalyzerFactory.java

The method is LanguageRange.extendedFilterMatch at line 178. Note the license is GPL.

While writing it I think I found an error in the description in RFC 4647 3.3.2 concerning private use tags starting with "x-":

- if the language range is "*-x-banana" I think it should match "x-banana", but it does not, and
- if the language range is "*-DE" I think it should not match "x-banana-DE", but it does.

Here is the text of the RFC:

   To determine a match:

   1.  Split both the extended language range and the language tag being
       compared into a list of subtags by dividing on the hyphen (%x2D)
       character.  Two subtags match if either they are the same when
       compared case-insensitively or the language range's subtag is the
       wildcard '*'.

   2.  Begin with the first subtag in each list.  If the first subtag in
       the range does not match the first subtag in the tag, the overall
       match fails.  Otherwise, move to the next subtag in both the
       range and the tag.

   3.  While there are more subtags left in the language range's list:

       A.  If the subtag currently being examined in the range is the
           wildcard ('*'), move to the next subtag in the range and
           continue with the loop.

       B.  Else, if there are no more subtags in the language tag's
           list, the match fails.

       C.  Else, if the current subtag in the range's list matches the
           current subtag in the language tag's list, move to the next
           subtag in both lists and continue with the loop.

       D.  Else, if the language tag's subtag is a "singleton" (a single
           letter or digit, which includes the private-use subtag 'x')
           the match fails.

       E.  Else, move to the next subtag in the language tag's list and
           continue with the loop.

   4.  When the language range's list has no more subtags, the match
       succeeds.
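To make the two cases above easy to check, here is a minimal Python sketch of the RFC steps exactly as quoted. It is not the Java code linked above; the function name `extended_filter_match` and its signature are chosen here purely for illustration, and well-formed, non-empty inputs are assumed.

```python
def extended_filter_match(language_range: str, language_tag: str) -> bool:
    """Literal transcription of RFC 4647 section 3.3.2 (illustrative sketch)."""
    # Step 1: split on hyphen; comparison is case-insensitive.
    range_subtags = language_range.lower().split("-")
    tag_subtags = language_tag.lower().split("-")

    # Step 2: the first subtags must match (wildcard matches anything).
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    r, t = 1, 1  # move to the next subtag in both lists

    # Step 3: walk the remaining range subtags.
    while r < len(range_subtags):
        if range_subtags[r] == "*":                # 3A: skip wildcard
            r += 1
        elif t >= len(tag_subtags):                # 3B: tag exhausted
            return False
        elif range_subtags[r] == tag_subtags[t]:   # 3C: subtags match
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:             # 3D: singleton blocks the skip
            return False
        else:                                      # 3E: skip this tag subtag
            t += 1

    # Step 4: range exhausted, so the match succeeds.
    return True


# The two cases discussed in this message:
print(extended_filter_match("*-x-banana", "x-banana"))  # False
print(extended_filter_match("*-DE", "x-banana-DE"))     # True
```

In both cases step 2 consumes the leading "x" against the wildcard, so step 3D never gets to see it as a singleton.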

In some sense I am pointing to a problem in step 2. In my code I fix it at line 186, which corresponds to the following variant of step 2:

   2.  Begin with the first subtag in each list:

       A.  If the first subtag in the range does not match the first
           subtag in the tag, the overall match fails.

       B.  Else, if the first subtag in the range is '*' and the first
           subtag in the language tag is 'x' then move to the next
           subtag in the range and continue at step 3.

       C.  Otherwise, move to the next subtag in both the range and the
           language tag and continue at step 3.

However I have understood that no one is much interested in this functionality anyway!

Jeremy
Syapse, Inc.
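
For reference, the variant of step 2 proposed in this message can be sketched in Python as follows. This is an independent illustration under the same assumptions as the RFC text (well-formed, non-empty inputs), not the linked Java implementation; the name `extended_filter_match_variant` is hypothetical.

```python
def extended_filter_match_variant(language_range: str, language_tag: str) -> bool:
    """RFC 4647 3.3.2 matching with the modified step 2 (illustrative sketch)."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = language_tag.lower().split("-")

    # Step 2A: the first subtags must match (wildcard matches anything).
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    r = 1
    if range_subtags[0] == "*" and tag_subtags[0] == "x":
        t = 0  # Step 2B: advance only the range; 'x' stays current in the tag
    else:
        t = 1  # Step 2C: advance both lists

    # Step 3 is unchanged from the RFC.
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1
        elif t >= len(tag_subtags):
            return False
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False
        else:
            t += 1
    return True


print(extended_filter_match_variant("*-x-banana", "x-banana"))  # True
print(extended_filter_match_variant("*-DE", "x-banana-DE"))     # False
```

Leaving the 'x' as the current tag subtag lets step 3C match it against an explicit "x" in the range, and lets step 3D reject any attempt to skip over it.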
Received on Wednesday, 7 May 2014 16:25:02 UTC