- From: Jeremy J Carroll <jjc@syapse.com>
- Date: Wed, 7 May 2014 12:37:19 -0700
- To: "Phillips, Addison" <addison@lab126.com>
- Cc: "www-international@w3.org" <www-international@w3.org>
- Message-Id: <21C23FA7-E813-4A3F-B0A8-368900EEEEA7@syapse.com>
In the x-banana-DE case, the 'x' is moved over, matching '*' in step 2, and 3D doesn't apply; hence my 2B, which then exposes 'x' to 3D.

Certainly, whether “*-x-banana” should match “x-banana” is hardly a big deal either way.

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On May 7, 2014, at 9:55 AM, "Phillips, Addison" <addison@lab126.com> wrote:

> I don’t read the RFC that way.
>
> I don’t think “*-x-banana” should match “x-banana”. The leading “-” in the range is not optional. Another way to say this is that the “*” on the front is non-optional (rule 1, rule 2).
>
> “*-DE” should not match “x-banana-DE” because of rule 3D. The “DE” in the range would match a concrete (non-private) subtag. The “DE” bearing range that matches “x-banana-DE” is “*-x-DE”. Rule 3D says that when you see a singleton including ‘x’ in the tag (that doesn’t have a match in the range), the match fails. This prevents false positives in which you want to select region “DE” and find tags with some non-region private/extension gorp containing “DE”.
>
> Addison
>
>
> From: Jeremy J Carroll [mailto:jjc@syapse.com]
> Sent: Wednesday, May 07, 2014 9:25 AM
> To: www-international@w3.org
> Subject: more on extended filtering: bug in step 2?
>
> I have now implemented this - although I have still got to do testing etc.
>
> http://sourceforge.net/p/bigdata/code/HEAD/tree/branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/search/ConfigurableAnalyzerFactory.java
> the method is LanguageRange.extendedFilterMatch at line 178. Note the license is GPL.
>
> While writing it I think I found an error in the description in RFC 4647 3.3.2 concerning private use tags starting with an "x-"
>
> if the language range is "*-x-banana" I think it should match "x-banana" but it does not,
> and if the language range is "*-DE" I think it should not match "x-banana-DE" but it does
>
> Here is the text of the RFC:
>
>    To determine a match:
>
>    1.  Split both the extended language range and the language tag being
>        compared into a list of subtags by dividing on the hyphen (%x2D)
>        character.  Two subtags match if either they are the same when
>        compared case-insensitively or the language range's subtag is the
>        wildcard '*'.
>
>    2.  Begin with the first subtag in each list.  If the first subtag in
>        the range does not match the first subtag in the tag, the overall
>        match fails.  Otherwise, move to the next subtag in both the
>        range and the tag.
>
>    3.  While there are more subtags left in the language range's list:
>
>        A.  If the subtag currently being examined in the range is the
>            wildcard ('*'), move to the next subtag in the range and
>            continue with the loop.
>
>        B.  Else, if there are no more subtags in the language tag's
>            list, the match fails.
>
>        C.  Else, if the current subtag in the range's list matches the
>            current subtag in the language tag's list, move to the next
>            subtag in both lists and continue with the loop.
>
>        D.  Else, if the language tag's subtag is a "singleton" (a single
>            letter or digit, which includes the private-use subtag 'x')
>            the match fails.
>
>        E.  Else, move to the next subtag in the language tag's list and
>            continue with the loop.
>
>    4.  When the language range's list has no more subtags, the match
>        succeeds.
>
> In some sense I am pointing to a problem in step 2; in my code I fix it at line 186, which corresponds to the following variant of step 2:
>
>    2.  Begin with the first subtag in each list:
>
>        A.  If the first subtag in the range does not match the first
>            subtag in the tag, the overall match fails.
>
>        B.  Else, if the first subtag in the range is '*' and the
>            first subtag in the language is 'x' then move to the
>            next subtag in the range and continue at step 3.
>
>        C.  Otherwise, move to the next subtag in both the
>            range and the language tag and continue at step 3.
>
> However I have understood that no one is much interested in this functionality anyway!
>
> Jeremy
> Syapse, Inc.
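
For anyone who wants to try the cases discussed in this thread, here is a minimal Python sketch of the RFC 4647 section 3.3.2 steps quoted above, with the proposed step 2B variant behind a flag. The function name `extended_filter_match` and the `variant_step2` parameter are hypothetical, chosen only for illustration; this is not the code in ConfigurableAnalyzerFactory.java, and it assumes both inputs are non-empty, hyphen-separated strings.

```python
def extended_filter_match(language_range: str, tag: str,
                          variant_step2: bool = False) -> bool:
    """Sketch of RFC 4647 3.3.2 extended filtering (hypothetical helper)."""
    # Step 1: split on hyphen; lower-casing gives case-insensitive comparison.
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # Step 2: the first subtags must match ('*' matches anything).
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    if variant_step2 and range_subtags[0] == "*" and tag_subtags[0] == "x":
        # Proposed 2B: advance only the range, so 'x' stays visible to step 3.
        r, t = 1, 0
    else:
        # RFC text (and proposed 2C): advance both lists.
        r, t = 1, 1

    # Step 3: walk the remaining range subtags.
    while r < len(range_subtags):
        if range_subtags[r] == "*":                  # 3A
            r += 1
        elif t >= len(tag_subtags):                  # 3B
            return False
        elif range_subtags[r] == tag_subtags[t]:     # 3C
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:               # 3D: singleton, incl. 'x'
            return False
        else:                                        # 3E
            t += 1

    # Step 4: the range is exhausted, so the match succeeds.
    return True


# The two cases from the original message, under the RFC text as written:
assert extended_filter_match("*-x-banana", "x-banana") is False
assert extended_filter_match("*-DE", "x-banana-DE") is True

# The same two cases with the proposed step 2 variant:
assert extended_filter_match("*-x-banana", "x-banana", variant_step2=True) is True
assert extended_filter_match("*-DE", "x-banana-DE", variant_step2=True) is False
```

Under this sketch, the variant only changes the outcome when the range starts with '*' and the tag starts with 'x'; every other input follows the RFC steps unchanged.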
Received on Wednesday, 7 May 2014 19:37:49 UTC