Re: more on extended filtering: bug in step 2?

In the x-banana-DE case, the 'x' is moved over in step 2, where it matches '*', so 3D never applies to it; hence my 2B, which then exposes 'x' to 3D.

Certainly, whether “*-x-banana” should match “x-banana” is hardly a big deal either way.
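
To make the two cases concrete, here is a minimal Python sketch of the steps from RFC 4647 3.3.2 as quoted below, with my variant 2B switched on by a flag. The function name extended_filter_match and the parameter star_skips_x are made up for this illustration; this is not the Java code in LanguageRange.extendedFilterMatch, just a direct transcription of the quoted steps.

```python
def extended_filter_match(lang_range, tag, star_skips_x=False):
    """Extended filtering per the quoted RFC 4647 3.3.2 steps.

    star_skips_x is a hypothetical flag: when True, the variant step 2B
    from this thread is applied instead of the plain step 2.
    """
    # Step 1: split on hyphen; comparison is case-insensitive.
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")

    # Step 2 (2A in the variant): the first subtags must match.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri = 1
    if star_skips_x and r[0] == "*" and t[0] == "x":
        ti = 0  # variant 2B: advance the range only, so step 3 sees 'x'
    else:
        ti = 1  # plain step 2 (2C in the variant): advance both lists

    # Step 3: loop while subtags remain in the range.
    while ri < len(r):
        if r[ri] == "*":            # 3A: skip wildcard in the range
            ri += 1
        elif ti >= len(t):          # 3B: tag exhausted
            return False
        elif r[ri] == t[ti]:        # 3C: subtags match, advance both
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:       # 3D: unmatched singleton (incl. 'x')
            return False
        else:                       # 3E: skip this subtag of the tag
            ti += 1

    # Step 4: range exhausted, so the match succeeds.
    return True


# Plain RFC steps: 'x' is consumed by '*' in step 2, so 3D never sees it.
print(extended_filter_match("*-DE", "x-banana-DE"))        # True
print(extended_filter_match("*-x-banana", "x-banana"))     # False

# With variant 2B: 'x' stays in place and 3D rejects the "*-DE" range.
print(extended_filter_match("*-DE", "x-banana-DE", star_skips_x=True))     # False
print(extended_filter_match("*-x-banana", "x-banana", star_skips_x=True))  # True
```

The flag changes nothing for tags that do not start with 'x', so the ordinary cases (for example "de-*-DE" against "de-Latn-DE") behave the same either way.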



Jeremy J Carroll
Principal Architect
Syapse, Inc.



On May 7, 2014, at 9:55 AM, "Phillips, Addison" <addison@lab126.com> wrote:

> I don’t read the RFC that way.
>  
> I don’t think “*-x-banana” should match “x-banana”. The leading “-” in the range is not optional. Another way to say this is that the “*” on the front is non-optional (rule 1, rule 2).
>  
> “*-DE” should not match “x-banana-DE” because of rule 3D. The “DE” in the range would match a concrete (non-private) subtag. The “DE” bearing range that matches “x-banana-DE” is “*-x-DE”. Rule 3D says that when you see a singleton including ‘x’ in the tag (that doesn’t have a match in the range), the match fails. This prevents false positives in which you want to select region “DE” and find tags with some non-region private/extension gorp containing “DE”.
>  
> Addison
>  
>  
> From: Jeremy J Carroll [mailto:jjc@syapse.com] 
> Sent: Wednesday, May 07, 2014 9:25 AM
> To: www-international@w3.org
> Subject: more on extended filtering: bug in step 2?
>  
> I have now implemented this - although I have still got to do testing etc.
>  
> http://sourceforge.net/p/bigdata/code/HEAD/tree/branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/search/ConfigurableAnalyzerFactory.java
> the method is LanguageRange.extendedFilterMatch at line 178. Note the license is GPL.
>  
> While writing it I think I found an error in the description in RFC 4647 3.3.2 concerning private use tags starting with an "x-"
>  
> if the language range is "*-x-banana" I think it should match "x-banana" but it does not,
> and if the language range is "*-DE" I think it should not match "x-banana-DE" but it does
>  
> Here is the text of the RFC:
>  
> To determine a match:
>  
>    1.  Split both the extended language range and the language tag being
>        compared into a list of subtags by dividing on the hyphen (%x2D)
>        character.  Two subtags match if either they are the same when
>        compared case-insensitively or the language range's subtag is the
>        wildcard '*'.
>  
>    2.  Begin with the first subtag in each list.  If the first subtag in
>        the range does not match the first subtag in the tag, the overall
>        match fails.  Otherwise, move to the next subtag in both the
>        range and the tag.
>  
>    3.  While there are more subtags left in the language range's list:
>  
>        A.  If the subtag currently being examined in the range is the
>            wildcard ('*'), move to the next subtag in the range and
>            continue with the loop.
>  
>        B.  Else, if there are no more subtags in the language tag's
>            list, the match fails.
>  
>        C.  Else, if the current subtag in the range's list matches the
>            current subtag in the language tag's list, move to the next
>            subtag in both lists and continue with the loop.
>  
>        D.  Else, if the language tag's subtag is a "singleton" (a single
>            letter or digit, which includes the private-use subtag 'x')
>            the match fails.
>  
>        E.  Else, move to the next subtag in the language tag's list and
>            continue with the loop.
>    4.  When the language range's list has no more subtags, the match
>        succeeds.
>  
> In some sense I am pointing to a problem in step 2, in my code I fix it at line 186, which corresponds to the following variant of step 2:
>  
>  
>    2.  Begin with the first subtag in each list:
>        A.  If the first subtag in the range does not match the first 
>            subtag in the tag, the overall match fails.
>        B.  Else, if the first subtag in the range is '*' and the
>            first subtag in the language is 'x' then move to the
>            next subtag in the range and continue at step 3.
>  
>        C.  Otherwise, move to the next subtag in both the
>            range and the language tag and continue at step 3.
>  
> However I have understood that no one is much interested in this functionality anyway! 
>  
> Jeremy
> Syapse, Inc.
>  
>  

Received on Wednesday, 7 May 2014 19:37:49 UTC