Re: Validator Dev Watch: fuzzy matching for unknown elements/attributes

From: olivier Thereaux <ot@w3.org>
Date: Tue, 17 Feb 2009 09:12:00 -0500
Cc: www-validator Community <www-validator@w3.org>
Message-Id: <440845C2-3F38-451C-B033-F4249E1F2EFC@w3.org>
To: Brian Wilson <bloo@blooberry.com>

Hi Brian, all.

On 16-Feb-09, at 9:35 PM, Brian Wilson wrote:
> I was tossing around the case-sensitivity issue in my head and a way to
> address it. First, I thought why not just do a lc() of the
> element/attribute argument when doing the fuzzy compare?

I ended up doing that comparison *before* doing the fuzzy match. It's  
cheaper processing-wise.

Why is that extra check needed at all? The edit distance between "ClaSS"  
and "class" is 0. But then so is the distance between "ClaSS" and  
"classid"! I reckon that's a peculiarity of this algorithm, and I  
don't mind the small workaround.
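
To illustrate the tie described above, here is a minimal Python sketch (not the validator's Perl code). It assumes the fuzzy matcher measures the distance from the pattern to the best-matching *substring* of each candidate, which would explain why "ClaSS" scores 0 against both "class" and "classid". The names substring_distance, suggest_exact and KNOWN_ATTRIBUTES are hypothetical, made up for this example.

```python
# Hypothetical sample of known attribute names (an assumption for this sketch).
KNOWN_ATTRIBUTES = ["class", "classid", "id", "title"]


def substring_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text,
    ignoring case. This is an assumed model of the matcher, not its code."""
    pattern, text = pattern.lower(), text.lower()
    # An empty pattern matches at any position in text at cost 0.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        # i pattern characters against an empty substring cost i deletions.
        curr = [i]
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            curr.append(min(prev[j - 1] + cost,  # match or substitution
                            prev[j] + 1,         # drop a pattern character
                            curr[j - 1] + 1))    # skip a text character
        prev = curr
    # The match may end anywhere in text, so take the cheapest end position.
    return min(prev)


def suggest_exact(name, known=KNOWN_ATTRIBUTES):
    """Cheap check to run before any fuzzy match: a plain lowercase lookup."""
    lowered = name.lower()
    return lowered if lowered in known else None


print(substring_distance("ClaSS", "class"))    # 0
print(substring_distance("ClaSS", "classid"))  # 0, same score: a tie
print(suggest_exact("ClaSS"))                  # class
```

Under that assumption the fuzzy scores alone cannot prefer "class" over "classid", while the lowercase lookup settles it with a single string comparison.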

> Also, on http://search.cpan.org/dist/String-Approx/Approx.pm
> it mentions:
>  "You can ignore case by adding the "i" modifier."
> I didn't grok the rest of what it was saying at first glance, so that
> may not be appropriate...

Yes, I'm getting good results with the "3i" modifier, which means  
that most test cases I threw at it came back with the right suggestion  
within 3 case-insensitive edits. Increasing that number will return  
more matches, but so far it does not seem to be needed.
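
As a rough illustration of what "within 3 case-insensitive edits" means, here is a small Python sketch. It is a simplification: it uses plain whole-string Levenshtein distance rather than String::Approx itself, and the names edit_distance, suggestions and the sample list are hypothetical, chosen only for this example.

```python
def edit_distance(a, b):
    """Whole-string Levenshtein distance, ignoring case."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (ca != cb),  # match or substitution
                            prev[j] + 1,               # deletion
                            curr[j - 1] + 1))          # insertion
        prev = curr
    return prev[-1]


def suggestions(name, known, max_edits=3):
    """Known names within max_edits of name, closest first."""
    scored = [(edit_distance(name, k), k) for k in known]
    return [k for d, k in sorted(scored) if d <= max_edits]


known = ["title", "class", "id", "style"]  # hypothetical sample
print(suggestions("TiTEL", known, max_edits=3))  # ['title']
print(suggestions("TiTEL", known, max_edits=4))  # ['title', 'id', 'style']
```

Raising the threshold from 3 to 4 lets in "id" and "style" as well, which matches the observation that a larger number returns more matches without necessarily returning better ones.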

Testing more will help us figure out if the parameters need tweaking.


Cheers,
-- 
olivier
Received on Tuesday, 17 February 2009 14:12:09 GMT