Re: Validator Dev Watch: fuzzy matching for unknown elements/attributes

Hi Brian, all.

On 16-Feb-09, at 9:35 PM, Brian Wilson wrote:
> I was tossing around the case-sensitivity issue in my head and a way to
> address it. First, I thought why not just do a lc() of the
> element/attribute argument when doing the fuzzy compare?

I ended up doing that lowercase compare *before* doing the fuzzy match.  
It's cheaper processing-wise.

Why do we need that extra check at all? Ignoring case, the edit distance  
between "ClaSS" and "class" is 0… But then so is the distance between  
"ClaSS" and "classid", because the pattern only has to match somewhere  
inside the candidate. I reckon that's a peculiarity of this algorithm,  
and I don't mind the small workaround.
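
To illustrate the point above, here is a rough sketch in Python (not the validator's Perl code, and not String::Approx itself). It assumes the matcher behaves like a classic approximate substring search; the function names `substring_edit_distance`, `fuzzy_only` and `suggest` are made up for this example.

```python
def substring_edit_distance(pattern, text):
    """Smallest number of edits needed to turn `pattern` into some
    substring of `text`, ignoring case (assumed matching behaviour)."""
    pattern, text = pattern.lower(), text.lower()
    # An empty pattern matches at any position in the text at cost 0.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        # The first i pattern characters against an empty substring cost i.
        cur = [i]
        for j, tc in enumerate(text, start=1):
            cur.append(min(prev[j] + 1,                # skip a pattern character
                           cur[j - 1] + 1,             # skip a text character
                           prev[j - 1] + (pc != tc)))  # match or substitute
        prev = cur
    # The match may end anywhere in the text.
    return min(prev)


def fuzzy_only(name, known):
    """Closest known name by substring distance; ties go to list order."""
    return min(known, key=lambda k: substring_edit_distance(name, k))


def suggest(name, known):
    """Cheap case-insensitive equality first, fuzzy match only as a fallback."""
    lowered = name.lower()
    for candidate in known:
        if candidate.lower() == lowered:
            return candidate
    return fuzzy_only(name, known)


known_attributes = ["classid", "class", "href"]
print(substring_edit_distance("ClaSS", "class"))
print(substring_edit_distance("ClaSS", "classid"))
print(fuzzy_only("ClaSS", known_attributes))
print(suggest("ClaSS", known_attributes))
```

With "classid" listed before "class", the fuzzy-only lookup cannot tell the two apart, while the version with the exact check picks "class".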

> Also, on http://search.cpan.org/dist/String-Approx/Approx.pm
> it mentions:
>  "You can ignore case by adding the "i" modifier."
> I didn't grok the rest of what it was saying at first glance, so that may
> not be appropriate...

Yes, I'm getting good results with the "3i" modifier - which means  
that for most test cases I threw at it, the right suggestion came up  
within 3 case-insensitive edits. Increasing that number will return  
more matches, but so far it does not seem to be needed.
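
As a hedged illustration of what such a threshold does, the Python sketch below keeps only candidates within a maximum number of case-insensitive edits. It is not String::Approx and not the validator code; it assumes approximate substring matching, and `edits_to_substring`, `suggestions_within` and `max_edits` are hypothetical names.

```python
def edits_to_substring(pattern, text):
    """Case-insensitive edit distance from `pattern` to its best-matching
    substring of `text` (an assumption about how the matcher works)."""
    pattern, text = pattern.lower(), text.lower()
    prev = [0] * (len(text) + 1)  # a match may start anywhere in the text
    for i, pc in enumerate(pattern, start=1):
        cur = [i]
        for j, tc in enumerate(text, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (pc != tc)))
        prev = cur
    return min(prev)  # a match may end anywhere in the text


def suggestions_within(name, known, max_edits=3):
    """(candidate, distance) pairs within max_edits, closest first."""
    scored = [(k, edits_to_substring(name, k)) for k in known]
    kept = [pair for pair in scored if pair[1] <= max_edits]
    return sorted(kept, key=lambda pair: pair[1])


print(suggestions_within("bordr", ["border", "href"], max_edits=3))
print(suggestions_within("bordr", ["border", "href"], max_edits=5))
```

Raising `max_edits` lets more distant candidates through, which matches the trade-off described above: more matches, not necessarily better ones.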

Testing more will help us figure out if the parameters need tweaking.


Cheers,
-- 
olivier

Received on Tuesday, 17 February 2009 14:12:09 UTC