- From: Ville Skyttä <ville.skytta@iki.fi>
- Date: Fri, 4 Dec 2009 19:22:08 +0200
- To: www-validator@w3.org
- Cc: Olivier Thereaux <ot@artbeat.me>
On Thursday 03 December 2009, Olivier Thereaux wrote:
> Hi Ville, hi everyone,

Hi Olivier, nice to hear from you,

> One thing that surprised me however, was the line stating:
> [[ Removed feature: the "fuzzy matching" feature introduced in 0.8.5
> has been removed because it produced too many confusing and invalid
> suggestions. ]]
> This sounds like a case of throwing the baby with the bathwater. Is
> there any way we could work together to help fix/improve the feature?

I think it'll require quite a bit of work, and I can't personally promise to be available for that. If someone wants to spend time doing the initial fixes and be available for maintaining the feature, I have no problem with that. But I'm not at all convinced that this is feasible.

What needs to be done is that validator needs to have knowledge of the possible, _valid_ choices it suggests for each suspected misspelling. I don't think it can ever work well enough if it uses simple flat lists of a bunch of tag and attribute names that can be valid when they occur somewhere in a document, which is how the feature was implemented.

For example, consider <p HREF="foo">foo</p> in an XHTML document (easy to test with direct fragment input with XHTML 1.0 Strict). Validator suggests 'Did you mean "href"', which is just as bogus as the original. If it doesn't know exactly what attributes are valid for <p> in XHTML 1.0 Transitional but just looks up the closest match for HREF from its flat list, it will always continue to give bad suggestions.

Similarly, <foo/> alone, again using direct fragment input with XHTML 1.0, results in 'Did you mean "tfoot" or "form"'. The tfoot suggestion is bogus as it can't occur outside of a table, but as validator again uses a flat list of tag names that are valid somewhere without any context, it just suggests it. (The "form" suggestion here is a better one, but that's just lucky.)

Similarly, the error message for <objetc> in an HTML 3.2 document is 'element "OBJETC" undefined. Did you mean "object"?', but the object element cannot occur anywhere in an HTML 3.2 document.

I think the only way to fix this properly is to make validator know the valid possibilities at each position of a document where it is about to make the suggestions, and use only those. It doesn't necessarily need to know /all/ possible valid alternatives for every position, but the ones it ends up suggesting must be valid.

It should also take other things into account. For example, given <a href="..." ref="..."> it should not include "href" in its fix suggestions for "ref", because the href attribute was already specified. Ditto for <table><the/><thead/></table>: it shouldn't suggest "thead" for "the" because thead is already there (later), and there can be only one in a table.

And some kind of distance thresholds (where semantic similarity would preferably be taken into account, not just raw string distances) for suggestions should be applied as well, so that validator doesn't suggest something completely different from what was written in a document. Gems like this one, for example, simply cannot happen if you ask me (even if the suggestions were valid, which they obviously aren't in this case):

<html xmlns="http://www.w3.org/1999/xhtml"> in an HTML 3.2 document:
'Attribute "XMLNS" is not a valid attribute. Did you mean "onmouseup" or "onmouseover"?'

So we'd need these lists, one for each doctype for which this feature is supported, and quite probably some kind of code changes, at least for element name suggestions, so there's enough context to look up the valid alternatives from.

I have a feeling that I'm missing even some more things that would need to be done for the feature to work acceptably, but these are already enough for me personally to consider trying to do it not worth my time in the foreseeable future.

> Were there other bugs reported?

Yes, there have been more than a few posts on this list about it, and IIRC some reports in Bugzilla too.

> Even if the feature, as it is, may not be perfect, I strongly believe
> that removing it goes strikingly against the effort made in the past
> years to make the validator more usable by newcomers to HTML (more
> suggestion, more help, fewer harsh messages) and it would hurt to remove
> it without trying to improve it, or replace it.

I've tried (and managed to fix a few cases), but those are just scratches on the surface of a bigger pile of problems in my opinion.

My strong opinion is that if there's a possibility for the feature to give an incorrect suggestion, its net effect is worse than if the feature did not exist at all, especially for newcomers. Because of that, and because I seem to be the only one working on the validator these days, my call was to remove the feature, and I think it's the right one until patches that fix the fundamental issues prove me wrong :).

In any case, I also think that validator 0.8.6 should be released (soon) without this feature; it can always be brought back later if someone gets it to work.
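
To make the requirements above more concrete, here is a rough sketch of what a context-aware attribute suggestion could look like. It is an illustration only and not the validator's actual code: the `VALID_ATTRS` table, the `suggest_attribute` function, and the 0.75 cutoff are all hypothetical, the attribute sets are heavily abbreviated, and Python's standard `difflib` stands in for whatever string distance a real implementation would use.

```python
from difflib import get_close_matches

# Hypothetical, heavily abbreviated per-doctype tables: for each doctype,
# the attributes that are valid on a given element. Real tables would have
# to be derived from the DTDs.
VALID_ATTRS = {
    "XHTML 1.0 Strict": {
        "a": {"href", "rel", "rev", "name", "title", "id", "class"},
        "p": {"id", "class", "title", "style", "lang"},
    },
    "HTML 3.2": {
        "html": {"version"},
    },
}


def suggest_attribute(doctype, element, misspelled, already_present=(), cutoff=0.75):
    """Suggest only attributes that are valid for this element in this
    doctype, are not already specified, and are close enough in spelling."""
    allowed = VALID_ATTRS.get(doctype, {}).get(element, set())
    # An attribute that is already on the element cannot be the intended one.
    candidates = sorted(allowed - {name.lower() for name in already_present})
    # The cutoff acts as the distance threshold: weak matches are dropped
    # rather than offered as suggestions.
    return get_close_matches(misspelled.lower(), candidates, n=2, cutoff=cutoff)


# "href" is not valid on <p>, so nothing is suggested for HREF there.
print(suggest_attribute("XHTML 1.0 Strict", "p", "HREF"))
# A near miss on <a> yields the valid attribute.
print(suggest_attribute("XHTML 1.0 Strict", "a", "hre"))
# Nothing resembling "xmlns" is valid on <html> in HTML 3.2.
print(suggest_attribute("HTML 3.2", "html", "XMLNS"))
```

Element name suggestions would need more than this, since the set of valid candidates depends on the position in the document (parent element, siblings already seen), not just on a per-element table.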
Received on Friday, 4 December 2009 17:22:45 UTC