Re: Validation error frequencies

From: Sam Ruby <rubys@us.ibm.com>
Date: Sun, 03 Feb 2008 18:15:16 -0500
Message-ID: <47A64B04.40603@us.ibm.com>
To: Henri Sivonen <hsivonen@iki.fi>
CC: HTML Issue Tracking WG <public-html@w3.org>

Henri Sivonen wrote:
> On Feb 3, 2008, at 15:06, Sam Ruby wrote:
> 
>> At some point we have to wonder what we are trying to accomplish here. 
>> There are lots of gray lines where &lang=en in the query part of a URI 
>> should be non conforming, but a space in a path might not be.
> 
> When authors do very often something that the spec defines as an error, 
> I think we should examine whether it is useful to define it as an error 
> or whether we should make it a conforming cowpath.

Agreed.

> If an error is very common AND the way browsers react to the error is 
> interoperable AND the way browsers react to the error makes intuitive sense 
> to authors (i.e. the browser behavior is not crazy) AND the error has 
> harmless consequences, I think we should seriously consider making the 
> error into a non-error in order to cut noise from validation results to 
> allow authors to focus on more important matters. (I suspected spaces in 
> IRIs may be an example of this case but I'm not sure.)

The keyword here is 'harmless consequences'.  And actually, the real 
issue is false positives vs false negatives.

> If an error is very common AND the way browsers react to the error is 
> interoperable AND the way browsers react to the error makes intuitive sense 
> to authors (i.e. the browser behavior is not crazy) AND the error has 
> less harmful consequences than the obvious workaround, I think we should 
> seriously consider making the error into a non-error in order to avoid 
> the more harmful consequences. (I think target='_blank' is an example of 
> this case, and I'm pretty sure.)

A space in a URI has less harmful consequences than %20?

> So what I'm trying to accomplish is making HTML5 validation errors more 
> useful from the author point of view.

So, the question boils down to: which is worse, asking people to replace 
spaces with %20, or not detecting or reporting on real errors? 
Hopefully as I phrased this sentence, you can see my point of view.
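
For concreteness, the %20 replacement being asked of authors could be 
sketched roughly as below.  This is only an illustration in Python, not 
something taken from any validator; the helper names has_raw_space and 
escape_spaces are hypothetical, and the sketch deliberately touches 
nothing but literal spaces.

```python
def has_raw_space(uri):
    """Return True if the URI contains a literal, unescaped space."""
    return " " in uri


def escape_spaces(uri):
    """Replace each literal space with %20 and leave every other character alone."""
    return uri.replace(" ", "%20")


# Hypothetical example URI, chosen only for illustration.
link = "http://example.com/my file.html"
if has_raw_space(link):
    link = escape_spaces(link)
print(link)  # http://example.com/my%20file.html
```

Note that a check this narrow cannot tell an intentional space in a path 
from a stray character picked up during copy/paste, which is the 
distinction at issue here.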

Now realize that I can also see your point of view.  If the ratio of 
people using spaces in an interoperable fashion is an order of magnitude 
more than the people who are making real mistakes, the answer is 
probably the same.  But at two orders of magnitude, the answer becomes 
less clear, and at some point, the answer flips.

What your data is showing you and what my data is showing me are two 
fractions of the overall picture.  Alexa top 500 and a self-selecting 
group of authors are two slices.  My focus is on blogs, where a number 
of people evidently use JavaScript-enhanced text input fields, where 
they click on a little globe icon and often copy/paste a URI from other 
sources, sometimes inadvertently grabbing an extra character or two.

> I'd be particularly interested in your opinion on <img border='0'> 
> considering your previous opinions on /> and content models (and usage 
> on Planet Intertwingly).

There are two parts of my answer to that.

For my usage on Planet Intertwingly, I would gladly change that one 
line.  To be brutally honest, the origin of that line is from the 
"classic_fancy" planet theme that I hastily converted from htmltmpl to 
xslt, and never carefully reviewed.

The other part of my answer is that I personally believe all of the 
social engineering in HTML5 that is attempting to steer people away from 
well-defined and interoperable behavior and towards CSS, while 
well-intentioned, is seriously misguided and counter-productive.

For two reasons.  The first is the shared value that you and I have that 
people tend to view specs through the lens that is a validator.  Specs 
that are written in a way that causes validators to produce messages that 
aren't meaningful to authors tend to be ignored.

The second is more specific.  External style sheets don't syndicate. 
Style attributes are more difficult to sanitize than separate 
attributes, and so many consumers don't bother.
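
To illustrate the sanitizing point, here is a minimal Python sketch of 
the allow-list approach a feed consumer might take.  The ALLOWED_ATTRS 
set and the sanitize_attrs name are assumptions made up for this 
example, not any particular aggregator's code.  A separate attribute 
such as border can be admitted by name alone, while admitting style 
would require parsing and vetting the CSS inside it, so a simple 
consumer just drops it.

```python
# Assumed allow-list, for illustration only; real consumers differ.
ALLOWED_ATTRS = {"src", "alt", "border", "width", "height"}


def sanitize_attrs(attrs):
    """Keep only attributes whose names are on the allow-list.

    'style' is deliberately absent: accepting it safely would mean
    parsing and checking the CSS declarations it contains.
    """
    return {name: value for name, value in attrs.items()
            if name.lower() in ALLOWED_ATTRS}


img = {"src": "a.png", "border": "0", "style": "border: 0", "onclick": "x()"}
print(sanitize_attrs(img))  # {'src': 'a.png', 'border': '0'}
```

With a filter like this, border='0' survives syndication and the 
equivalent style attribute does not.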

- Sam Ruby
Received on Sunday, 3 February 2008 23:16:26 GMT
