Re: Rule breaking on the Web

On 2/21/08, Ian Hickson <ian@hixie.ch> wrote:
> On Thu, 21 Feb 2008, Mark Baker wrote:
>  >
>  > By not following specs, they're not playing by the same rules that the
>  > rest of the world has agreed to play by.
>
>
> Actually, by following the specs, and by pushing standards compliance,
>  _we_ are not playing by the rules that most of the world is playing by.

I think you greatly underestimate the amount of agreement that it has
taken to get us where we are today, Ian.  Most of the world agrees on,
oh, say, 90% of what we've got in the HTTP, URI, and HTML/CSS/DOM/JS
specifications ... with most of the remaining 10% coming from the
latter set of specs, of course.

> By and large, it's not a minority.  Obviously you need to find data for
>  each case, and I don't know what the relevant stats are for this group and
>  this discussion, but e.g. HTML is syntactically invalid 70% to 95% of the
>  time depending on how strict you are about what is an error. That's a
>  majority of pages that are syntactically invalid even when one tries to be
>  as loose with the spec requirements as possible (e.g. ignoring missing
>  DOCTYPEs), and it doesn't even look at things like MIME types,
>  semantically-correct use, attributes values, actually following the
>  element content model rules, etc. (Source: an unpublished study of several
>  billion pages during the summer of last year. Other studies by independent
>  researchers on smaller samples have found similar results.)

As you know, I totally agree about HTML, and I really appreciate the
work you and the other WHATers have done with the new parsing
algorithm.

>
>
>  > You educate the minority so that they understand the problems they've
>  > created for themselves, and appreciate the value in fixing their
>  > mistakes.
>
>
> Education hasn't worked so far; why do we think it should work in the
>  future?

I don't think either of us has any numbers to back our positions, but
I've personally notified several server admins of misconfigurations,
and in all but one case they were happy to fix the problem.

>  > Otherwise, over the long term, entropy would win and eventually kill
>  > interoperability, or at least greatly increase the barrier to entry for
>  > new players.
>
>
> That's why we have to define the rules for implementations even in the
>  face of broken markup -- it allows interoperability to continue regardless
>  of author-side conformance, and it reduces the barrier to entry for new
>  players by dramatically reducing the amount of reverse engineering
>  required.

I agree for HTML, but note that you needn't "define the rules for
implementations", you need only specify the meaning of an HTML
document.  That's why I'm such a big fan of the parser: it's
all about the latter, not the former.

Mark.
-- 
Mark Baker.  Ottawa, Ontario, CANADA.         http://www.markbaker.ca
Coactus; Web-inspired integration strategies  http://www.coactus.com

Received on Monday, 25 February 2008 15:16:27 UTC