Re: [whatwg] Wasn't there going to be a strict spec? from Erik Reppen on 2012-08-10 (public-whatwg-archive@w3.org from August 2012)

From: Erik Reppen <erik.reppen@gmail.com>
Date: Fri, 10 Aug 2012 17:29:48 -0500
To: "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: whatwg@lists.whatwg.org
Message-ID: <CALtP25_jUrpCfqjCFw6NU-CW+x4nGOGaNvvZOWskjjaaJpiYyA@mail.gmail.com>
This confuses me. Why does it matter that other documents wouldn't work if
you changed the parsing rules they were defined with to stricter versions?
As far as backwards compatibility goes, if a strictly defined set of HTML
would also work in a less strict context, what could it possibly matter?
It's only the author's problem to maintain (or to switch to a more forgiving
mode), and backwards compatibility isn't broken if the same client 500 years
from now uses the same general HTML mode for both.

I think there's a legit need for a version or some kind of mode for HTML5
that assumes you're a pro and breaks visibly or throws an error when you've
done something wrong. Back in the day, nobody ever forced authors who didn't
know what they were doing to use doctypes they were too sloppy to handle. I
wasn't aware of any plan to discontinue non-XHTML doctypes. How everybody
started thinking of it as a battle for one doctype to rule them all makes
no sense to me, but I'm fine with one doctype. I just want something that
works in regular HTML5 but breaks in some kind of strict mode when
XML formatting rules aren't adhered to. You pick the degree of strictness
that works for you. I don't really see a dealbreaking issue here.
Why can't we all have it the way we want it?
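
For what it's worth, a rough version of the opt-in check I mean can be run
outside the browser today. Below is a minimal sketch in Python, assuming the
markup is meant to be XML-style (every tag closed and properly nested,
attributes quoted, a single root element). `check_well_formed` is a
hypothetical helper name, not part of any spec or browser API; it leans on
the standard library's `xml.etree.ElementTree`, which refuses malformed input
instead of recovering. It checks well-formedness only, not validity against
any HTML or XHTML schema, and named HTML entities like `&nbsp;` get rejected
because plain XML doesn't define them.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(markup: str) -> Optional[str]:
    """Return None if `markup` is well-formed XML, else the parser's message.

    Hypothetical helper for illustration only: it checks XML
    well-formedness (matching and properly nested tags, quoted attributes,
    one root element). It does not validate against an HTML/XHTML schema.
    """
    try:
        ET.fromstring(markup)  # strict: raises on the first error
    except ET.ParseError as err:
        return str(err)  # message includes line and column
    return None


if __name__ == "__main__":
    good = "<div><p>Hello <b>world</b></p></div>"
    bad = "<div><p>Hello <b>world</p></b></div>"  # <b> and <p> misnested
    print(check_well_formed(good))  # None
    print(check_well_formed(bad))   # an error message instead of a repair
```

Wired into a build step or test suite, a non-None result is the "break
visibly" behavior I'm after, while the page itself stays ordinary HTML5.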

As somebody who deals with some pretty complex UI where the HTML and CSS
are concerned, it's a problem when the rendering gives no indication of
breakage while the DOM is in fact getting tripped up. Sure, I can validate
and swap out doctypes, or just keep running stuff in IE8 to see if it
breaks until I actually start using HTML5-only tags, but this is kind of
awkward, and it suggests something forward-thinking design could address,
don't you think?
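
To make the "looks fine, structure is wrong" point concrete, here's a small
Python sketch that feeds the same misnested list to a lenient parser and a
strict one. The stand-ins are assumptions on my part: `html.parser.HTMLParser`
is only a forgiving tokenizer, not the HTML5 tree-construction algorithm
browsers run (which would also quietly restructure the nesting rather than
report it), and `xml.etree.ElementTree` plays the strict role. `TagLogger`
and `compare` are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TagLogger(HTMLParser):
    """Records start/end tags in source order; never complains about nesting."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")


def compare(markup: str) -> Tuple[List[str], Optional[str]]:
    """Run a lenient tokenizer and a strict XML parser over the same markup."""
    lenient = TagLogger()
    lenient.feed(markup)
    lenient.close()
    try:
        ET.fromstring(markup)
        strict_error = None
    except ET.ParseError as err:
        strict_error = str(err)
    return lenient.events, strict_error


# <b> opens in the first <li> but closes in the second one.
events, error = compare("<ul><li><b>one</li><li>two</b></li></ul>")
print(events)  # lenient: every tag accepted, no error raised
print(error)   # strict: reports a mismatched tag
```

The lenient side happily reports every tag, which is roughly what it feels
like in the browser: nothing complains, and you only find out when the DOM
isn't shaped the way your JS expects.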

On Fri, Aug 10, 2012 at 3:05 PM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:

> On Fri, Aug 10, 2012 at 12:45 PM, Erik Reppen <erik.reppen@gmail.com>
> wrote:
> > My understanding of the general philosophy of HTML5 on the matter of
> > malformed HTML is that it's better to define specific rules concerning
> > breakage rather than overly strict rules about how to do it right in the
> > first place, but this is really starting to create pain points in
> > development.
> >
> > Modern browsers are so good at hiding breakage in rendering now that I
> > sometimes run into things that are just nuking the DOM-node structure on
> > the JS side of things while everything looks hunky-dory in rendering and
> > no errors are being thrown.
> >
> > It's like the HTML equivalent of wrapping every function in an empty
> > try/catch statement. For the last year or so I've been using IE8 as my
> > HTML canary when I run into weird problems, and I'm not the only dev I've
> > heard of doing this. But what happens when we're no longer supporting IE8
> > and are using tags that it doesn't recognize?
> >
> > Why can't we set stricter rules that cause rendering to cease, or at
> > least a non-interpreter-halting error to be thrown by browsers, when the
> > HTML is broken from a nesting/XML-strict-tag-closing perspective, if we
> > want? Until most of the vendors started lumping XHTML 1.0 Strict into a
> > general "standards" mode that basically worked the same for any declared
> > doctype, I thought it was an excellent feature from a development
> > perspective to just let bad XML syntax break the page.
> >
> > And if we were able to set such rules, wouldn't it be less work to parse?
> > How difficult would it be to add some sort of opt-in strict mode for HTML5
> > that didn't require juggling of doctypes (since that seems to be what the
> > vendors want)?
>
> The parsing rules of HTML aren't set to accommodate old browsers,
> they're set to accommodate old content (which was written for those
> old browsers).  There is an *enormous* corpus of content on the web
> which is officially "invalid" according to various strict definitions,
> and would thus not be displayable in your browser.
>
> As well, experience shows that this isn't an accident, or just due to
> "bad authors".  If you analyze XML sent as text/html on the web,
> something like 95% of it is invalid XML, for lots of different
> reasons.  Even when authors *know* they're using something that's
> supposed to be strict, they screw it up.  Luckily, we ignore the fact
> that it's XML and use good parsing rules to usually extract what the
> author meant.
>
> There are several efforts ongoing to extend this kind of non-strict
> parsing to XML itself, such as the XML-ER (error recovery) Community
> Group in the W3C.  XML failed on the web in part because of its
> strictness - it's very non-trivial to ensure that your page is always
> valid when you're lumping in arbitrary user content as well.
>
> Simplifying the parser to be stricter would not have any significant
> impact on performance.  The vast majority of pages usually pass down
> the fast common path anyway, and most of the "fixes" are very simple
> and fast to apply as well.  Additionally, doing something naive like
> saying "just use strict XML parsing" is actually *worse* - XML all by
> itself is relatively simple, but the addition of namespaces actually
> makes it *slower* to parse than HTML.
>
> ~TJ
>
Received on Friday, 10 August 2012 22:30:17 UTC