Versioning and html[5]

I promised I would write up the picture of how we view compatibility at Microsoft.  I would like for you all - particularly those of you who have been in violent disagreement with me on the versioning thread - to read this through beginning to end with an open mind, remembering it's coming from the one guy with (literally) the longest history on the team(s) that have produced and maintained the most widely-used web browser in existence, not just "that evil Microsoft guy". I understand that Microsoft's viewpoint (in that market leader role) is not the only one.  That doesn't, though, mean that it's an invalid one.  I believe the versioning discussion is missing a critical point of browser adoption and evolution of products in place, and ultimately is headed in a direction that will eventually doom this branch of HTML (or at the least, cause yet another unnecessary disruptive convergence at some point in the future).  The crux of this is the experience we (Microsoft) have gained about compatibility and deployment in the real world.

Microsoft has always firmly backed compatibility - in its formats (e.g. Office formats), operating systems (Win32 APIs), and browsers.  On the IE team, we have had the same motto for every release I can remember back through IE4: Don't Break the Web.  (Obviously, I support this proposed tenet for the HTML WG, but it means something different to us.)  We have consistently analyzed the impact of any change we might make, and carefully protected users by not damaging behavior for current content.

In some ways, this has been one of the hardest things for me to deal with as a developer and an innate perfectionist, because it has meant at times that we could not fix behavior that didn't follow the standard, because it would cause layout problems with current content.  In IE6, we got around this inability to improve by adding the "quirks mode" switch - at the time, this was a great way to segment content that intended to follow standards.  Back then, "quirks mode" defined the current web - all of the top 200 US sites were in "quirks mode".

In IE7 we used the same switch - that is, we only made most of our changes and additions to correctly implement the standards under "standards mode", and "quirks mode" stayed largely the same.  However, we found we broke a TREMENDOUS number of web pages - many of whose authors, when I personally contacted them (and yes, I did personally contact a lot of them, since I have just that kind of boss) didn't even understand 1) that they'd opted in to standards mode, and 2) that IE was wrong and broken, and not their code.  They blamed IE for changing, regardless of whether it was for the better or not.  In fact, most of these conversations started with my boss seeing a blog post that said "IE ****ed me again."  They were usually conditionally serving that version of the content to IE; if they'd just given us the content they served Firefox, we'd probably have done a better job rendering it.

About a month ago, I tested the top 200 US web sites again.  Nearly HALF of them (49%) were serving their content in "standards mode".  Håkon said of this experience, "If that's what they want, the browsers should deliver."  Unfortunately, that's NOT what they wanted or expected, or IE7 would have worked properly with their pages.  No document author intentionally says "yes, please hork my pages."  The authors didn't really know what they were doing, or (more likely) they knew what they were doing, but they were working around IE's bugs expecting "current web behavior" from IE7.  Or, even more likely given my experience examining pages, they were using some library that told them to do that, and the author didn't know what .

Håkon then said, in response to the same thread: "Standards are like vaccination -- it hurts a bit, but it's better in the end."  That would seem to imply an expectation we would break the web as it is rendered today in IE.  In fact, L. David Baron came right out and said: "We should be willing (when provided good reasons to do so) to define things that would require even the market leader to break compatibility."

Those statements belie the ideal that Anne stated when he said "yes" to Henrik's question: "Let's say someone wants to create a 100% compliant browser completely from scratch, guided *only* by HTML5 and other standards external to HTML and with no prior knowledge of the functionality of current browsers.  Should this browser then be compatible with 'today's content'?" - if you require the market leader to break compatibility, then you simply are NOT defining the standard of "today's content".  You might be defining a much better standard, of course, but let's be honest.  Today's content uses "CSS scrollbar colors".  (Yes, I know that's not an HTML example - I figured I would avoid the Pandora's box of the <object> tag and ActiveX for a bit longer.)

As that market leader, I need to say that this answer - this paradox of "be compatible, but break the market leader's current behavior" - is simply not tenable for us.  We (Microsoft) can't break the web that we support today, even if it is for a great cause like standards compliance and interoperability.  We can't afford, as David said, to "compromis[e] on the details where existing implementations disagree" - not only would we be irresponsible to our dedicated users and developers if we did, I imagine that someone out there would think that we should be held liable for damages for such breakage as well.  That's the world we (Microsoft) live in.  We must remain 100% backward compatible, or we offend both our users and web developers in general.

So, that sucks.  I don't want every other browser (and every browser to come) to have to implement IE's bugs.  (If you want to, it's fine with me; I just don't think it should be a requirement.  Lest you all forget, we did the first round of having to analyze and figure out how to be compatible, back in the Netscape Navigator days - for every hour Ian's spent puzzling out IE's behavior, I probably spent at least one puzzling out Netscape Navigator 2 and 3 behavior - and if you think OUR behavior is wacky...)  I also don't want to bake IE's poor behavior in some cases into the standard - that would mean EVERYONE should have the hasLayout() side effects, choose not to quote <q>, etc.  (Everyone has bugs, after all, and as Anne said, "You don't just finish support for a specification. Specifications are incrementally implemented, not as a whole thing.")  In short, I'm emphatically NOT suggesting what David was talking about when he said "...then we just become the committee to propose new features to IE's developers and then re-document them according to the bugs in IE's implementation," but I AM suggesting that we (Microsoft) can't Break the Web either.  I'm frankly somewhat ambivalent about how much IE-ism we take into the standard, because it won't be exactly IE7's behavior, and therefore can't replace our current behavior when faced with text/html - because we can't Break the Web.

I believe we (the WG) need to build a web we all believe in, starting from where we are, make sure it's downlevel-compatible with the current web (HERE's where the XHTML 2.0 effort screwed up, both in the details and in the mime type issue), and watch the current proprietary things rot away in usage.  However, you need to understand that you can't ask the market leader (and given some of my private conversations, even other UAs) to change the details of what they're shipping today, without causing immense and unnecessary pain to their users and developers.  We must be realistic about the phrase "in the best interest of the web" - it includes not breaking my mom's banking site in IE, even if they are wrong.  I'm happy to evangelize standards (and I do), but not by punishing customers and users.

In fact, even adding features can cause Broken Web problems.  Ask any IE6-era CSS guru, and they will tell you one of their favorite CSS hacks to identify IE was the "child selector hack", also known as the "html>body hack".  This hack relied on the fact that IE did not implement the CSS2 "child selector" feature.  Just adding this feature (according to spec!) in IE7 broke pages.  I expect adding <q> support would probably break some expected content, etc.  That's part of why I think we need a version number - so we can know what we're expected to support.  Yes, that's partly UA versioning - I don't think that's completely disassociated.
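
To make that mechanism concrete, here is a minimal sketch in Python of a toy cascade; it is not real browser code. The rule list, the property values and the function name computed_width are all made up for illustration. The only thing taken from the paragraph above is that a UA without child selector support drops the html>body rule, while a UA with support applies it.

```python
# Toy model of the "html>body" hack. Hypothetical values, not real UA code.
RULES = [
    ("body", "width", "500px"),       # value every UA sees
    ("html>body", "width", "480px"),  # override the author meant for non-IE UAs
]


def computed_width(supports_child_selector: bool) -> str:
    """Return the width that wins for <body> under this toy cascade."""
    width = ""
    for selector, prop, value in RULES:
        if ">" in selector and not supports_child_selector:
            continue  # the UA cannot parse the selector, so the rule is dropped
        if prop == "width":
            width = value  # a later matching rule overrides an earlier one
    return width


old_ua = computed_width(supports_child_selector=False)  # "500px"
new_ua = computed_width(supports_child_selector=True)   # "480px"
```

The same stylesheet produces a different result purely because the UA gained a feature, which is a change the author who relied on the hack never asked for.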

In short, I can guarantee you that Internet Explorer will not willingly choose to break compatibility with significant amounts of web content (and with ~80% of the browser share, half a billion users, "significant breakage" is an extremely small percent).  All we do when we do that (and IE7 experience bears me out here) is tick off web developers and users.  We can't afford to break what is currently working, so we will have to provide opt-ins for web authors to say "hey, I know what I'm talking about as of now, please give me standards compliant behavior," because we know from experience that all but a very small number of authors expect that in their content.

The reality of it is, when a major browser is released, that is a point singularity on the web; it has the ability to, but cannot be allowed to, cause widespread disruption.  (Actually, I should say - a new release of any browser/UA on the web is a point singularity for all its users and the developers of content they rely on.)  IE7 did cause widespread disruption, as a case in point.  I championed making those widespread changes to improve our standards compliance.  In all seriousness, I've managed to hang on to my job, but sometimes I think only just.  I cannot go to my team and say "hey, we're gonna break the web again (and again and again), but it's okay because it's for a good cause."  The world doesn't work that way.   I wouldn't be responsibly doing my job - that one where half a billion web users rely on my team to not hose compatibility with their banking web site, even if their bank doesn't know how to properly use CSS 'float'.

There are some here who are suspecting that I'm suggesting that other browsers should implement the standards, and IE will remain the only browser that implements the "real web" - that is, the web laden with errors due to IE bugs.  For example, Ian said, in response to me saying that we could not change how IE7 renders floating content even though we know it's wrong because it would break the web: "How do the browsers that handle floats more correctly cope with this? Do they misrender half the Web?"

In short, no, because I don't think half the web misuses floating - but let's say 1% of the web does misuse float - that's a significant blocking problem for us, as that is hundreds of thousands of web sites that we've broken.  But it's not for others, for the most part, because typically Web developers today write content to standards, then patch it to work in IE under an IE switch - so if we fix our bugs, the patches no longer work and now damage the site, because authors are EXPECTING us to be broken.  But, of course, other UAs are just fine, either way.  So what's a market leader to do?  You're suggesting that the HTML WG gets to pick and choose what the market leader breaks compatibility with - in fact, you're trying to give them (us) no choice.  I note, for example, that the WHATWG HTML5 removes a few key attributes from the <object> element that were in HTML 4.01 - namely, classid and codebase - that are heavily used by ActiveX in IE.  (And y'all seem to get quite upset when we add proprietary features and attributes.)

Unfortunately, the problems with this sort of change are rampant, and frequently (unlike the ActiveX example) they are quite small.  Boris said: "...I'm not suggesting you [break sites] gratuitously.  ...I frankly wouldn't expect significant numbers of pages to depend on any particular UA's behavior [in corner cases]."  You're incorrect in your expectation.  Developers (and even more importantly, users, who mostly don't think to care about standards anyway) care about consistency.  Whitespace handling is different in IE today?  Bummer.  Sorry.  We broke pages when we tried to change it in days of yore.  You can still run visicalc.exe on Windows Vista.  That says something about compatibility.

I believe our (IE's) only tenable answer (to satisfy the goals of 1) don't break the web, and 2) improve our standards compliance) is that we must require that document authors opt in to standards compliance.  In the short run, this will mean they need to put some switch in their documents (probably a comment-based one, somewhat similar to IE conditional comments but a document-wide marker in the prolog).  I don't like this, because it requires authors to opt in to standards-compliant behavior, and the default behavior is our current (broken) behavior - but that is exactly what we must keep shipping to not Break The Web.  If we do this, we can remove proprietary features that are replaced by a standard, for example.

But wait, you say, Chris that sucks!  That means the incorrect behavior is the default!  Yeah, I know.  Like I said, I don't like it.  Show me another way to not Break the Web.  But wait, there's another way too - the BEST thing for us, then, (and now I'm talking about Microsoft AND the WG) is to have new DOCTYPE identifiers occur every so often (where "every so often" trends to infinity, as implementations get more and more compliant), because then we can automatically opt that content into whatever our current "most compliant" behavior is.  Despite Anne thinking: "standards mode and quirks mode is an unfortunate thing from the past," it is actually quite a great invention, in my opinion; it allowed us to automatically use more standard-compliant behavior without Breaking the Web.  Unfortunately, "standards mode" is too widely adopted now, and we break too much compatibility if we change our UA behavior there, so its time has come.
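
As a rough illustration of the idea of opting content in by DOCTYPE identifier, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: the mode names, the table of opt-in identifiers, and the function name pick_mode are hypothetical, and real DOCTYPE switching in any shipping UA has many more cases than this.

```python
import re

# Hypothetical table: identifiers that opt a page in to newer behavior.
# The mode names are invented for this sketch.
OPT_IN_IDENTIFIERS = {"html5": "standards-2007"}

# Matches a leading DOCTYPE and captures the first token after it.
DOCTYPE_RE = re.compile(r"\s*<!DOCTYPE\s+([^\s>]+)", re.IGNORECASE)


def pick_mode(prolog: str) -> str:
    """Choose a rendering mode from the start of a document."""
    match = DOCTYPE_RE.match(prolog)
    if match is None:
        return "quirks"  # no DOCTYPE at all: legacy behavior
    name = match.group(1).lower()
    # A known new identifier opts in; every other DOCTYPE keeps the
    # behavior that existing content was written against.
    return OPT_IN_IDENTIFIERS.get(name, "frozen-standards")


legacy = pick_mode('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">')
opted_in = pick_mode("<!DOCTYPE html5>")
```

The design point is that the default branch never changes: content that shipped against today's behavior keeps getting it, and only a new identifier maps to newer behavior.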

I think some of you (WG members) are thinking I'm saying we (the WG) can use that as a free license to break backward compatibility with current systems, as if we were the XHTML 2 group.  That is NOT true.  I don't believe in that path of development for HTML.  I just don't think that we can change what "current systems" mean - and if you believe we can, then I would hold up that IE is still the browser for oh, let's call it three quarters of web users worldwide, and was up around 95% a few short years ago - meaning a lot of current web content was authored to IE's behavior - so I believe in the interest of your own claimed tenet (representing the current web), you would have to say that IE defined the standard.  Y'all better get working on that ActiveX support.  (And no, I'm not seriously claiming that IE should or must define the HTML5 standard.  But I do believe IE, far more than anything else, defines the de facto "current web content" specification standard, for better or worse.)

This DOES NOT (in my opinion) give us (the WG) the license to break compatibility with current standards like HTML 4.01 - like XHTML 2 did, or even like XHTML 1.1 - unless that standard does not represent current practice in browsers, in which case we have a decision to make.  In fact, Maciej (in a private mail exchange with me) said: "The problem with versioning is that it encourages breaking of existing content in new specs, so you either have to ignore the spec or build a mode switch."  I understand why you might say that - the temptation might be there - but we'll have to hold the line against that "encouragement".  I don't want to break backward compatibility in the spec; that is separate, to me, from backward compatibility in implementation.

The Microsoft IE team is committed to implementing standards compliance.  At the same time, we can't change current behavior for current content.  That paradox requires us to require authors to opt in - which they can do through IE-specific goo (which, as I said, I think is sub-optimal), or by telling us what era of the implementation they wrote their content for.  Any implementation route we (IE) choose cannot hose the current web, and that includes matching our current behavior with floating support bugs, <object> implementation, no quotes on <q> and other transgressions.

As for the details of version, I have to object strongly to the idea that the DOCTYPE for HTML5 should just be:

<!DOCTYPE html>

Because I think we would eventually realize we'd broken something, and then we'd re-introduce version numbers, and the progress of HTML would look like this:

HTML 3.2:           <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

HTML 4:               <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">

HTML 5:               <!DOCTYPE html>

HTML 6:               <!DOCTYPE html6>

HTML 7:               <!DOCTYPE html7>...

In short, I'm not that positive that HTML 5 will be the time we get it right for all time.  I would suggest that we use

<!DOCTYPE html5>

And have as our current plan that we will issue point releases of html5 from here on out.  (E.g. the next version - if there ever needed to be one - would be HTML 5.1 or some such rather than HTML6, and we should continue to use the same html5 DOCTYPE.)  If we get it right in HTML5 and never need another version number, you can send me a bill for all the bytes I've caused to be wasted in a few decades.*


*I'm not saying I'll actually pay it.  But you can still send it to me.

Received on Thursday, 12 April 2007 16:40:29 UTC