A Concrete Example for the HTML Versioning Debate

This is the third time I've attempted to get into this thread and, as
usual, I got caught up in lengthy exposition.  My apologies up front.
I'd like to bring up a concrete example to investigate the
"HTML Versioning" debate.  Consider the following two HTML documents:

inner.html:
<html><body>
  <a href="http://www.google.com/" target="_parent">Click Here</a>
</body></html>

outer.html:
<html><body>
  <p>Paragraph 1</p>
  <object type="text/html" data="inner.html" width="250" height="250"></object>
  <p>Paragraph 2</p>
</body></html>

Browsers today implement this differently, largely because the HTML4
spec is unclear about what a "frame" means [1] [2].  In the above,
clicking the link causes Google to load in the full browser window in
Mozilla and Opera, but IE loads Google only in the 250x250 object
region.

To clear up the target ambiguity, the WHATWG HTML5 spec wisely
introduces the term "browsing context" [3].  It states that the target
attribute specifies a valid browsing context [4] and that an <object>
creates a new browsing context [5].  This is great because the
behavior is finally stated concisely and (to me) is a less confusing
approach than introducing a "_replace" value like WebCGM and SVG
did [6].
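
As a rough illustration of why the "browsing context" wording settles
the question, here is a minimal Python sketch of how a target keyword
could be resolved against a tree of contexts.  The class and function
names are invented for this example and are not taken from the spec,
and modeling the IE7- result as "the object region has no parent
context" is only an assumed way to reproduce the behavior described
above, not a statement about how IE is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class BrowsingContext:
    """Hypothetical stand-in for a browsing context (not a spec type)."""
    name: str
    parent: Optional["BrowsingContext"] = None


def resolve_target(current: BrowsingContext, target: str) -> BrowsingContext:
    """Return the context that a link with this target keyword navigates."""
    if target in ("", "_self"):
        return current
    if target == "_parent":
        # With no parent context, _parent falls back to the current one.
        return current.parent if current.parent is not None else current
    if target == "_top":
        ctx = current
        while ctx.parent is not None:
            ctx = ctx.parent
        return ctx
    raise ValueError(f"named targets are not modeled here: {target!r}")


# outer.html lives in the top-level window.
window = BrowsingContext("window showing outer.html")

# Reading 1: <object> creates a nested browsing context, as in [5].
nested = BrowsingContext("250x250 object region", parent=window)

# Reading 2 (assumption): the object region is not linked to any parent.
detached = BrowsingContext("250x250 object region")

print(resolve_target(nested, "_parent").name)    # window showing outer.html
print(resolve_target(detached, "_parent").name)  # 250x250 object region
```

Under the first reading the link in inner.html replaces the whole
window; under the second it stays inside the object region.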

If the relevant behavior outlined in the WHATWG HTML5 spec becomes
part of the W3C HTML5 spec, it is clear that Microsoft should update
their behavior to match Opera and Mozilla.

Now I know that my example is probably pretty contrived, but at least
in this instance, it seems there is no way, as Ian says in [7], to
"absolutely ensure that HTML5 is compatible with all today's content"
because different browsers did different things and "today's content"
may have relied on one browser's behavior or another's.  Ian,
do you agree?  Or do we not care about web pages that relied on one
browser's specific behavior?

So ... if I'm an author of a web page that depends on the deployed
behavior of IE7- today and IE8+ changes its behavior to match HTML5 by
default (no opt-in), then my web page is broken.  Even if the IE7-
behavior was wrong (a bug), HTML5 still broke my web, right?

If I understand Chris Wilson correctly, without a means for the author
to opt-in to the new behavior, Microsoft would not fix their behavior
here any time soon.  (I do understand that this specific example might
be one of those corner cases where less than 0.5% of the pages are
going to snap, but I hope I am at least illustrating that such cases
COULD happen).

One possible solution discussed was to keep the opt-in at the browser
level (i.e. use IE conditional comments for IE bugs/quirks).  I
disagree with this, primarily because it was the specification that
was updated, and that update caused the browser change.  Furthermore,
I don't think other browsers have opt-in mechanisms as reliable as
IE's conditional comments (authors fall back on UA strings, which can
be spoofed, or on checking for the existence of certain DOM
attributes, etc).

If we're going to have a specific HTML5 and then HTML6 spec, why is
there such a big push to make HTML5+ "versionless" from a user agent
point of view?  Is it really just because of the non-archaic DOCTYPE?
That seems kind of silly - you'll never truly eliminate "cargo cult
boilerplate" because of the varying levels of understanding out there
amongst the wide spectrum of web developers.  People will still copy &
paste "<!DOCTYPE html>" from one web page to another... people are
lazy.

Also, I really don't think that in 50 years' time we will be able to
say that all the choices we make today in HTML5 were the right ones.
When we discover a particularly troublesome browser inconsistency
fifteen years from now, will we really be retroactively clarifying the
behavior in the HTML5 spec?  Will this really still be HTML 5.0 and
not 5.01?  And then what?  Some browsers will have to fix their
implementations and web authors will have to fix their web pages to
avoid breaking.  I know your goal is to specify everything clearly,
but I'm sure that was the goal with HTML4 too (or at least, they did
not intentionally introduce ambiguities).

I do understand the WHATWG's noble goal here (to preserve the
"history" of the web for future generations) but I just don't think it
is possible to preserve things perfectly forever.

These, I think, are the three points I would like everyone to consider:

  A) NON-TRIVIAL SOFTWARE IS NEVER PERFECT

This means that, as a WEB AUTHOR, I will always have to work around
browser-specific bugs.  Those bugs matter for a window of time and
eventually fade away as browsers improve (and introduce new bugs!).
If I don't update my web page at some point, I risk my page no longer
rendering perfectly - regardless of any specification.  This is a risk
that the web author should understand because that is the way user
agents are deployed (multiple platforms, multiple implementations,
multiple concurrent versions).

  B) NON-TRIVIAL SPECIFICATIONS ARE NEVER PERFECT

This means that, in some instances, as a BROWSER DEVELOPER, I have to
make implementation decisions based on my own reading of the spec.
This risks leading to browser inconsistencies.  When you mix
technologies (HTML + DOM + CSS + JavaScript), the effect is greatly
magnified.

  C) PEOPLE WILL FIND WAYS OF USING A TECHNOLOGY THAT YOU DIDN'T THINK OF

HTML was never intended as a platform for web applications.
JavaScript was never intended to do any heavy lifting.  The fact that
WHATWG's spec is not specifically called "HTML5" but "Web Applications
1.0" is a direct recognition that there is a need to evolve the HTML
spec into something more.  You don't know how future developer
experiments will affect the landscape of the web.

Does anyone disagree with A), B) or C)?

Anyway, what am I getting at?  Well, to be honest, when I started this
I didn't have any solid opinion and I'm not sure if this exposition
has given me one.  Let's see.

- I believe that user agents and web authors need to be able to
specify whether content is HTML4 or HTML5+.  I don't much care whether
this is done by DOCTYPE (I think that's reasonable, since a 4.01 one
exists) or <html version="5.01">.  But I do think that "5"
should be in there somewhere, because going from 4 to nothing to 6
will look silly and will probably take up a large portion of the
"History" chapter in a future "HTML SuperBible".  Yes, I know - these
are serious reasons ;)

- I believe that web authors will always be required to code some
things to the "current" set of popular browsers and that this window
is constantly moving.  I don't believe in a utopia of pure perfect
standards-based markup, script and stylesheets that will be rendered
perfectly forever.

- As a result, I don't believe that future archaeologists 100 years
from now will be able to perfectly render more than 50% of the web
pages they find online from 1997-2007.  However, note that I said
"perfectly".  I do believe that they will get it "close enough" to be
able to extract almost any piece of relevant information they require
from what web pages are out there.  After all, all these technologies
are plain text so looking at the source is always an option to get at
the content.  Furthermore, platforms are usually recycled in the form
of hobbyist emulators (think C64 emulators, DOS emulators, MAME) so I
bet that ancient user agents will have a surprising way of "staying
alive" much much longer than one might expect.  See [8].

- In fact, I also believe that fewer than 20% of the web pages online
right now will even be on the network 100 years from now.  How many
web pages have been completely lost because people just didn't care
enough to move their 1998 "Beavis & Butthead Audio Clips" web page
from one ISP to another?  "Freezing" a specification will not at all
prevent this type of tragic loss.  Personally, I believe that what's
important is that enough stuff survives from the 1990s to reflect what
the web/society was like at that time - but not all of it need survive
(maybe this sounds elitist, but the portion that survives is really
author-selected; I have nothing to do with this).
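
To make the first point in the list above a little more concrete, here
is a rough Python sketch of how a tool might read a declared HTML
version from either signal.  The function name, the regular
expressions and the "unversioned" label are all illustrative
assumptions.  Real user agents parse markup properly rather than
pattern-matching on source text, and the version attribute on <html>
is only the hypothetical syntax floated above.

```python
import re
from typing import Optional

# Matches a DOCTYPE and captures whatever follows the root element name.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\b([^>]*)>", re.IGNORECASE)

# Matches the hypothetical <html version="..."> attribute.
VERSION_ATTR_RE = re.compile(
    r"""<html\b[^>]*\bversion\s*=\s*["']([^"']+)["']""", re.IGNORECASE
)


def declared_version(source: str) -> Optional[str]:
    """Return the HTML version a document declares, if it declares one."""
    attr = VERSION_ATTR_RE.search(source)
    if attr:
        return attr.group(1)

    doctype = DOCTYPE_RE.search(source)
    if doctype is None:
        return None  # no signal at all, like inner.html above

    rest = doctype.group(1).strip()
    if rest == "":
        return "unversioned"  # the bare <!DOCTYPE html> form

    # Public identifiers such as "-//W3C//DTD HTML 4.01//EN" carry a number.
    match = re.search(r"DTD HTML\s+(\d+(?:\.\d+)?)", rest)
    return match.group(1) if match else None


html4 = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
         '"http://www.w3.org/TR/html4/strict.dtd"><html></html>')
print(declared_version(html4))                      # 4.01
print(declared_version("<!DOCTYPE html><html>"))    # unversioned
print(declared_version('<html version="5.01">'))    # 5.01
```

Note that the bare DOCTYPE gives such a tool nothing to distinguish
HTML5 from whatever comes after it, which is the concern raised in
that first point.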

Regards,
Jeff Schiller

[1] http://www.w3.org/TR/REC-html40/present/frames.html#adef-target
[2] http://www.w3.org/TR/REC-html40/struct/objects.html#h-13.5
[3] http://www.whatwg.org/specs/web-apps/current-work/#browsing0
[4] http://www.whatwg.org/specs/web-apps/current-work/#target3
[5] http://www.whatwg.org/specs/web-apps/current-work/#the-object
[6] http://lists.w3.org/Archives/Public/www-svg/2007Apr/0002.html
[7] http://lists.w3.org/Archives/Public/public-html/2007Apr/0319.html
[8] http://www.dejavu.org/emulator.htm

Received on Wednesday, 18 April 2007 04:22:01 UTC