Hoisting <base> into <head>

HTML5 says that only the first <base href> that is a child of <head> is taken into account for overriding the base URL, and only the first <base target> that is a child of <head> is taken into account for the default browsing context target. HTML5 also doesn't make the parser hoist <base> into <head>.

Both of these behaviors have been implemented in Gecko in the Firefox 4.0 product cycle. The combination of these two features is causing Web compatibility problems. Having only one of them wouldn't. Before the HTML5 parser was enabled in Gecko, we had a situation where Gecko only considered <base> in <head> but the parser hoisted misplaced <base> into <head>. Currently, Chromium nightlies don't hoist <base> but do consider <base> outside <head>.
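
To illustrate why only the combination breaks pages, here is a toy Python model of the two behaviors. It is a sketch under simplifying assumptions, not Gecko or Chromium code: the names HEAD_TAGS, base_placements and base_takes_effect are made up for this example, a page is reduced to a flat list of start tag names, and any tag outside a small allowlist is assumed to imply </head><body>.

```python
# Simplified, hypothetical allowlist of start tags that stay in <head>.
HEAD_TAGS = {"base", "basefont", "link", "meta", "title", "script", "style"}

def base_placements(tags, hoist):
    """Return one boolean per <base>: True if it ends up as a child of <head>."""
    in_head = True
    placements = []
    for tag in tags:
        if tag == "base":
            # A hoisting parser moves a misplaced <base> back into <head>.
            placements.append(in_head or hoist)
        elif tag not in HEAD_TAGS:
            # Any other tag is assumed to imply </head><body>.
            in_head = False
    return placements

def base_takes_effect(tags, hoist, head_only):
    """True if some <base> on the page is honored under the given behaviors."""
    placements = base_placements(tags, hoist)
    if head_only:
        # Only <base> elements that are children of <head> count.
        return any(placements)
    return len(placements) > 0

# An unknown tag before <base>, as in the united.com case.
page = ["title", "c:set", "base"]
for hoist in (True, False):
    for head_only in (True, False):
        print(hoist, head_only, base_takes_effect(page, hoist, head_only))
```

In this model, the <base> is ignored only when hoist is False and head_only is True, which is the combination described above.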

Specifically, I've seen three bugs.
1) https://bugzilla.mozilla.org/show_bug.cgi?id=571389

The tdcanadatrust.com bank site broke, because <basefont> implied <body> and <base href> occurred after <basefont>. This was fixed by adjusting the parsing algorithm to keep <basefont> in <head>. This case doesn't call for further spec changes.

2) https://bugzilla.mozilla.org/show_bug.cgi?id=592880

Hyperlatex generates a frameset with a content frame and a table of contents frame. The table of contents puts <base target> after an explicit <body> tag. Thus, the links in the ToC are mistargeted in Firefox 4 betas. Hyperlatex also uses an Almost Standards doctype, which means IE8 uses the IE8 Almost Standards mode and IE7 uses the IE7 Standards Mode. This means the links are also mistargeted in IE7 and IE8. Since the output of Hyperlatex is already broken in IE8, I treated this as an evangelism bug. Clicking the Compatibility View button in IE8 doesn't make the links work as intended, since the links were mistargeted already in IE7.

However, Hyperlatex seems to be an effectively unmaintained piece of software. Hyperlatex-generated output appears on many servers around the Web, and Hyperlatex is available in e.g. Ubuntu repositories, so it will probably find users who generate even more content. It is not really useful to break that content in Firefox 4 (or IE9!) even if it is broken in IE7 and IE8.

3) https://bugzilla.mozilla.org/show_bug.cgi?id=593807

The United Airlines online check-in process breaks for non-U.S. citizens, because a page has the tag <c:set var="static_c2c" value="true" scope="request"/> before <base>. In IE8, united.com is on the Microsoft maintained Compatibility Mode list. Also, the problematic page is doctypeless, so it gets processed in the IE 5.5 mode in IE8. However, even if the page were in the IE8 Standards Mode, the <base> would still take effect, because <c:set var="static_c2c" value="true" scope="request"/> doesn't imply </head><body> in IE8. In IE9 mode in IE9 PP4, <c:set var="static_c2c" value="true" scope="request"/> implies </head><body> as in the HTML5 parsing algorithm and a subsequent <base> doesn't take effect.

- -

So far, it has taken me far more time to investigate these issues than it would have taken to make the parser hoist <base> into <head>. Moreover, when writing to hyperlatex-users, the best reason I can give them for why Firefox 4 is breaking their content is that IE8 is already breaking it. That's not a great reason. When writing to United, the best reason I can give is that Firefox became more standards-compliant. That's an even worse reason than saying that IE broke first.

Is there any good reason why we shouldn't change the spec to say that in the "in body" insertion mode a start tag token whose tag name is "base" must be processed according to the rules of the "after head" insertion mode? If there's no good reason, can the change please be made?
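
As a rough sketch of what the proposed change would do to the resulting tree, consider the following toy tree builder in Python. It is an illustration under stated assumptions, not the spec algorithm: build_tree, HEAD_TAGS and the hoist_base flag are hypothetical names, the input is a flat list of start tag names, and the "after head" handling is reduced to appending the element to <head>.

```python
# Simplified, hypothetical allowlist of start tags that stay in <head>.
HEAD_TAGS = {"base", "basefont", "link", "meta", "title", "script", "style"}

def build_tree(tags, hoist_base=False):
    """Sort start tags into the children of <head> and <body>."""
    tree = {"head": [], "body": []}
    mode = "in head"
    for tag in tags:
        if tag == "body":
            # An explicit <body> start tag switches the insertion mode.
            mode = "in body"
        elif mode == "in head" and tag in HEAD_TAGS:
            tree["head"].append(tag)
        elif tag == "base" and hoist_base:
            # Proposed change: a "base" start tag seen "in body" is
            # handled as in "after head", so it lands in <head>.
            tree["head"].append(tag)
        else:
            # Everything else implies <body> and is inserted there.
            mode = "in body"
            tree["body"].append(tag)
    return tree

# Shape of the Hyperlatex table of contents: <base target> after <body>.
toc = ["title", "body", "base", "a"]
print(build_tree(toc))                   # <base> stays in <body>
print(build_tree(toc, hoist_base=True))  # <base> is hoisted into <head>
```

With hoist_base=True, the misplaced <base> becomes a child of <head>, so the existing "first <base> in <head>" rule would pick it up without changes to the other part of the spec.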

I believe there's no need to support script-inserted <base> elements outside <head>, and I believe there's no need to support multiple <base> elements. That's why my preferred change would be to the parsing algorithm and not to the other part of the spec. Also note that a prior analysis of what would break if the IE7/IE8 Standards Mode behavior were adopted doesn't reflect the actual breakage if the analysis didn't consider which pages go into the IE 5.5 mode in IE and if the analysis didn't consider <foo:bar /> not breaking out of <head> in IE7 and IE8. Also, an analysis of Google-indexed content wouldn't have considered login-walled content like airlines and banks.

Henri Sivonen

Received on Tuesday, 7 September 2010 11:16:39 UTC