[whatwg] Issues concerning the <base> element and xml:base

Ian Hickson wrote:
> On Tue, 1 May 2007, Jonas Sicking wrote:
>>> The latter is the option I'm following for now. Note that browsers all 
>>> do _different_ things for target="" than for href="". The spec has 
>>> made them act the same for now. I'm not sure this is workable, we'll 
>>> have to see when the browser vendors try to get this interoperable. I 
>>> can't imagine that it's a huge issue given that the browsers are so 
>>> far from each other in terms of what they do here. I'm going to do a 
>>> study of some subset of the Web to see how common this is (at least 
>>> the static case; I can't really do much about the scripted case).
>> I don't think this is a good solution actually. In general, I think it's 
>> good to always make the DOM reflect the behavior of the document. I.e. 
>> it shouldn't matter how you arrived at a specific DOM, be it through 
>> parsing of an incoming HTML stream, or by using DOM-Core calls. Whenever 
>> we make an exception for that rule I think we need to have a good reason 
>> for it.
> I think you misread what I wrote. Right now, there's no magic involved 
> here.

When you said "the latter is the option I'm following for now" I thought 
you referred to "and Firefox and IE7/Win don't change any links". Is 
that not the case?

Looking at the spec it doesn't mention anything special regarding DOM 
mutations at all, so that would indeed make me think that links are 
changed if a <base> element is inserted at the top of the <head> using 
the DOM.

>> What I suggest is that we make the first or last <base> element in the 
>> <head> be the one that sets both the base target and the base href for 
>> the document (modulo all special handling needed when <base>s appear in 
>> the body, described below). While this is not what IE or Firefox does 
>> today, I doubt that it'll break enough pages to stray from the 
>> act-like-the-DOM-looks principle.
> Right now the href="" is from the first and the target="" is from the 
> last, but other than that, that's what the spec says.

Why is the fact that the last target is the one used only defined in a 
Note? Or am I missing it somewhere else?

Also, if we're going to be inconsistent in how current browsers and web 
pages handle multiple <base>s, why not simply use the first <base> for 
both href="" and target=""?
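
To make the two options concrete, here is a rough sketch (not spec text, 
and not what any browser actually implements). It assumes Python's 
standard html.parser, ignores the head/body distinction entirely, and 
uses hypothetical names (BaseCollector, pick_base, target_rule) purely 
for illustration. It picks the href from the first <base href=""> and 
the target from either the last or the first <base target="">.

```python
from html.parser import HTMLParser


class BaseCollector(HTMLParser):
    """Collects href and target values of <base> elements in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "base":
            return
        attr_map = dict(attrs)
        # Only record attributes that are actually present on this <base>.
        if "href" in attr_map:
            self.hrefs.append(attr_map["href"])
        if "target" in attr_map:
            self.targets.append(attr_map["target"])


def pick_base(markup, target_rule="last"):
    """Return (base href, base target) for the given markup.

    The href always comes from the first <base href="">. The target comes
    from the last <base target=""> when target_rule is "last", otherwise
    from the first one.
    """
    collector = BaseCollector()
    collector.feed(markup)
    collector.close()
    href = collector.hrefs[0] if collector.hrefs else None
    if not collector.targets:
        target = None
    elif target_rule == "last":
        target = collector.targets[-1]
    else:
        target = collector.targets[0]
    return href, target


DOC = (
    '<head><base href="http://a.example/" target="one">'
    '<base href="http://b.example/" target="two"></head>'
)

print(pick_base(DOC, "last"))   # ('http://a.example/', 'two')
print(pick_base(DOC, "first"))  # ('http://a.example/', 'one')
```

With the "first" rule a single <base> element supplies both values 
whenever it carries both attributes, which is the consistency I'm 
asking about.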

>> One thing we unfortunately will have to deal with is <base> elements 
>> appearing in the middle of the body of the document. What mozilla had to 
>> do was once we find a <base> element in the body of the document, we 
>> tell the parser to remember the resolved href and/or target of that 
>> <base> element. We then for any element that uses base URIs (full list 
>> at [1]) set an internal member in the element that hardcodes the 
>> element's base URI and/or base target.
>> For elements that don't get this property set on them, base href and 
>> target resolution works as normal. For elements that have this set, base 
>> href and target resolution uses only the set properties.
>> Note that you only set the saved href and target in the parser if the 
>> attribute is set in the <base> element. So if a document contains <base 
>> target="foo"> in the middle of the body that does not set a saved href 
>> in the parser.
> This is deep magic, as far as the DOM goes. It also makes it hard to debug 
> -- e.g. dynamically modifying <base> elements, moving them, etc, has no 
> effect anymore.

Yup, I agree that this is deep magic as far as a DOM user goes.
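
For reference, the behavior I described can be modeled roughly as 
follows. This is only a sketch under stated assumptions, not Mozilla's 
actual code: the token dictionaries, Element, Document and parse_body 
are all hypothetical names, the saved href is assumed to be resolved 
against the document's base at parse time, and the frozen href and 
frozen target are assumed to apply independently of each other.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class Element:
    name: str
    url: str
    # Hypothetical stand-ins for the "internal member" set by the parser.
    frozen_href: Optional[str] = None
    frozen_target: Optional[str] = None


class Document:
    def __init__(self, url):
        self.base_href = url
        self.base_target = None

    def resolve(self, el):
        # A frozen href, when present, wins over the document-wide base.
        base = el.frozen_href if el.frozen_href is not None else self.base_href
        return urljoin(base, el.url)

    def target_for(self, el):
        if el.frozen_target is not None:
            return el.frozen_target
        return self.base_target


def parse_body(doc, tokens):
    """Walk body tokens, remembering the most recent in-body <base> values."""
    saved_href = None
    saved_target = None
    elements = []
    for token in tokens:
        if token["tag"] == "base":
            # Only attributes actually set on the <base> update the saved state.
            if "href" in token:
                saved_href = urljoin(doc.base_href, token["href"])
            if "target" in token:
                saved_target = token["target"]
            continue
        elements.append(
            Element(token["tag"], token["href"], saved_href, saved_target)
        )
    return elements


doc = Document("http://example.org/dir/page.html")
links = parse_body(doc, [
    {"tag": "a", "href": "one.html"},
    {"tag": "base", "target": "foo"},            # saves a target, no href
    {"tag": "a", "href": "two.html"},
    {"tag": "base", "href": "http://other.example/x/"},
    {"tag": "a", "href": "three.html"},
])

before = [(doc.resolve(el), doc.target_for(el)) for el in links]

# Changing the document base later only affects elements without a frozen href.
doc.base_href = "http://changed.example/"
after = [doc.resolve(el) for el in links]

print(before)
print(after)
```

The last step shows the debugging problem: once an element has a 
frozen value, later changes to the document's base no longer reach it.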

> HOWEVER, having said that, this is a tiny minority of pages. According to 
> a study I did of over 100,000,000 pages, 0.036% of pages have more than 
> one <base href=""> element (ignoring those that specify the same href="" 
> value more than once).
> With <base href="">, you can get 404s, but in practice IE7 is already 
> doing that, and it doesn't seem to have affected adoption. Anecdotally, 
> most of these pages use absolute URIs, which might explain it.

It's much easier for IE to get away with breaking pages, mostly because 
many people use IE as the yard-stick.

> 0.06% of pages have more than one <base target=""> element (again ignoring 
> duplicates). With <base target="">, the worst that can happen from the 
> user's point of view is that links will open in a new page instead of on 
> the same page, and in practice even that's not likely, since (anecdotally) 
> most pages with <base target=""> simply alternate between different names.
> What do you think?

I would actually be hesitant to drop support for multiple <base>s in 
Firefox. It was very easy to implement, and it is known that many pages 
out there break without it. Though the percentage is small, there are a 
lot of pages on the internet.
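
As a back-of-the-envelope check, taking the quoted figures at face 
value and using 100,000,000 as the sample size (the study covered 
"over" that many pages, so these are lower bounds within the sample; 
the variable names are just for illustration):

```python
PAGES_SAMPLED = 100_000_000

# 0.036% = 36 per 100,000; 0.06% = 6 per 10,000 (integer math avoids rounding).
multi_href_pages = PAGES_SAMPLED * 36 // 100_000
multi_target_pages = PAGES_SAMPLED * 6 // 10_000

print(multi_href_pages)    # 36000
print(multi_target_pages)  # 60000
```

So even within the sample alone, tens of thousands of pages are 
affected.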

It might be something we could restrict to quirks mode pages though, 
that's not a bad idea at all.

/ Jonas

Received on Wednesday, 30 May 2007 16:44:57 UTC