[whatwg] Issues concerning the <base> element and xml:base

Ian Hickson wrote:
> On Sun, 11 Feb 2007, Geoffrey Sneddon wrote:
>> Safari 2.0.4/419.3: (1) Inserted in DOM (in the innerHTML location).
>> Firefox 2.0.0.1: (3) Inserted in DOM (in the innerHTML location).
>> IE/Mac 5.2.3: (2) (anyway to view the DOM tree?)
>> Opera 9.10: (1) DOM Snapshot for some reason isn't working.
>> IE6/Win: (2) The new <base> never appears in DOM, but the full absolute URLs
>> are in the DOM.
>> IE7/Win: (3) The new <base> never appears in DOM, but the full absolute URLs
>> are in the DOM.
>>
>> In conclusion, Safari and Opera change all the links, IE5/Mac and 
>> IE6/Win both change links within the fragment, and Firefox and IE7/Win 
>> don't change any links.
> 
> The latter is the option I'm following for now. Note that browsers all do 
> _different_ things for target="" than for href="". The spec has made them 
> act the same for now. I'm not sure this is workable, we'll have to see 
> when the browser vendors try to get this interoperable. I can't imagine 
> that it's a huge issue given that the browsers are so far from each other 
> in terms of what they do here. I'm going to do a study of some subset of 
> the Web to see how common this is (at least the static case; I can't 
> really do much about the scripted case).

I don't think this is a good solution actually. In general, I think it's
good to always make the DOM reflect the behavior of the document. I.e.
it shouldn't matter how you arrived at a specific DOM, be it through
parsing of an incoming HTML stream, or by using DOM-Core calls. Whenever
we make an exception to that rule I think we need to have a good reason
for it.

For quirky <base> behavior it is my experience that what matters most is
which URI things in a static page are resolved against. Most modern pages
that use scripting, the DOM and such usually have only zero or one <base>
element, and it lives in the head.

What I suggest is that we make the first or last <base> element in the
<head> be the one that sets both the base target and the base href for
the document (modulo all special handling needed when <base>s appear in
the body, described below). While this is not what IE or Firefox do
today, I doubt that it'll break enough pages to justify straying from the
act-like-the-DOM-looks principle.

Currently Mozilla uses the last <base> that appears in <head>. There
doesn't appear to be a reason for using the last rather than the first,
it's just what we've always done. However it would be interesting to
know what IE uses here since it might matter. Did Safari or Opera run
into any issues here?
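
As a rough sketch of the suggestion above (not what any browser is known
to do), letting a single <base> in <head> supply both values could look
like this. The BaseElement type, the document_base function and the
"pick" parameter are made up for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BaseElement:
    """Hypothetical stand-in for a <base> element found in <head>."""
    href: Optional[str] = None
    target: Optional[str] = None


def document_base(head_bases: List[BaseElement],
                  pick: str = "last") -> Tuple[Optional[str], Optional[str]]:
    """Return (href, target) taken from the one <base> chosen in <head>.

    pick="last" follows the description that Mozilla uses the last <base>
    in <head>; pick="first" is the alternative. Either way a single
    element supplies both values (an assumption of this sketch).
    """
    if not head_bases:
        return (None, None)
    chosen = head_bases[-1] if pick == "last" else head_bases[0]
    return (chosen.href, chosen.target)


# Two <base> elements in <head>; only the first one carries a target.
bases = [BaseElement(href="http://example.com/a/", target="_top"),
         BaseElement(href="http://example.com/b/")]
last_choice = document_base(bases, "last")
first_choice = document_base(bases, "first")
```

With "last" the target from the first element is ignored entirely, which
is the point of having one element decide both values.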

One thing we unfortunately will have to deal with is <base> elements
appearing in the middle of the body of the document. What Mozilla had to
do was this: once we find a <base> element in the body of the document,
we tell the parser to remember the resolved href and/or target of that
<base> element. We then, for any element that uses base uris (full list
at [1]), set an internal member in the element that hardcodes the
element's base uri and/or base target.

For elements that don't get this property set on them, base href and
target resolution works as normal. For elements that have this set, base
href and target resolution only uses the set properties.

Note that you only set the saved href or target in the parser if the
corresponding attribute is set on the <base> element. So if a document
contains <base target="foo"> in the middle of the body, that does not
set a saved href in the parser.
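
The steps above can be sketched roughly as follows. This is not the
actual Mozilla code (see [1] for that); the Parser and Element classes
and the resolve function are hypothetical, and treating href and target
independently is my reading of the "and/or" above:

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urljoin


@dataclass
class Element:
    """Hypothetical element carrying the hardcoded values, if any."""
    href: str
    saved_base_href: Optional[str] = None
    saved_base_target: Optional[str] = None


class Parser:
    """Hypothetical parser state for <base> elements seen in <body>."""

    def __init__(self) -> None:
        self.saved_href: Optional[str] = None
        self.saved_target: Optional[str] = None

    def see_body_base(self, resolved_href: Optional[str] = None,
                      target: Optional[str] = None) -> None:
        # Only an attribute that is actually present updates its saved value.
        if resolved_href is not None:
            self.saved_href = resolved_href
        if target is not None:
            self.saved_target = target

    def create_element(self, href: str) -> Element:
        # Stamp the element with whatever the parser remembers right now.
        return Element(href, self.saved_href, self.saved_target)


def resolve(el: Element, doc_base_href: str,
            doc_base_target: Optional[str]) -> Tuple[str, Optional[str]]:
    """Use the stamped values when set, otherwise the document's values."""
    base = el.saved_base_href if el.saved_base_href is not None else doc_base_href
    target = (el.saved_base_target if el.saved_base_target is not None
              else doc_base_target)
    return urljoin(base, el.href), target


parser = Parser()
before = parser.create_element("a.html")       # no <base> seen in body yet
parser.see_body_base(target="foo")             # <base target="foo">
middle = parser.create_element("b.html")       # saved target, no saved href
parser.see_body_base(resolved_href="http://example.org/x/")
after = parser.create_element("c.html")        # both saved
```

Here "middle" still resolves its href against the document's base,
because <base target="foo"> did not set a saved href in the parser.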

This algorithm is something we had to add to Firefox in order to support
many pages out there. I think IE7 changed how they dealt with this,
though I don't know the specifics of how it changed. It would be
interesting to get their feedback on this.

[1] http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/content/html/document/src/nsHTMLContentSink.cpp&rev=3.787#799

> On Tue, 10 Apr 2007, Jonas Sicking wrote:
>> Note that the current text isn't implementable since it says that 
>> relative uris in <base> should be resolved against the base uri 
>> document, but the <base> element modifies that base uri so there is a 
>> circular dependency.
> 
> No, the <base> element sets the "document entity's base URI", and is 
> resolved relative to the "base URI from the encapsulating entity" or the 
> "URI used to retrieve the entity". See RFC2396.

Ah, the "base" part of "base URI from the encapsulating entity" confused 
me. Any chance we can remove that or is that the language RFC2396 uses?
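
If I now read it right, the order of resolution is something like the
sketch below: the href of <base> is resolved against the URI used to
retrieve the document, and only the result becomes the base for
everything else. The document_base_uri function is made up for
illustration, and Python's urljoin stands in for the resolution
algorithm the spec refers to:

```python
from urllib.parse import urljoin


def document_base_uri(retrieval_uri: str, base_href: str) -> str:
    """Resolve <base href> against the URI used to retrieve the document.

    The href of <base> is never resolved against the base it establishes,
    so there is no circular dependency.
    """
    return urljoin(retrieval_uri, base_href)


# Step 1: the relative href in <base> is resolved against the retrieval URI.
doc_base = document_base_uri("http://example.com/a/b.html", "../img/")
# Step 2: other relative URIs in the document are resolved against the result.
link = urljoin(doc_base, "pic.png")
```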

/ Jonas

Received on Tuesday, 1 May 2007 17:08:45 UTC