Re: XMLLiteral and stripping iquery/ifragment from <base>

Manu Sporny wrote:
> Two unrelated questions about XMLLiterals and stripping iquery/ifragment
> from <base>.
> 
> The first question is whether or not we should include
> xmlns="http://www.w3.org/1999/xhtml" in XMLLiterals for HTML4 and HTML5.
> [...]
> Note that there is no default namespace specified for non-xml mode HTML5
> (AFAIK), so does it make sense to require the namespace in XMLLiterals?

HTML5 requires that pages parsed as text/html always have their elements 
placed in the standard HTML namespace 
(http://whatwg.org/html5#insert-an-html-element) (ignoring SVG/MathML 
for now), regardless of any namespace declarations. HTML content like 
"<html><head>..." is parsed to an identical set of elements (in terms of 
Infoset-style namespace names and local names) as XML like "<html 
xmlns='http://www.w3.org/1999/xhtml'><head>...".

> If we don't include it, do we violate the namespace well-formedness for
> an XMLLiteral? I think we do, but thought I should check to see if I'm
> missing something.

"<sup>...</sup>" is still perfectly legal namespace well-formed XML, 
it's just different to the input (since the element is in no namespace, 
whereas in the input it was in the HTML namespace), and I believe 
XMLLiteral output really should be equivalent to the input (at a 
DOM/Infoset level).
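
To make the difference concrete, here is a minimal sketch in Python (my choice of language, using the standard-library ElementTree parser as a stand-in for a real RDFa processor). It only shows that the same markup with and without the declaration yields different expanded names; the helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def expanded_name(markup):
    """Return the root element's name in ElementTree's '{namespace}local' form."""
    return ET.fromstring(markup).tag


# Both strings are namespace well-formed XML, but they name different elements.
bare = expanded_name("<sup>2</sup>")
declared = expanded_name(f'<sup xmlns="{XHTML_NS}">2</sup>')

print(bare)              # element in no namespace
print(declared)          # element in the HTML namespace
print(bare == declared)  # the two are not equivalent at the Infoset level
```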

(It ought to be a consequence of the XML serialisation algorithm that 
namespace declarations are added to ensure the namespaces used by 
element/attribute names are correctly declared, regardless of the 
declarations in the input. You have to do that for XML too, because 
you're serialising a fragment that might use namespaces that were 
declared outside the fragment -- it's the same for HTML, except the 
names weren't explicitly declared anywhere in the document.)
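
As a rough illustration of that serialisation behaviour (a sketch only, assuming Python 3.8 or later for the default_namespace argument, and using ElementTree rather than any algorithm defined by HTML5 or RDFa): the namespace is declared on the root element, outside the fragment, and the serialiser still emits the declaration the fragment needs.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def serialise_fragment(element):
    """Serialise one element; the serialiser adds the xmlns declaration it needs."""
    return ET.tostring(element, encoding="unicode", default_namespace=XHTML_NS)


# The only namespace declaration in the input is on <html>, outside the fragment.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>E = mc<sup>2</sup></p></body></html>"
)
sup = doc.find(f".//{{{XHTML_NS}}}sup")

literal = serialise_fragment(sup)
print(literal)

# Reparsing the fragment on its own gives back the same expanded name.
reparsed = ET.fromstring(literal)
print(reparsed.tag == sup.tag)
```

For text/html input the idea is the same: the parser has already put the element in the HTML namespace, so the serialiser has to declare it even though the source never did.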

> The second question is what constitutes the base URL. If someone were to
> specify the following:
> 
> <base href="http://example.org/foo.xhtml?bar=baz#fnurt"></base>
> 
> Would the base URL be: http://example.org/foo.xhtml?bar=baz
> or would it be: http://example.org/foo.xhtml

I would hope it's the same as HTML5's notion of document base URL, most 
recently defined in 
http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#document-base-url 
(it's not in the latest draft since it's meant to be moved to another 
document but seemingly hasn't been yet).

That seems to say it's the <base href> value resolved against the
document's address, which I think (but I haven't checked this carefully)
in this case will be the string
"http://example.org/foo.xhtml?bar=baz#fnurt". (Resolution handles
encodings and normalises some invalid syntax and some other bits, so the
result is not necessarily identical to the input string, but it retains
all of the components, including the query and the fragment.)
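
A quick way to check this intuition, as a sketch only: Python's urllib.parse.urljoin does generic RFC 3986-style resolution, not HTML5's algorithm (it does none of the encoding handling or error recovery), and the document address below is a hypothetical one I made up.

```python
from urllib.parse import urljoin, urlsplit


def document_base_url(document_address, base_href):
    """Resolve the <base href> value against the document's address."""
    return urljoin(document_address, base_href)


base = document_base_url(
    "http://example.org/dir/page.html",  # hypothetical document address
    "http://example.org/foo.xhtml?bar=baz#fnurt",
)
parts = urlsplit(base)

print(base)
print(parts.query)     # the query component survives resolution
print(parts.fragment)  # and so does the fragment
```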

It's possible that HTML5's notion of document base URL could change,
e.g. I think it could simply drop the fragment part without breaking
anything, and that might make it more readily reused by RDFa. If so,
that feedback should be sent to the HTML WG or to whoever's working on
[WEBADDRESSES] if it's going to be defined there. It seems best if HTML5
and RDFa can use a common definition, so that RDFa doesn't have to worry
about redefining details like what happens if there are multiple <base
href> elements in a document.
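
To illustrate why dropping the fragment looks harmless, here is a small sketch (again using Python's generic urljoin as an approximation, with a list of sample references I picked arbitrarily): each reference resolves to the same result whether or not the base carries a fragment.

```python
from urllib.parse import urldefrag, urljoin

BASE = "http://example.org/foo.xhtml?bar=baz#fnurt"
BASE_NO_FRAGMENT = urldefrag(BASE).url

# Arbitrary sample references. The empty reference is left out because
# urljoin returns the base unchanged for it, which is a quirk of this
# library rather than something RFC 3986 asks for.
REFERENCES = ["other.html", "?x=1", "#me", "../up/", "//example.net/a"]


def resolves_identically(reference):
    """True if the base's fragment has no effect on the resolved URL."""
    return urljoin(BASE, reference) == urljoin(BASE_NO_FRAGMENT, reference)


print(BASE_NO_FRAGMENT)
for reference in REFERENCES:
    print(reference, "->", urljoin(BASE, reference), resolves_identically(reference))
```

Note that the query part is a different matter: a fragment-only reference such as "#me" inherits the base's query, so stripping the query as well would change results.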

(For the base URL used at a specific element, I would also hope it's the 
same as HTML5 uses when resolving URLs 
(http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#resolve-a-url), 
i.e. XML Base plus the document base URL as defined above.)
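
Roughly, and only as a sketch of the layering (the function name and the xml:base values are hypothetical, and urljoin stands in for HTML5's resolution algorithm): start from the document base URL, apply each xml:base from the outermost ancestor inwards, then resolve the reference against the result.

```python
from urllib.parse import urljoin


def resolve_at_element(document_base, xml_base_chain, reference):
    """Resolve a reference at an element.

    xml_base_chain lists the xml:base values in scope, ordered from the
    outermost ancestor down to the element itself.
    """
    base = document_base
    for xml_base in xml_base_chain:
        base = urljoin(base, xml_base)
    return urljoin(base, reference)


resolved = resolve_at_element(
    "http://example.org/foo.xhtml?bar=baz#fnurt",
    ["docs/", "2009/"],  # hypothetical xml:base values on two ancestors
    "a.html",
)
print(resolved)
```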

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Monday, 21 September 2009 08:40:19 UTC