Re: Automatic XML namespaces

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 5 Nov 2009 21:26:58 +0200
Cc: HTML WG <public-html@w3.org>
Message-Id: <2D01F758-9C20-4683-8540-F15A249AF1FC@iki.fi>
To: Liam Quin <liam@w3.org>
On Nov 5, 2009, at 14:26, Jirka Kosek wrote:

> There is also full paper from Balisage conference:
>
> http://www.balisage.net/Proceedings/vol3/html/Quin01/BalisageVol3-Quin01.html

Interesting. Quotes from that document are below:

> August 11 - 14, 2009

Why is it that this reaches public-html only in November?

> Consider a typical XHTML document that also uses XForms, SVG,  
> MathML, and has some metadata using the Dublin Core and FOAF.

Is there even an existence proof of one such document, let alone of  
such documents being in any way typical?

> SVG uses XLink adding another

More precisely, SVG uses XLink attributes but an SVG processor needs  
to implement SVG-specific processing for those attributes and can't  
rely on generic XLink facilities. XLink in SVG isn't exactly a success  
story of vocabulary mixing.

> Recall that namespaces are serving two primary functions: they are  
> associating names with the specifications that define them,

This function is much better served by Google search.

> and they are disambiguating in the case that two specifications  
> define the same name.

In the case of SVG, Namespaces gave SVG an excuse to introduce names  
that HTML had already taken (e.g. <a>).

> In practice, conflicts where the same name is defined by multiple  
> specifications are rare (although still important enough to need  
> addressing).

Are there examples of different XML vocabularies *for Web use* having  
been defined without awareness of each other so that Namespaces have  
fortuitously saved the day when the idea of combining the vocabularies  
has occurred later?

> For XHTML, the DOCTYPE declaration already is sufficient to bestow  
> HTML-ness,

No, it's not. See the following in Gecko or WebKit:
http://hsivonen.iki.fi/test/moz/not-xhtml.xhtml

> A significant motivation driving the use of XHTML is that XML tools  
> can be used with the document, and for these tools, SVG-ness is not  
> associated with any particular element name.

If you put an HTML5 parser at the head of your XML processing  
pipeline, the elements in <svg> subtrees show up in the SVG namespace.  
XHTML isn't needed for this.
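To illustrate the idea, here is a toy sketch in Python of how a parser can assign namespaces with zero xmlns declarations in the source. This is not the real HTML5 tree construction algorithm (which also handles integration points like foreignObject and annotation-xml, mis-nesting, and void elements); it only shows the principle that element names alone are enough to pick the namespace:

```python
from html.parser import HTMLParser

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

class NamespaceSketch(HTMLParser):
    """Toy illustration only: puts <svg> and <math> subtrees into the
    SVG and MathML namespaces and everything else into the HTML
    namespace. Integration points, void elements, and error recovery
    are deliberately not handled here."""
    def __init__(self):
        super().__init__()
        self.open = []    # namespaces of currently open elements
        self.seen = []    # (namespace, tag) recorded for each start tag

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            ns = SVG_NS
        elif tag == "math":
            ns = MATHML_NS
        else:
            # Inherit the namespace of the nearest open ancestor.
            ns = self.open[-1] if self.open else HTML_NS
        self.seen.append((ns, tag))
        self.open.append(ns)

    def handle_endtag(self, tag):
        if self.open:
            self.open.pop()

parser = NamespaceSketch()
parser.feed("<p>text <svg><circle r='1'></circle></svg></p>")
print(parser.seen)
```

With no namespace declarations anywhere in the input, the `circle` element still ends up in the SVG namespace, which is the point: the serialization carries no xmlns syntax, yet the parsed tree is namespace-correct.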

> The XML community will not be motivated to support a new  
> specification merely to satisfy the needs of some other community.

XML could have avoided Namespaces if the XML community hadn't  
supported the RDF community:
http://lists.w3.org/Archives/Public/semantic-web/2007Dec/0116.html

But why should the HTML community add Namespaces to text/html merely  
to satisfy another community?

> In the last couple of years, a number of individuals have gathered  
> support for renewed work on non-XML versions of HTML. These are also  
> not based on SGML, but instead are an SGML-inspired format. Avowed  
> dislike of XML appears to have stemmed at least in part from  
> misunderstandings and in part from the stricter and more verbose  
> syntax.

What misunderstandings? People who work on HTML5 seem to be better at  
XML spec lawyering than J. Random XML Advocate. Where there is  
dislike, it's very well informed dislike. It's not like we are XML  
ignoramuses.

> For these people, robustness, accuracy, error detection and  
> correctness are relatively unimportant: all that matters is that the  
> Web browser render an acceptable result.

That's a gross mischaracterization. How can you say that those who  
work on HTML5 consider robustness unimportant when XML makes  
brittleness mandatory and HTML5 goes to great lengths to accurately  
and robustly define error detection and correction?

> At the time of writing, the HTML Working Group is considering hard- 
> wiring MathML and SVG namespaces into the HTML specification, so  
> that an svg element would automatically be placed into the SVG  
> namespace. This would make it harder to process the documents with  
> other tools, for example it's tricky to match SVG elements with  
> XPath or with XSLT match expressions if you don't know in advance  
> whether there will be a namespace declaration, and, if there is,  
> whether it will be correct.

Have you tried using the Validator.nu HTML Parser  
(http://about.validator.nu/htmlparser/) in place of an XML parser at  
the head of your XML processing pipeline? It even comes with a sample  
tool called XSLT4HTML5 that enables you to use XSLT with text/html.

I also encourage you to examine the DOM resulting from  
SVG-in-text/html and its XPath sensitivity in Firefox trunk builds  
with the pref html5.enable set to true.  
http://hsivonen.iki.fi/test-html5-parsing/

It seems to me that you are solving a problem that has already been  
solved.

> None the less  it is reasonable to be able to expect to generate  
> HTML documents from XML,

The Validator.nu HTML Parser package also comes with a  
SAX-to-text/html serializer.

> and also to use JavaScript, XPath and other tools on the HTML DOM.

Already solved.

In Firefox 3.6b1 and in Safari, you can use the same kind of XPath  
expressions and JS on HTML elements in HTML DOMs as you can use on  
XHTML elements in XML DOMs.

> Anywhere that users have to declare a large number of mostly  
> orthogonal (non-overlapping) namespaces is a candidate for  
> improvement: it is particularly unfortunate that users cannot  
> themselves combine namespaces to make new amalgamated ones, such as  
> XSLT plus SVG plus HTML.

Isn't it enough that XSLT can be written as XML? How is the XSLT  
namespace relevant to text/html?

> First, we should note that, as things stand (April, 2009), HTML 5  
> says that certain elements, such as svg and math, are to be placed  
> in the namespaces one might expect automatically. Unfortunately,  
> existing Web browsers do not behave this way. Once HTML 5 becomes a  
> W3C Recommendation one might reasonably expect to see  
> implementations, but a great many people will still be using older  
> browsers. This also presents an incompatibility with XPath used from  
> JavaScript, because older browsers put everything in the default  
> (non) namespace.

This might be worth exploring:
http://www.mail-archive.com/whatwg@lists.whatwg.org/msg18335.html

> The goals of the Automatic Namespace mechanism are to allow document  
> authors to define their own namespace mix-ins in terms of other  
> namespaces and to refer to them, and also to minimise the amount of  
> syntax needed for declarations—in the case of HTML, ideally, to zero.

HTML5 already makes it possible to use HTML, SVG and MathML with zero  
namespace declarations. Demo that works in Firefox trunk builds with  
html5.enable set to true:
http://hsivonen.iki.fi/test/moz/html5-parsing.html

> A Web browser would act as if a default mix-in had been read;

How would this differ from what HTML5 specifies now?

> People reading a draft of this paper commented that a greater  
> barrier to XML adoption in the HTML world was the draconian error- 
> handling, which they believed meant that a Web browser must reject  
> any document that claims to be XML but is not well-formed. This is  
> an unfortunate mis-perception: in fact, the restriction is that the  
> browser must not claim such a resource to be a well-formed XML  
> document, but, once it is not XML it is outside the scope of the XML  
> specification, and error recovery is perfectly acceptable, as long  
> as no claim is made that the original document is itself XML. So it  
> seems to this author that the barrier is not draconian error  
> handling, but browser writers. So, rather than  
> address a problem that appears not to exist, the approach here is to  
> address a real difficulty that might be pointed out as a barrier if  
> the draconian error-handling straw-man were to be removed. There is  
> no possibility of making the unfamiliar familiar without  
> acquaintance, but first impressions count for a lot.

http://lists.w3.org/Archives/Public/www-tag/2008Dec/0050.html

> The example says that whenever an element called svg is encountered,  
> it introduces a new default namespace, with the given URI, which  
> will apply both to it and to all its children, unless of course they  
> are themselves listed in a namespace file, or unless they have  
> explicit namespace bindings in the document.

What does this complexity accomplish compared to the HTML5 parsing  
algorithm that makes SVG Just Work without namespace mapping files?

> <link rel="ns" href="ns.xml" />
> This markup would go in the HTML head, although it is only needed if  
> your namespace differs from the HTML 5 default, or if you are using  
> XHTML.

A solution that requires the parser to block until it has processed  
another HTTP request isn't going to fly.

I've recently put a lot of effort into making the HTML5 parser in  
Gecko continue processing behind the scenes when legacy semantics  
require it to block. I think new features that would block the parser  
(especially so badly that namespace assignments couldn't be computed  
until it unblocked) are very unwelcome.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 5 November 2009 19:27:37 UTC
