XML and HTML differences (Re: XML namespaces on the Web) from Maciej Stachowiak on 2009-11-17 (public-html@w3.org from November 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 17 Nov 2009 15:40:55 -0800
To: Anne van Kesteren <annevk@opera.com>
Cc: HTML WG <public-html@w3.org>
Message-id: <4B6D3A57-12AF-404D-84AA-7AE9B6E394B9@apple.com>
This is an interesting and insightful thread. If the root of the  
problem is wanting tolerant error handling combined with namespaces,  
then indeed XML5 is an interesting alternative approach. Some members  
of the TAG have expressed the notion that they would like HTML and XML  
to converge (whether completely or just to some extent wasn't clear).  
Adding XML-like namespaces to HTML and adding tolerant parsing to XML  
both seem like they could meet this goal. Just to help frame our  
discussion, here's a list of a few of the most important differences  
between XML and text/html at a high level, as they impact authors:

(1) XML has draconian error handling, while text/html has tolerant  
(and, with HTML5, fully specified) error handling.
(2) XML supports arbitrary XML-style namespaces in the syntax, while  
text/html supports only a short list of predefined namespaces.
(3) XML has a fairly strict conforming syntax, while even the  
conforming text/html syntax allows many shortcuts (even setting aside  
the error-tolerance).
(4) XML parsing is completely independent of the vocabulary, while  
text/html parsing has many behaviors that are specific to the HTML  
vocabulary.
(5) XML has only a very small list of predefined entities, with  
optional addition of more via DTD processing, while text/html has a  
fairly extensive list of named entities.
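
As a rough illustration of points (1) and (5), here is a minimal Python sketch using only the standard library. It assumes `xml.etree.ElementTree` as a stand-in for a conforming XML parser, and `html.parser` plus `html.unescape` as a stand-in for tolerant text/html processing. Note that `html.parser` is a simple tokenizer, not an implementation of the HTML5 parsing algorithm, and the helper names (`EventCollector`, `xml_accepts`, `html_events`) are made up for this example.

```python
import html
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records tokenizer events instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def xml_accepts(markup):
    """Return True if the XML parser accepts the markup, False on a fatal error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_events(markup):
    """Tokenize the markup tolerantly and return the recorded events."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


MISNESTED = "<p>unclosed <b>bold</p>"

# Point (1): the XML parser gives up at the first well-formedness error,
# while the HTML tokenizer keeps reporting events.
print(xml_accepts(MISNESTED))
print(html_events(MISNESTED))

# Point (5): &amp; is predefined in XML; &nbsp; is not (absent a DTD),
# but it is one of the many named entities known to text/html.
print(xml_accepts("<p>&amp;</p>"))
print(xml_accepts("<p>&nbsp;</p>"))
print(html.unescape("&nbsp;") == "\u00a0")
```

The XML side either yields a tree or nothing at all, whereas the HTML side always yields something for the consumer to work with.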

Are there more important high-level differences that I'm forgetting?

One thing to think about: if we want to bring about a greater  
convergence of XML and HTML, and in particular have HTML-like behavior  
on point (1) but XML-like on point (2), we should think about the  
other differences between XML and HTML to determine which might be a  
better starting point. I know that on points (3), (4) and (5),  
reasonable people may differ on which behavior is better, but perhaps  
they can still help inform our decision-making.
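
To make the combination in question concrete, the sketch below shows how a namespace-aware XML parser behaves today on point (2), and where point (1) gets in the way. It assumes Python's `xml.etree.ElementTree`; the namespace URI `urn:example:widgets`, the element names, and the `child_tags` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
WIDGETS_NS = "urn:example:widgets"

BOUND = f'<doc xmlns:w="{WIDGETS_NS}"><w:gauge/></doc>'
UNBOUND = "<doc><w:gauge/></doc>"  # the prefix "w" is never declared


def child_tags(markup):
    """Return the expanded names of the root's children, or None on a fatal error."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return [child.tag for child in root]


# Declared prefix: the element name is resolved against the namespace URI.
print(child_tags(BOUND))

# Undeclared prefix: the document is not namespace well-formed,
# so the parser reports a fatal error and no tree is produced.
print(child_tags(UNBOUND))
```

A tolerant parser along the lines of XML5 would have to define some tree for the second input instead of rejecting it.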

(My opinion: I think the vocabulary-independence of XML is quite  
valuable, and I think it would be a good property for a  
forward-looking serialization.)

Regards,
Maciej


On Nov 16, 2009, at 5:32 AM, Anne van Kesteren wrote:

> I was asked during TPAC to briefly outline a potential alternate  
> approach to making XML namespaces usable on the Web on this mailing  
> list. (Which is what the "distributed extensibility" debate seems to  
> center around.)
>
> The reasons people appear to be pushing for some solution of XML  
> namespaces in HTML seem to be:
>
> A) There are a lot of legacy systems out there that would be hard to  
> re-factor to make them ready for XML. They would basically have to  
> be rewritten from the ground up to work with an XML toolchain to  
> make sure the output is always namespace well-formed. Besides the  
> fact that this probably would not happen for cost-benefit reasons, it  
> also makes writing a simple tool that outputs content a lot more  
> complicated. No more PHP echo or Python print to show something on  
> the screen, but rather you would have to use some kind of DOM, a  
> serializer, etc.
>
> B) Internet Explorer does not support XHTML.
>
> If the problem is just B I'm not sure it is worth introducing  
> complexity in HTML to work around a bug in a browser. Generally we  
> do not introduce new features to work around bugs in browsers.
>
> If the problem is A, it seems to me it would be better to solve that  
> problem at its core: XML. I worked on that a while ago (two years or  
> so) and put some experimental code and documentation online here:
>
>  http://code.google.com/p/xml5/
>
> It tries to preserve the existing characteristics of XML in browsers  
> by not doing anything with the DTD and by being stream-able. It is  
> also backwards compatible with XML 1.0 and I think XML 1.1 in the  
> sense that any namespace well-formed XML 1.x document will result in  
> the same tree when using an XML5 parser. The main new feature is  
> that it also defines what the resulting tree will be for byte  
> streams that are not namespace well-formed.
>
> The idea is that "XML5" would replace XML 1.x so that we do not end  
> up with yet another dialect. This and most of the above is quite  
> controversial, and since I'm personally still not quite sure what  
> problem XML namespaces are solving (they appear to have been added  
> mostly for RDF), I have never really pursued this idea much further.  
> However, it was brought up again so I thought I should outline the  
> thought process.
>
>
> (Another reason I played with XML5 is that in mobile walled gardens  
> one can often find non-namespace well-formed XML that is expected to  
> be processed anyway because less compliant user agents that came before  
> us (see also http://simon.html5.org/articles/mobile-results )  
> processed it too.)
>
>
> -- 
> Anne van Kesteren
> http://annevankesteren.nl/
>
Received on Tuesday, 17 November 2009 23:41:29 UTC