HTML and XML (was: Re: Comments on HTML WG face to face meetings in France Oct 08)

From: Anne van Kesteren <annevk@opera.com>
Date: Sat, 24 Jan 2009 13:49:07 +0100
To: "David Orchard" <orchard@pacificspirit.com>, "Henri Sivonen" <hsivonen@iki.fi>
Cc: www-tag@w3.org
Message-ID: <op.un9hv5dc64w2qv@annevk-t60.oslo.opera.com>

Let me try again, because the examples were not that good.

On Sat, 24 Jan 2009 12:07:23 +0100, Anne van Kesteren <annevk@opera.com>  
wrote:
> I suppose I should present proof for this though. Since I cannot think  
> of a good way to put it, let's go through some examples.

I realized later that most of my examples were not well-formed (per XML)
and might therefore not be convincing enough. On the other hand, when made
well-formed they are parsed in pretty much the same way:

   Stream:
   <body><div><button/></div>X</body>

   Tree:
   html
    head
    body
     div
      button
       "X"

   Stream:
   <body><image>X</image></body>

   Tree:
   html
    head
    body
     img
     "X"

   Stream:
   <test><x/>X</test>

   Tree:
   html
    head
    body
     test
      x
       "X"

   Stream:
   <br></br>

   Tree:
   html
    head
    body
     br
     br

And for each of these perhaps dubious behaviors there are pages out there
that depend on the parser behaving exactly this way.

So even if we get some kind of namespacing in HTML that is similar to XML,
it will always have to be very constrained in order not to break legacy
pages, and especially the assumptions they make about how HTML parsers
behave.

I think that if you want to allow arbitrary tree-based markup languages  
your only option is using XML. If you want them to be usable by authors as  
well you need something like XML5, because even the experts fail:

   http://diveintomark.org/archives/2004/01/14/thought_experiment
   http://diveintomark.org/archives/2008/03/09/no-fury-like-dracon-scorned
   http://annevankesteren.nl/2009/01/xml-sunday

At the end of the day, XML is too hard and HTML gives little freedom.
Creating a superset of HTML that provides the same freedom as XML while
remaining backwards compatible is, I think, technically impossible.
Creating a superset of XML that is more lenient while remaining backwards
compatible with XML 1.0 and XML 1.1 is technically doable, as my XML5
project demonstrates.
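
To make the "XML is too hard" point concrete, here is a small sketch of
how a conforming XML 1.0 parser reacts to slips that authors make all the
time. It again uses xml.etree.ElementTree from the Python standard
library; the function name is_well_formed and the sample inputs are
invented for illustration. It shows only the strict XML side and says
nothing about how XML5 would recover from the same input.

```python
import xml.etree.ElementTree as ET


def is_well_formed(stream):
    """Return True if the string parses as XML, False on a fatal error."""
    try:
        ET.fromstring(stream)
        return True
    except ET.ParseError:
        # XML parsers must stop at the first well-formedness error.
        return False


if __name__ == "__main__":
    samples = [
        "<br></br>",          # well-formed
        "<br>",               # missing end tag
        "<body><p>X</body>",  # mismatched tags
        "<p>a & b</p>",       # unescaped ampersand
    ]
    for stream in samples:
        print(repr(stream), is_well_formed(stream))
```

Only the first sample is accepted; each of the others is rejected
outright instead of being repaired into some tree.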


> You can try this out for yourself here:
>
>    http://livedom.validator.nu/
>    http://james.html5.org/parsetree.html


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>
Received on Saturday, 24 January 2009 12:50:21 GMT
