Re: Design goal regarding HTML5

On 06/09/2012 12:58, Stephen D Green wrote:
> Then 1) MicroXML could allow parsers to preserve in its data model
> whether the markup included an empty element as <abc/> or as
> <abc></abc> or 2) maybe HTML would start to treat them as
> equivalent. Either way, here is a potential road map to sensible
> convergence, isn't it, with MicroXML setting out to make the first
> step from the XML side and to highlight potential reciprocal changes
> that might be made from the HTML side.


I don't think that will happen. The HTML5 designers explicitly rejected
(multiple times) the notion that /> syntax should mean empty element.
The html parser spec ignores the / in so many places, so that <foo/>
parses as <foo> (and thus as a start tag, or as the tag of a void
element for elements defined as void in html), that it would be very
hard to change.
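
As a rough illustration of that rule, here is a toy model in Python. It
is not the html parsing algorithm, only the decision described above for
ordinary html elements; the function name and the short void list are
my own, chosen for illustration.

```python
# Toy model only: an illustrative subset of the elements html defines as void.
VOID_ELEMENTS = frozenset({"br", "hr", "img", "input", "link", "meta"})


def leaves_element_open(tag_name, has_trailing_slash):
    """Return True if a start tag leaves an element open for later content.

    has_trailing_slash is accepted only to show that it has no effect:
    the answer depends on the element name alone.
    """
    return tag_name.lower() not in VOID_ELEMENTS


print(leaves_element_open("foo", True))   # <foo/> -> True, still open
print(leaves_element_open("foo", False))  # <foo>  -> True, same result
print(leaves_element_open("br", True))    # <br/>  -> False, void element
```

So in this model <foo/> behaves exactly like <foo>, and only the element
name decides whether anything can follow as content.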

Personally I wish that they had made /> mean an empty element by
default, even if they had special-cased some existing elements. The
script element is the most plausible case for an exception: given
<script/>zzz</script>, there are possible attack points if new parsers
see zzz as outside the script while old parsers, not understanding
xml-style /> syntax, see zzz as script content.


The polyglot spec lists most of the things you need to do to make a
document produce an equivalent DOM whether parsed as html or xml, and
most of those restrictions would apply equally to microxml.

http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html


To see how weird html parsing is, consider this microxml document:


<math>
  <mfrac>
   <p>a</p>
   <p>b</p>
  </mfrac>
</math>

If parsed with a microxml or xml parser it produces a math element with
an mfrac element child, which has two p element children.

If parsed with an html parser it produces the DOM which you would get 
from this xml document (ignoring namespaces):



<html><head></head><body><math>
  <mfrac>
   </mfrac></math><p>a</p>
   <p>b</p>

</body></html>

Note the mfrac element now has no children and the two p elements are 
siblings of math, not grandchildren.
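
The difference can be checked with a short Python sketch. It uses the
standard library xml parser on both documents above; the second one
stands in for what an html parser would build, since the sketch does
not run an html parser itself. The variable and function names are
mine.

```python
import xml.etree.ElementTree as ET

# The microxml document as written.
AS_WRITTEN = """<math>
  <mfrac>
   <p>a</p>
   <p>b</p>
  </mfrac>
</math>"""

# The xml equivalent of the DOM an html parser produces (from the post).
AS_HTML_SEES_IT = """<html><head></head><body><math>
  <mfrac>
   </mfrac></math><p>a</p>
   <p>b</p>

</body></html>"""


def child_tags(element):
    """List the tag names of an element's direct element children."""
    return [child.tag for child in element]


xml_mfrac = ET.fromstring(AS_WRITTEN).find("mfrac")
html_body = ET.fromstring(AS_HTML_SEES_IT).find("body")
html_mfrac = html_body.find("math/mfrac")

print(child_tags(xml_mfrac))   # ['p', 'p']
print(child_tags(html_mfrac))  # []
print(child_tags(html_body))   # ['math', 'p', 'p']
```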

David




Received on Thursday, 6 September 2012 12:19:17 UTC