html elements in foreign markup.

In the minutes, Henri is quoted as:

>     Henri: The counter-intuitive behavior only arises if the document is an
>     error. If you try to do sensible stuff, you don't see this behavior.

Technically this is a true statement, as the definition of "error" is 
essentially "input which causes the parser to behave like this" and one 
might say that it is sensible to try to avoid this behaviour.

However, there are many reasonable cases where the defined behaviour
causes a document to break (and essentially no reasonable documents
where it does anything useful).

For those not well versed in MathML, it has an annotation-xml element
that allows arbitrary well-formed XML as structured annotations around
essentially any term. In the browser case, if the annotation isn't html
then probably the _only_ thing the browser has to do is not mess it up
and leave it in the DOM for a script or other process to use later.

An example for Norm, with a variable annotated with docbook:

<math>
<mfrac>
<semantics>
<mi>x</mi>
<annotation-xml encoding="docbook">
<para>some docbook with a <code>code</code> fragment</para>
</annotation-xml>
</semantics>
<mi>y</mi>
</mfrac>
</math>


That is valid (modulo namespaces, which are not relevant here) according
to any published schema for MathML.
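
As a quick check of the intended structure, here is a minimal sketch
using only the Python standard library (the variable names are mine, not
part of any spec). It parses the fragment with an XML parser and lists
the children of mfrac:

```python
import xml.etree.ElementTree as ET

# The MathML fragment from this message, unchanged.
SOURCE = """<math>
<mfrac>
<semantics>
<mi>x</mi>
<annotation-xml encoding="docbook">
<para>some docbook with a <code>code</code> fragment</para>
</annotation-xml>
</semantics>
<mi>y</mi>
</mfrac>
</math>"""

root = ET.fromstring(SOURCE)
mfrac = root.find("mfrac")

# An XML parser keeps both operands of the fraction inside mfrac.
children = [child.tag for child in mfrac]
print(children)  # ['semantics', 'mi']

# The docbook <code> element stays nested inside the annotation.
code_el = next(mfrac.iter("code"), None)
print(code_el is not None)  # True
```
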
The html5 parse tree produced from that is:


<math>
<mfrac>
<semantics>
<mi>x</mi>
<annotation-xml encoding="docbook">
<para>some docbook with a 
</para></annotation-xml></semantics></mfrac></math><code>code</code> 
fragment


<mi>y</mi>






The <mfrac> now only has one child, and the <mi>y</mi> is no longer
inside the math.
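
What triggers this is the <code> start tag: it is on the parser's list
of html start tags that close any open foreign content. The following is
a rough sketch of a checker for that situation, not the parsing
algorithm itself. It uses the tolerant tokenizer in Python's html.parser
rather than a real html5 parser; the class name BreakoutFinder and the
simplified stack handling are my own assumptions, and the
<font color/face/size> case is left out.

```python
from html.parser import HTMLParser

# html start tags that make the html5 parser leave foreign content.
BREAKOUT = {
    "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div",
    "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head",
    "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p",
    "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
    "table", "tt", "u", "ul", "var",
}
# Elements whose content is parsed as html (html.parser lowercases names).
HTML_CONTEXT = {"mi", "mo", "mn", "ms", "mtext", "foreignobject", "desc", "title"}
HTML_ENCODINGS = {"text/html", "application/xhtml+xml"}


class BreakoutFinder(HTMLParser):
    """Hypothetical lint: report tags that would close an open math/svg."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as (name, content_is_html)
        self.hits = []   # (tag name, line number) of each offending tag

    def in_foreign(self):
        # The innermost integration point or math/svg root decides.
        for name, content_is_html in reversed(self.stack):
            if content_is_html:
                return False
            if name in ("math", "svg"):
                return True
        return False

    def handle_starttag(self, tag, attrs):
        if self.in_foreign() and tag in BREAKOUT:
            self.hits.append((tag, self.getpos()[0]))
        encoding = (dict(attrs).get("encoding") or "").lower()
        content_is_html = tag in HTML_CONTEXT or (
            tag == "annotation-xml" and encoding in HTML_ENCODINGS
        )
        self.stack.append((tag, content_is_html))

    def handle_endtag(self, tag):
        # Simplified: pop back to the nearest open element of that name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def breakout_tags(markup):
    finder = BreakoutFinder()
    finder.feed(markup)
    finder.close()
    return [tag for tag, _line in finder.hits]


DOCBOOK = ('<math><mfrac><semantics><mi>x</mi>'
           '<annotation-xml encoding="docbook">'
           '<para>some docbook with a <code>code</code> fragment</para>'
           '</annotation-xml></semantics><mi>y</mi></mfrac></math>')

print(breakout_tags(DOCBOOK))  # ['code']
print(breakout_tags(DOCBOOK.replace("docbook", "text/html")))  # []
```

Renaming the annotation's encoding to text/html only silences the check
because that annotation then becomes html content; it does nothing for
annotations in other vocabularies.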


So, in order to support one or two sites that were allegedly using an
undefined <math> element containing html (sites which still work if the
math is closed and the html is moved outside it), every editing and
document production system is supposed to somehow carry special code to
avoid this happening, _forever_?

The "legacy" argument for parsing the document like this is essentially
non-existent: it is a new html5 invention, and no browsers did this until
very recently (although Chrome and Firefox 4 do now).

The workaround that was finally added as a result of


http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887

does not address this case. It only addresses the (more immediately
pressing) case of html annotations, so now if you write

<annotation-xml encoding="text/html">...

the content is parsed as html and html tags are allowed.

The problem is _not_ just related to MathML: it affects all "foreign
content", i.e. all uses of XML in HTML, so it is pretty central to the
concerns of this task force.

It affects SVG in a slightly different way. SVG has a very similar
foreignObject element that again is supposed to be able to take
arbitrary well-formed XML. However, to avoid the above problem,
foreignObject is defined by the html5 parser to take html content (like
annotation-xml[@encoding='text/html'] or mtext), so the case of html in
svg is OK, but any other XML in that position will be mis-parsed if it
uses empty element syntax.
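
The reason is that in html content the trailing "/" on an unknown
element is ignored, so the element stays open and swallows whatever
follows it. A small sketch of a check for this, again using Python's
html.parser as a stand-in tokenizer (the class name, the void-element
list, and the docbook-style <xref/> test input are my own illustrative
assumptions):

```python
from html.parser import HTMLParser

# html void elements, plus svg/math, for which "/>" is harmless in html content.
OK_SELF_CLOSING = {
    "area", "base", "br", "col", "embed", "hr", "img", "input", "link",
    "meta", "param", "source", "track", "wbr", "svg", "math",
}


class EmptySyntaxFinder(HTMLParser):
    """Hypothetical lint: find <x/> directly in foreignObject's html content."""

    def __init__(self):
        super().__init__()
        self.context = []  # stack of open svg / math / foreignobject
        self.hits = []     # names of elements written with empty syntax

    def handle_starttag(self, tag, attrs):
        if tag in ("svg", "math", "foreignobject"):
            self.context.append(tag)

    def handle_endtag(self, tag):
        if self.context and self.context[-1] == tag:
            self.context.pop()

    def handle_startendtag(self, tag, attrs):
        # Only called for tags written with empty-element syntax.
        in_html = bool(self.context) and self.context[-1] == "foreignobject"
        if in_html and tag not in OK_SELF_CLOSING:
            self.hits.append(tag)


def risky_empty_elements(markup):
    finder = EmptySyntaxFinder()
    finder.feed(markup)
    finder.close()
    return finder.hits


SVG = ('<svg><foreignObject><para>see <xref linkend="a"/> here<br/></para>'
       '<svg><circle r="1"/></svg></foreignObject><rect/></svg>')

print(risky_empty_elements(SVG))  # ['xref']
```
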

David






Received on Wednesday, 22 December 2010 11:08:10 UTC