- From: David Carlisle <davidc@nag.co.uk>
- Date: Wed, 22 Dec 2010 11:07:38 +0000
- To: public-html-xml@w3.org
In the minutes, Henri is quoted as:

> Henri: The counter-intuitive behavior only arises if the document is an
> error. If you try to do sensible stuff, you don't see this behavior.

Technically this is a true statement, as the definition of "error" is essentially "input which causes the parser to behave like this", and one might say that it is sensible to try to avoid this behaviour. However, there are many reasonable cases where the defined behaviour causes a document to break (and essentially no reasonable documents where it does anything useful).

For those not well versed in MathML: it has an annotation-xml element that allows arbitrary well-formed XML as structured annotations around essentially any term. In the browser case, if the annotation isn't HTML then probably the _only_ thing the browser has to do is not mess it up, and leave it in the DOM for a script or other process to use later.

An example for Norm, with a variable annotated with DocBook:

  <math>
    <mfrac>
      <semantics>
        <mi>x</mi>
        <annotation-xml encoding="docbook">
          <para>some docbook with a <code>code</code> fragment</para>
        </annotation-xml>
      </semantics>
      <mi>y</mi>
    </mfrac>
  </math>

That is valid (modulo namespaces, which are not relevant here) according to any published schema for MathML. The HTML5 parse tree produced from it is:

  <math>
    <mfrac>
      <semantics>
        <mi>x</mi>
        <annotation-xml encoding="docbook">
          <para>some docbook with a </para></annotation-xml></semantics></mfrac></math><code>code</code> fragment
  <mi>y</mi>

The <mfrac> now has only one child, and the <mi>y</mi> is no longer inside the math.

So, in order to support one or two sites that allegedly used an undefined <math> element containing HTML (and which still work if the math is closed and the HTML is moved outside it), every editing and document production system is supposed to somehow carry special code to avoid this happening _forever_?
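For comparison, the XML reading of the example above can be checked mechanically. The following is only a minimal sketch: it assumes Python's standard xml.etree.ElementTree as a stand-in for "an XML parser" (it is not mentioned anywhere in this thread), and the names MATHML and summarize are hypothetical, chosen for illustration. It shows that under XML rules the <mfrac> keeps both children and the <code> element stays inside the <para>, which is exactly what the HTML5 tree above loses.

```python
import xml.etree.ElementTree as ET

# The MathML example from the message, without namespaces (as in the original).
MATHML = """\
<math>
  <mfrac>
    <semantics>
      <mi>x</mi>
      <annotation-xml encoding="docbook">
        <para>some docbook with a <code>code</code> fragment</para>
      </annotation-xml>
    </semantics>
    <mi>y</mi>
  </mfrac>
</math>"""


def summarize(xml_text):
    """Parse xml_text as XML and report the structure the message cares about."""
    root = ET.fromstring(xml_text)   # root is the <math> element
    mfrac = root.find("mfrac")       # direct child of <math>
    para = root.find(".//para")      # the DocBook annotation paragraph
    return {
        # Element children of <mfrac>, in document order.
        "mfrac_children": [child.tag for child in mfrac],
        # Whether <code> is still a child of <para>.
        "code_inside_para": para.find("code") is not None,
        # Full text content of <para>, including the nested <code>.
        "para_text": "".join(para.itertext()),
    }


if __name__ == "__main__":
    print(summarize(MATHML))
```

An XML parser treats the annotation as opaque well-formed content, so nothing inside it can close the enclosing MathML elements.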
The "legacy" argument for parsing the document like this is essentially nonexistent: it is a new HTML5 invention, and no browsers did this until very recently (although Chrome and Firefox 4 do now).

The workaround that was finally added as a result of

  http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887

does not address this case. It only addresses the (more immediately pressing) case of HTML annotations, so now if you write <annotation-xml encoding="text/html">... the content is parsed as HTML and HTML tags are allowed.

The problem is _not_ specific to MathML. It affects all "foreign content", i.e. all uses of XML in HTML, so it is pretty central to the concerns of this task force.

It affects SVG in a slightly different way. SVG has a very similar foreignObject element that, again, is supposed to be able to take arbitrary well-formed XML. However, to avoid the above problem, foreignObject is defined by the HTML5 parser to take HTML content (like annotation-xml[@encoding='text/html'] or mtext). So the case of HTML in SVG is OK, but any other XML in that position will be mis-parsed if it uses empty-element syntax.

David
Received on Wednesday, 22 December 2010 11:08:10 UTC