Re: html elements in foreign markup. from Norman Walsh on 2010-12-22 (public-html-xml@w3.org from December 2010)

From: Norman Walsh <ndw@nwalsh.com>
Date: Wed, 22 Dec 2010 12:23:05 -0500
To: public-html-xml@w3.org
Message-ID: <m2pqst90xy.fsf@nwalsh.com>

David Carlisle <davidc@nag.co.uk> writes:
> An example for Norm, with a variable annotated with docbook:

:-)

> <math>
> <mfrac>
> <semantics>
> <mi>x</mi>
> <annotation-xml encoding="docbook">
> <para>some docbook with a <code>code</code> fragment</para>
> </annotation-xml>
> </semantics>
> <mi>y</mi>
> </mfrac>
> </math>
>
> That is valid (modulo namespaces, which are not relevant here)
> according to any published schema for MathML.
> The HTML5 parse tree produced from that is:
>
> <math>
> <mfrac>
> <semantics>
> <mi>x</mi>
> <annotation-xml encoding="docbook">
> <para>some docbook with a
> </para></annotation-xml></semantics></mfrac></math><code>code</code>
> fragment
>
> <mi>y</mi>
>
> The <mfrac> now has only one child, and the <mi>y</mi> is no longer
> inside the math.
>
> In order to support one or two sites that allegedly were using an
> undefined <math> element containing HTML, but which still work if the
> math is closed and the HTML is moved outside the math, every
> editing and document production system is
> supposed to somehow have special code to avoid this happening
> _forever_?
>
> The "legacy" argument for parsing the document like this is
> essentially nonexistent: it is a new HTML5 invention, and no browsers
> did this until very recently (but Chrome and Firefox 4 do now).
>
> The workaround that was finally added as a result of
>
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887
>
> does not address this case; it just addresses the (more immediately
> pressing) case of HTML annotations, so now if you go
>
> <annotation-xml encoding="text/html">...
>
> the content is parsed as HTML and HTML tags are allowed.
>
> The problem is _not_ just related to MathML; it affects all "foreign
> content", i.e. all uses of XML in HTML, so it is pretty central to the
> concerns of this task force.

Yes. I must admit that when the consequences of this algorithm were
presented to me (independently a few weeks ago), I just assumed that
we were misunderstanding the specification.

It seems very odd to me that a well-formed nesting of tags should be
unwrapped in this way. I would have thought that anyone using this
markup would be expecting the resulting DOM to maintain the nesting so
that, for example, CSS could style it or JavaScript could process it.
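
For comparison, a minimal sketch of what an ordinary XML parser makes of
the quoted example. It uses Python's standard xml.etree.ElementTree purely
as an illustration; it is not an HTML5 parser, and the variable names are
assumptions of this sketch rather than anything from the thread. It shows
only the XML side: the nesting is preserved, which is the structure a
stylesheet or script author would presumably expect to find in the DOM.

```python
import xml.etree.ElementTree as ET

# David's example, verbatim (namespaces omitted, as in the original).
SOURCE = """<math>
<mfrac>
<semantics>
<mi>x</mi>
<annotation-xml encoding="docbook">
<para>some docbook with a <code>code</code> fragment</para>
</annotation-xml>
</semantics>
<mi>y</mi>
</mfrac>
</math>"""

root = ET.fromstring(SOURCE)

# In the XML tree, <mfrac> keeps both of its children.
mfrac = root.find("mfrac")
mfrac_children = [child.tag for child in mfrac]

# <code> stays nested inside the DocBook <para>.
para = root.find(".//para")
para_children = [child.tag for child in para]

print(mfrac_children)
print(para_children)
```

In the HTML5 parse tree quoted above, by contrast, <mfrac> is left with a
single child and <code> ends up outside <math> altogether.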

Is the motivation for this decision just the existence of some legacy
content with a <math> element?

> It affects SVG in a slightly different way. SVG has a very similar
> foreignObject element that again is supposed to be able to take
> arbitrary well-formed XML. However, to avoid the above problem,
> foreignObject is defined by the HTML5 parser to take HTML content
> (like annotation-xml[@encoding='text/html'] or mtext), so the case of
> HTML in SVG is OK, but any other XML in that position will be
> mis-parsed if it uses empty-element syntax.

That doesn't leave me with a warm fuzzy feeling.
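
To make the empty-element case concrete, here is a small sketch of the XML
reading of such content, again using Python's xml.etree.ElementTree. The
element names <first/> and <second/> are hypothetical stand-ins for some
non-HTML vocabulary. Only the XML behaviour is demonstrated; the HTML
parser outcome is described in the quoted text and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# <first/> and <second/> are made-up, non-HTML element names.
SVG = "<svg><foreignObject><first/><second/></foreignObject></svg>"

fo = ET.fromstring(SVG).find("foreignObject")

# An XML parser honours the empty-element syntax: two empty siblings.
fo_children = [child.tag for child in fo]
first_children = [child.tag for child in fo[0]]

print(fo_children)
print(first_children)
```

A parser that treats the content of foreignObject as HTML would presumably
not treat "/>" as closing these unknown elements, so <second> would end up
nested inside <first> rather than beside it.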

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
www.marklogic.com
Received on Wednesday, 22 December 2010 17:26:45 UTC