[Bug 9887] parsing algorithm should allow HTML content in MathML <annotation-xml>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887





--- Comment #8 from Simon Pieters <simonp@opera.com>  2010-06-29 13:12:10 ---
(In reply to comment #7)
> please see
> 
> http://lists.w3.org/Archives/Public/www-math/2010Jun/0022.html


> On 29/06/2010 08:13, Michael(tm) Smith wrote:
>   > If you have use cases and/or real-world examples, in existing
>   > documents, of <annotation-xml> instances containing HTML content,
>   > please post them as replies to this message and/or as comments
>   > to the following HTML WG bugzilla bug -
>   >
>   >    http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887
>   >
>   > The background on my request is this:
>   >
>   >   - The HTML5 specification defines an algorithm for parsing
>   >     text/html (non-XML) documents that contain MathML elements.
>   >
>   >   - That algorithm deals with the <annotation-xml> element as a
>   >     special case; it provides for both SVG and MathML content in
>   >     <annotation-xml> being properly parsed into a DOM as expected.
>
> If this is the best that can be achieved in the html5 parsing algorithm
> it is I suppose better than nothing but it is really a very broken
> design.

The purpose of this bug is finding if the algorithm can be changed to something
better. However, to know what is better, real-world pages and use cases are
needed.


> annotation-xml should take any well-formed XML. The XML syntax
> (with its explicit /> empty-element syntax) was designed to make
> this easy to achieve; it should always be possible to reliably parse to
> the correctly matching closing </annotation-xml>. In an HTML5 context the
> syntax rules no doubt would be relaxed a bit, but it should still be
> possible to parse to the end of the annotation reliably, and to place
> those elements into the dom (with by default no effect on rendering).

It's a bit complicated. text/html is not XML. It does not support arbitrary
namespaces. It only supports HTML, SVG and MathML. For compatibility with
existing content, certain tags have to break out of <math> unless they appear
in places where HTML is expected (like <mtext>).

Since <annotation-xml> can contain MathML, the parser can't expect HTML there
since it would break MathML in <annotation-xml> (the elements would be in the
HTML namespace instead of the MathML namespace).
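A minimal Python sketch of this constraint, assuming a heavily simplified model of the parser. The function and constant names are made up for illustration and are not from the spec, and details such as the mglyph/malignmark exceptions are ignored. The point it shows: the namespace an ordinary child start tag gets depends only on whether its parent is a place where HTML is expected.

```python
# Hypothetical, heavily simplified model of the parser (not spec text):
# which namespace an ordinary child start tag gets inside <math> in text/html.
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# MathML elements inside which the parser expects HTML (e.g. <mtext>).
MATHML_TEXT_INTEGRATION_POINTS = {"mi", "mo", "mn", "ms", "mtext"}


def child_namespace(parent: str, html_parents: set[str]) -> str:
    """Namespace for an ordinary (non-breakout) child start tag of `parent`."""
    if parent in html_parents:
        return HTML_NS  # HTML is expected here
    return MATHML_NS  # otherwise the child stays in MathML


# As things are: <mi> inside <annotation-xml> stays in the MathML namespace.
print(child_namespace("annotation-xml", MATHML_TEXT_INTEGRATION_POINTS))

# If <annotation-xml> were an HTML context like <mtext>, the same <mi>
# would be put in the HTML namespace and would no longer be MathML.
html_everywhere = MATHML_TEXT_INTEGRATION_POINTS | {"annotation-xml"}
print(child_namespace("annotation-xml", html_everywhere))
```

Adding annotation-xml to the set of HTML contexts is exactly the change that would move its MathML children into the wrong namespace.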

> annotation-xml is essentially like data-* attributes in HTML, except that,
> being an element rather than an attribute, it can take structured content.
>
> There are any number of reasons for wanting to annotate an expression
> with (X)HTML: it may be a fallback HTML rendering for cut and paste into
> systems that don't do MathML; it may be some kind of tooltip or
> structured help, perhaps activated by script elsewhere on the
> page; it might be a copyright statement. It really isn't the job of the
> specification to try to second-guess why an expression is annotated,
> just to allow it to be annotated.

Are you aware of any existing content that uses HTML in <annotation-xml>?

If you think it should be supported to use HTML in <annotation-xml>, how would
you want it to work? Should the parser expect HTML if there's an encoding
attribute with a certain value? Should the parser special-case the "div" tag to
indicate HTML content in annotation-xml? Something else?
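One way the encoding-attribute option could look, as a hypothetical Python sketch: only an <annotation-xml> whose encoding attribute names an HTML media type is treated as a place where HTML is expected, and everything else keeps the current behaviour. The accepted values (text/html and application/xhtml+xml) and the function name are assumptions chosen for illustration, not spec text.

```python
import string

# Hypothetical sketch of the "encoding attribute" option. The accepted
# values below are assumptions for illustration, not spec text.
HTML_ENCODINGS = {"text/html", "application/xhtml+xml"}

# Lowercase ASCII letters only, for an ASCII case-insensitive comparison.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def annotation_expects_html(attrs: dict[str, str]) -> bool:
    """True if, in this sketch, <annotation-xml> should get HTML children."""
    encoding = attrs.get("encoding", "").translate(_ASCII_LOWER)
    return encoding in HTML_ENCODINGS


print(annotation_expects_html({"encoding": "text/html"}))       # True
print(annotation_expects_html({"encoding": "MathML-Content"}))  # False
print(annotation_expects_html({}))                              # False
```

The appeal of an opt-in like this is that existing MathML and SVG content in <annotation-xml>, which would not normally carry such an encoding value, would parse exactly as it does now.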

>   >   - However, for the case of HTML content in <annotation-xml>, it
>   >     does not provide for that content getting into the DOM as child
>   >     content of the <annotation-xml> element; instead such content
>   >     will essentially end up getting into the DOM as a following
>   >     sibling of any ancestor <math> element.

This is only the case if the element is one of "b", "big", etc. (see "in foreign
content" in the spec). Other elements end up in the MathML namespace and don't
break out.
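The rule can be sketched in Python under a simplified model. The tag set below is an abbreviated, illustrative subset of the breakout list, the names are hypothetical, and details such as the attribute check on <font> are left out.

```python
# Hypothetical, simplified model (not spec text) of what happens to a start
# tag when the current node is <annotation-xml> in text/html.
# BREAKOUT_SUBSET is an abbreviated, illustrative subset of the breakout
# list in "in foreign content", not the full list.
BREAKOUT_SUBSET = {"b", "big", "br", "div", "img", "p", "span", "table"}


def start_tag_outcome(tag: str) -> str:
    if tag == "svg":
        # <annotation-xml> special-cases SVG, so <svg> is parsed as SVG.
        return "SVG child"
    if tag in BREAKOUT_SUBSET:
        # Parse error: open elements are popped back out of <math>, so in
        # the example page the element ends up after the <math> element.
        return "breaks out of <math>"
    # Anything else becomes a child of <annotation-xml>, in the MathML namespace.
    return "MathML child"


for tag in ("img", "b", "foo", "mi", "svg"):
    print(tag, "->", start_tag_outcome(tag))
```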

> That would be entirely incorrect.

It's for compat with pages that have bogus <math> followed by HTML content. Not
rendering the HTML content would break the page.


>   > You can test and see for yourself by using a recent Mozilla
>   > Minefield or Firefox nightly build with this page:
>   >
>   >    http://software.hixie.ch/utilities/js/live-dom-viewer/
>   >
>   > for example:
>   >
>   >
> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%3Ctitle%3E%3C%2Ftitle%3E%0A%3Cp%3E%0A%3Cmath%3E%0A%3Csemantics%3E%0A%3Cmi%3Efoo%3C%2Fmi%3E%0A%3Cannotation-xml%3E%0A%3Cimg%20src%3Dbar%3E%0A%3C%2Fannotation-xml%3E%0A%3C%2Fmath%3E%0A%3C%2Fp%3E
>   >
>   >    or: http://bit.ly/dy4Rxj
>   >
>   > So what I would like to try to get clarification on is whether
>   > there are compelling use cases for having HTML content within the
>   > <annotation-xml> element that would justify making a change at
>   > this point to the parsing algorithm in the HTML5 spec (and to the
>   > behavior of existing implementations of that).
>
> I can think of no possible justification for rendering the child of an
> annotation element that is deeply nested inside a math expression as
> text following the math expression, so the question seems strangely
> posed. Given that no user could possibly want this behaviour, what is
> the compelling use case for specifying things that way?

See above. Users want pages to continue to work as they worked in the previous
version of the browser, even if the page contains a <math> or <svg> tag
somewhere where it's not intended to be MathML or SVG.

>   >    --Mike
>   >
>
> David
> (speaking personally)


Received on Tuesday, 29 June 2010 13:12:12 UTC