[Bug 9887] parsing algorithm should allow HTML content in MathML <annotation-xml> from bugzilla@jessica.w3.org on 2010-06-29 (public-html-bugzilla@w3.org from June 2010)

From: <bugzilla@jessica.w3.org>
Date: Tue, 29 Jun 2010 14:10:30 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1OTbWA-0001nY-Tv@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887





--- Comment #9 from David Carlisle <davidc@nag.co.uk>  2010-06-29 14:10:29 ---
(In reply to comment #8)


> However, to know what is better, real-world pages and use cases are
> needed.
> 

There are essentially no real-world cases of HTML in MathML in a text/html
document, as this (as you know) is all new.


> It's a bit complicated. text/html is not XML. It does not support arbitrary
> namespaces.

In text/html I don't really care which namespace the content gets put into; it's
clear that namespace support in text/html is never going to be the same as in XML.
All that needs to happen is that any unknown content in annotation-xml is parsed
and left in the DOM without affecting the rendering.

> It only supports HTML, SVG and MathML. For compatibility with
> existing content,

As far as I can see, MathML in HTML has never previously been specified or
widely implemented. Specifying, for all time, a completely counter-intuitive
behaviour that makes annotation-xml essentially unusable for most of its
intended uses, just so that a few pages (which I'm sure can be found) with some
strange existing markup continue to work in some browsers, seems very strange.

>  certain tags have to break out of <math> unless they appear
> in places where HTML is expected (like <mtext>).
> 
> Since <annotation-xml> can contain MathML, the parser can't expect HTML there
> since it would break MathML in <annotation-xml> (the elements would be in the
> HTML namespace instead of the MathML namespace).
> 
> > annotation-xml is essentially like  data- attributes in html except that
> > being an element rather than an attribute it can take structured content.
> >
> > There are any number of reasons for wanting to annotate an expression
> > with (x)html, it may be a fallback html rendering for cut and paste into
> > systems that don't do mathml, it may be some kind of tooltip or
> > structured help which is perhaps activated by script elsewhere on the
> > page, it might be a copyright statement. It really isn't the job of the
> > specification to try to second guess why an expression is annotated,
> > just to allow it to be annotated.
> 
> Are you aware of any existing content that uses HTML in <annotation-xml>?

I doubt it, since I am not aware of any existing HTML-capable systems that can
handle MathML.

> 
> If you think it should be supported to use HTML in <annotation-xml>, how would
> you want it to work? Should the parser expect HTML if there's an encoding
> attribute with a certain value? Should the parser special-case the "div" tag to
> indicate HTML content in annotation-xml? Something else?
> 
> >   >   - However, for the case of HTML content in <annotation-xml>, it
> >   >     does not provide for that content getting into the DOM as child
> >   >     content of the <annotation-xml> element; instead such content
> >   >     will essentially end up getting into the DOM as a following
> >   >     sibling of any ancestor <math> element.
> 
> This is only the case if the element is one of "b", "big", etc (see "in foreign
> content" in the spec). For other elements, they end up in the MathML namespace
> but don't break out.

OK. As I say below, personally, while I find that unfortunate, it's not much
of a problem if you end up saying that HTML only works as expected in
annotation-xml when the annotation consists of a single div (which is
presumably enough to hide these elements?).
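To make the behaviour under discussion concrete, here is a rough Python sketch of where a start tag seen inside <math> would end up. It assumes a simplified model. The breakout set is only an illustrative subset of the spec's list. The `encoding` and `div_means_html` switches are hypothetical versions of the two options asked about in comment #8, and their values are assumptions, not specified behaviour. The name `placement` is also made up for illustration.

```python
# Illustrative subset of the start tags that break out of <math> under the
# "in foreign content" rules quoted above (the quote names "b", "big", etc.).
# This is NOT the spec's complete list.
BREAKOUT_SUBSET = frozenset({
    "b", "big", "blockquote", "br", "div", "em", "i",
    "li", "p", "pre", "span", "table", "ul",
})

# Hypothetical (assumed) encoding values that would mark an annotation as HTML.
ASSUMED_HTML_ENCODINGS = frozenset({"text/html", "application/xhtml+xml"})


def placement(tag, in_annotation_xml, encoding=None, div_means_html=False):
    """Say where a start tag seen inside <math> ends up (simplified model).

    encoding and div_means_html model the two options asked about in
    comment #8; both are hypothetical switches, not specified behaviour.
    """
    tag = tag.lower()
    if in_annotation_xml:
        if encoding is not None and encoding.lower() in ASSUMED_HTML_ENCODINGS:
            # Hypothetical option 1: an encoding attribute marks the content as HTML.
            return "html child of annotation-xml"
        if div_means_html and tag == "div":
            # Hypothetical option 2: a div start tag switches to HTML content.
            return "html child of annotation-xml"
    if tag in BREAKOUT_SUBSET:
        # Open MathML elements are closed and the element is inserted as a
        # following sibling of the enclosing <math>.
        return "sibling after <math>"
    # Any other tag stays in place, but in the MathML namespace.
    return "MathML-namespace child"
```

In this model, `placement("div", True)` reports a breakout, because div is itself in the breakout subset. A wrapping div would therefore only help if the parser special-cased it inside annotation-xml.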


> It's for compat with pages that have bogus <math> followed by HTML content. Not
> rendering the HTML content would break the page.

If pages contain entirely bogus markup that a later version of HTML eventually
gives a meaning to, then it is unlikely they ever worked reliably. No doubt
some pages existed that used video or canvas with some spurious
interpretation.


> > I can think of no possible justification for rendering the child of an
> > annotation element that is deeply nested inside a math expression as
> > text following the math expression, so the question seems strangely
> > posed. Given that no user could possibly want this behaviour, what is
> > the compelling use case for specifying things that way?
> 
> See above. Users want pages to continue to work as they worked in the previous
> version of the browser, even if the page contains a <math> or <svg> tag
> somewhere where it's not intended to be MathML or SVG.

Which users have pages that use MathML (presumably in a form that does not
render as math in any existing browser) and want to distort the use of MathML
forever so that those pages carry on working unchanged? Do you have any
examples?


I realise that the HTML parser is under some constraints. While I'm aware of
their general nature, I'm not that close to the implementation, so I'll just
list my (personal) wish list for annotation-xml:

1) (most important)
annotation-xml should allow any content in which all end tags and empty tags
are explicit; it just needs to be parsed correctly up to the matching
</annotation-xml>.

2) (important)
All annotations except the ones specifically highlighted below should have no
effect on rendering.

3) (desirable)
In text/html it should be possible to parse annotations without the usual XML
syntax rules (so with omitted end tags, etc.).

4) (desirable)
For an annotation in Presentation MathML, the system should render the
annotation rather than the base.

5) (for the HTML WG to decide)
Possibly other annotations, e.g. HTML or SVG, should affect rendering
(if you need some extra rules, such as HTML having to be in a div, so be it).
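Wish-list item 1 is essentially a stack-matching problem. Below is a minimal Python sketch that assumes the content really does have every end tag and empty tag explicit (e.g. `<br/>`). It uses the standard library's `html.parser.HTMLParser` only as a tokenizer. It is not the HTML tree-construction algorithm, and the class and helper names are hypothetical.

```python
from html.parser import HTMLParser


class AnnotationXmlCollector(HTMLParser):
    """Collect the content of each <annotation-xml> as a nested tree.

    Assumes well-formed content (all end and empty tags explicit), so a
    plain stack is enough to find the matching </annotation-xml>.
    """

    def __init__(self):
        super().__init__()
        self.annotations = []  # one tree per top-level <annotation-xml>
        self.stack = []        # currently open nodes inside an annotation

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        if self.stack:
            # Any element, including a nested annotation-xml, becomes a child.
            self.stack[-1]["children"].append(node)
            self.stack.append(node)
        elif tag == "annotation-xml":
            self.annotations.append(node)
            self.stack.append(node)
        # Tags outside any annotation are ignored in this sketch.

    def handle_endtag(self, tag):
        if not self.stack:
            return
        if self.stack[-1]["tag"] != tag:
            raise ValueError(f"expected </{self.stack[-1]['tag']}>, got </{tag}>")
        self.stack.pop()  # popping the root closes the annotation

    def handle_data(self, data):
        if self.stack and data.strip():
            self.stack[-1]["children"].append(data)


def collect_annotations(source):
    """Hypothetical helper: parse source and return the annotation trees."""
    parser = AnnotationXmlCollector()
    parser.feed(source)
    parser.close()
    if parser.stack:
        raise ValueError("unclosed <annotation-xml>")
    return parser.annotations
```

For example, `collect_annotations('<annotation-xml encoding="text/html"><div><p>text</p><br/></div></annotation-xml>')` yields one tree whose only child is the div. Item 3 (omitted end tags) is exactly what this sketch cannot handle: a missing end tag surfaces as a `ValueError` instead of being inferred.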

Received on Tuesday, 29 June 2010 14:10:32 UTC