Re: Exploring new vocabularies for HTML

On Mar 29, 2008, at 19:08, David Carlisle wrote:
>> I'm investigating possible options for addressing the problem of  
>> "Putting
>> an equation in a Web page". One of the options is doing something  
>> with
>> MathML.
> Given the existing implementation and experience in this area surely
> MathML should not simply be "one of the options" it should be the main
> option. For HTML5 to invent some new math markup unsupported by any
> existing mathematical software would be a complete disaster for the
> cause of putting scientific documents on the web.

I agree. Moreover, I think the HTML WG doesn't have the bandwidth to  
reinvent math notation *properly*.

As far as existing formats and options go, I'm concerned that
implementing support for a constrained *TeX flavor in browsing
software for non-visual access would be a further diversion from
getting MathML support in browsers (with accessibility). For visual
and dynamic scenarios, a *TeX flavor wouldn't integrate with DOM
scripting and CSS formatting the way presentational MathML does.

> (Microsoft Word converts to MathML on cut-and-paste,

Out of curiosity, where can one copy from? It has been over a year  
since I looked at Gecko's clipboard code, but I don't recall seeing an  
XML clipboard export code path for something like this.

> I think the assumption here was that in an html context one might  
> want to
> give up some of the rules coming from XML parsing (attribute quoting,
> perhaps some element closing, etc) I think it would be a mistake to  
> try
> to insert character level tokenisation and parsing to imply token
> elements such as mn and mi. The strength of a format like MathML
> is that such tokenisation is explicit (and one of the problems in
> converting from say, TeX, where these things are not explicit is that
> different systems have different heuristics).

I agree that we shouldn't try to define magic start tag inference.
It's not worth the trouble when it is reasonable to expect authors to
generate MathML most often anyway. (We should define popping rules
for when the end tag doesn't match the element on the stack, though.)
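
To make the popping idea concrete, here is a minimal sketch in Python. The rule it encodes is only an assumption for illustration, not something any spec defines for MathML in text/html: look down the stack of open elements for the nearest element whose name matches the end tag, pop everything down to and including it, and ignore the end tag if nothing matches. The function name handle_end_tag and the list-of-names stack are hypothetical.

```python
def handle_end_tag(open_elements, name):
    """Apply a hypothetical popping rule for a mismatched end tag.

    open_elements is a list of element names, with the current node last.
    Returns the names that were popped, innermost first. An end tag with
    no matching open element is ignored and nothing is popped.
    """
    # Search from the current node towards the root for a matching name.
    for index in range(len(open_elements) - 1, -1, -1):
        if open_elements[index] == name:
            popped = open_elements[index:]
            # Close the matching element and everything nested inside it.
            del open_elements[index:]
            return popped[::-1]
    # Stray end tag: leave the stack untouched.
    return []
```

With a stack of math, mrow, mfrac, mi, an </mrow> end tag would close mi, mfrac and mrow in that order and leave math open, while a stray </mo> would be dropped.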

>> MathML is a very big language, with just shy of 190 unique elements  
>> in
>> MathML2 (HTML4, including all the deprecated elements, has but 91).  
>> Could
>> we get away with making that simpler for HTML, e.g. by not including
>> support for Content markup in the text/html variant?
> I think you should aim for the support level of mozilla.
> So basically just supporting presentation mathml (which brings the
> element count down to a handful of structural forms) but support
> <semantics> by rendering its first child and skipping over any
> annotation-xml children with display property of none. So
> annotation-xml ought to be able to take as content any well formed
> XML, but the only
> requirement for html5 would be to parse to the end of it, not to  
> display
> content mathml natively. (Native rendering of content mathml3 would be
> nice but I think in the real world it's not going to happen  
> everywhere)

In Firefox 3, Gecko supports SVG in annotation-xml. I think
annotation-xml should establish a new <body>-like parsing scope like
<td> does.
An <svg> element would establish an SVG scope right away, but other  
elements would be taken as HTML.
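
As a rough sketch of that scoping rule, the following Python function picks the namespace for a start tag from the stack of open elements. It is only an illustration under stated assumptions: the function name namespace_for and the (namespace, name) tuple representation are hypothetical, and treating <math> as opening a MathML scope anywhere is my simplification. The namespace URIs are the standard ones.

```python
HTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"


def namespace_for(open_elements, tag):
    """Choose the namespace for a start tag (hypothetical rule).

    open_elements is a list of (namespace, local name) tuples, with the
    current node last.
    """
    # <svg> and <math> establish their own scope right away.
    if tag == "svg":
        return SVG
    if tag == "math":
        return MATHML
    # Nothing open yet: plain HTML.
    if not open_elements:
        return HTML
    parent_ns, parent_name = open_elements[-1]
    # annotation-xml opens a new <body>-like HTML scope for its children.
    if (parent_ns, parent_name) == (MATHML, "annotation-xml"):
        return HTML
    # Otherwise stay in the vocabulary of the current node.
    return parent_ns
```

Under this rule a <p> directly inside annotation-xml becomes an HTML element, an <svg> there starts an SVG scope, and an <mi> inside <mrow> stays in MathML.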

I'm not quite convinced about the utility of annotation-xml for  
purposes other than embedding vocabularies supported by Web engines  
(HTML and SVG). A while ago, I implemented XHTML5 + SVG 1.1 + MathML  
2.0 compound document support in with XHTML root and then  
arbitrary recursion through annotation-xml and foreignObject. For  
MathML in SVG in XHTML and SVG in MathML in XHTML, there are use cases  
on Jacques Distler's blog. However, I also created an "any OpenMath  
goes" hole, since the spec suggested OpenMath would be the third  
embeddable XML vocabulary in addition to XHTML and SVG.

I thought Jacques Distler's comment feed would have been one of the
best ways to reach MathML authors who are tracking new
developments in this domain, so I posted there:

However, so far no one has come forward with
OpenMath-in-annotation-xml validation needs. Is OpenMath actually used
on the Web? What
client is expected to consume it? So far it looks like it would make  
more sense to assume that browser-targeting authors use annotation-xml  
for SVG or XHTML, so we might as well open a new <body>-like HTML  
parsing scope in there (with <svg> in turn establishing an SVG scope  
straight away).

>> One of the use cases is the mixing of graphics and form controls into
>> equations. Is it possible to extend MathML to allow specific HTML5
>> phrasing-level elements (like <em>, <img>, <input>, also maybe the  
>> <svg>
>> element) wherever the <mglyph> element is currently allowed, or  
>> something
>> along those lines?

That would lead to much more special casing than establishing a nested  
scope in annotation-xml as I suggested above (and have suggested  

> It's possible technically of course but I think it's fair to say that
> there isn't total consensus on whether it's a good idea.
> there are though two aspects to that question.
> In a purely mathml context, should mathml be opened up to allow any
> foreign markup there.
> or if in "pure" mathml that is not allowed, should html+mathml allow  
> nested
> html (and docbook+mathml allow nested docbook, and as came up
> controversially recently should OOXML+MathML allow nested OOXML)

Do those cases allow the elements from the different vocabularies to  
intermingle without an annotation-xml scope?

Henri Sivonen

Received on Saturday, 29 March 2008 21:03:04 UTC