Re: Exploring new vocabularies for HTML

Ian,

(personal response)

> I'm investigating possible options for addressing the problem of "Putting 
> an equation in a Web page". One of the options is doing something with 
> MathML.

Given the existing implementation and experience in this area, surely
MathML should not simply be "one of the options"; it should be the main
option. For HTML5 to invent some new math markup unsupported by any
existing mathematical software would be a complete disaster for the
cause of putting scientific documents on the web.

> Could you point me to further information on this? I'm interested in 
> investigating how much support there is for editing MathML content, in 
> particular, since it's not a very human-friendly format.

Actually MathML is more "human friendly" than some think. (Here, for
example, we maintain a corpus that includes half a million or so MathML
fragments, mainly using emacs rather than a specific math
editor.) Mathematics by its nature has always required a lot more markup
than plain text, and that is true whether it's TeX, MathML, OMML
or OpenMath.

There are several specific math editors that emit MathML (see
the WG's implementation page), but MathML support is also
available in more general software. In particular, the leading computer
algebra packages Mathematica and Maple will both export MathML, and
both Microsoft Word 2007 and OpenOffice have MathML support. (Microsoft
Word converts to MathML on cut-and-paste; OpenOffice stores maths in
MathML as the native format for ODF mathematics.) All these systems have
graphical formula editors and linearised input syntaxes for mathematics,
which mean the author need not know MathML markup.

> Cool, that's very encouraging. Any knowledge you have about that would be 
> great. Is there any documentation on common MathML errors? Is there any 
> documentation on what elements could be implied? Is there any reason 
> digits couldn't imply <mn>, for example, and punctuation couldn't imply 
> <mo>? Any help here would be greatly appreciated.

I think the assumption here was that in an HTML context one might want to
give up some of the rules coming from XML parsing (attribute quoting,
perhaps some element closing, etc.). I think it would be a mistake to try
to insert character-level tokenisation and parsing to imply token
elements such as mn and mi. The strength of a format like MathML
is that such tokenisation is explicit, and one of the problems in
converting from, say, TeX, where these things are not explicit, is that
different systems have different heuristics.
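
To make the point about heuristics concrete, here is a small Python
sketch. Both tokenisers are hypothetical, invented purely for
illustration (they are not taken from any real TeX-to-MathML converter),
and they disagree on the same linear input:

```python
import re
from xml.sax.saxutils import escape


def tokenize_letters_individually(src):
    """Hypothetical heuristic A: every letter is its own identifier."""
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z]|\S", src)


def tokenize_letter_runs(src):
    """Hypothetical heuristic B: a run of letters is one identifier."""
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+|\S", src)


def to_mathml(tokens):
    """Wrap each token in an explicit MathML token element."""
    parts = []
    for tok in tokens:
        if tok[0].isdigit():
            parts.append(f"<mn>{escape(tok)}</mn>")
        elif tok.isalpha():
            parts.append(f"<mi>{escape(tok)}</mi>")
        else:
            parts.append(f"<mo>{escape(tok)}</mo>")
    return "<mrow>" + "".join(parts) + "</mrow>"


source = "2xy+1"
print(to_mathml(tokenize_letters_individually(source)))
print(to_mathml(tokenize_letter_runs(source)))
```

The first heuristic reads "xy" as two identifiers and the second as one.
Explicit mi/mn/mo markup records which reading the author meant, so no
consumer has to guess.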

> MathML is a very big language, with just shy of 190 unique elements in 
> MathML2 (HTML4, including all the deprecated elements, has but 91). Could 
> we get away with making that simpler for HTML, e.g. by not including 
> support for Content markup in the text/html variant?

I think you should aim for the support level of Mozilla.
So basically just support presentation MathML (which brings the
element count down to a handful of structural forms), but support
<semantics> by rendering its first child and skipping over any
annotation-xml children, as if they had a display property of none. So
annotation-xml ought to be able to take as content any well-formed XML,
but the only requirement for HTML5 would be to parse to the end of it,
not to display content MathML natively. (Native rendering of content
MathML3 would be nice, but I think in the real world it's not going to
happen everywhere.)
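
As a rough sketch of that <semantics> handling (an illustration in
Python using the standard xml.etree.ElementTree module, not a
description of how Mozilla implements it), the function below keeps the
first child of each <semantics> element and drops the annotations. The
names renderable and local_name are made up for this example.

```python
import xml.etree.ElementTree as ET


def local_name(elem):
    """Element name without its '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]


def renderable(elem):
    """Copy a MathML tree, keeping only what would be rendered.

    Assumption for this sketch: <semantics> is replaced by its first
    child, and annotation / annotation-xml children are skipped.
    """
    if local_name(elem) == "semantics":
        children = list(elem)
        if not children:
            return None
        first = children[0]
        if local_name(first) in ("annotation", "annotation-xml"):
            return None
        return renderable(first)
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text = elem.text
    for child in elem:
        kept = renderable(child)
        if kept is not None:
            copy.append(kept)
    return copy


SAMPLE = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>
    <annotation-xml encoding="MathML-Content">
      <apply><plus/><ci>x</ci><cn>1</cn></apply>
    </annotation-xml>
  </semantics>
</math>
"""

# The parser still reads to the end of annotation-xml; only the
# presentation branch survives in the copy.
tree = renderable(ET.fromstring(SAMPLE))
names = [local_name(e) for e in tree.iter()]
print(names)
```

The content markup inside annotation-xml only has to be well-formed
enough to parse past; nothing in the sketch tries to interpret it.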

One thing we could do to make this easier for you is, in MathML3, to more
formally separate the grammars of presentation and content MathML so
they are usable separately.


> One of the use cases is the mixing of graphics and form controls into 
> equations. Is it possible to extend MathML to allow specific HTML5 
> phrasing-level elements (like <em>, <img>, <input>, also maybe the <svg> 
> element) wherever the <mglyph> element is currently allowed, or something 
> along those lines?

It's technically possible, of course, but I think it's fair to say that
there isn't total consensus on whether it's a good idea.
There are, though, two aspects to that question.

In a purely MathML context, should MathML be opened up to allow any
foreign markup there?

Or, if in "pure" MathML that is not allowed, should HTML+MathML allow nested
HTML (and DocBook+MathML allow nested DocBook and, as came up
controversially recently, should OOXML+MathML allow nested OOXML)?
The MathML2 spec said basically that if you nested other elements it
wasn't MathML, but that if you did it anyway a system might not generate
an error and might render it. This more or less allowed the
Mozilla/Firefox behaviour of rendering nested HTML in MathML, while
allowing other pure MathML systems to reject "MathML" that contains
nested HTML. This is an interesting area, and certainly something that we
can talk about: exactly what the specs should say, or whether the
individual specs should say nothing and an architectural
specification such as CDF should instead specify how different formats
can be mixed.
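
The two behaviours can be sketched in Python as follows. The strict
flag and the function names are hypothetical, and skipping
annotation-xml subtrees is an assumption of this sketch, made because
annotation-xml is meant to hold other XML.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def foreign_elements(elem):
    """List the tags of nested elements outside the MathML namespace."""
    found = []
    for child in elem:
        ns, _, name = child.tag.rpartition("}")
        if ns.lstrip("{") != MATHML_NS:
            found.append(child.tag)
            continue
        if name == "annotation-xml":
            continue  # assumption: foreign XML is expected in here
        found.extend(foreign_elements(child))
    return found


def accepts(root, strict):
    """A 'pure' consumer (strict=True) rejects nested foreign markup;
    a lenient one accepts it and may try to render it."""
    return not (strict and foreign_elements(root))


MIXED = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"'
    ' xmlns:h="http://www.w3.org/1999/xhtml">'
    '<mrow><mi>x</mi><mo>=</mo>'
    '<mtext><h:input type="text"/></mtext></mrow></math>'
)

root = ET.fromstring(MIXED)
print(foreign_elements(root))
print(accepts(root, strict=True), accepts(root, strict=False))
```

Both outcomes are consistent with the MathML2 wording described above,
which is exactly why the same document can work in one system and fail
in another.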



David


Received on Saturday, 29 March 2008 17:09:19 UTC