Re: Exploring new vocabularies for HTML

On Sat, 29 Mar 2008, David Carlisle wrote:
> > 
> > I'm investigating possible options for addressing the problem of 
> > "Putting an equation in a Web page". One of the options is doing 
> > something with MathML.
> 
> Given the existing implementation and experience in this area surely 
> MathML should not simply be "one of the options" it should be the main 
> option.

I don't understand the distinction.


> For HTML5 to invent some new math markup unsupported by any existing 
> mathematical software would be a complete disaster for the cause of 
> putting scientific documents on the web.

That has been proposed, as have other options such as LaTeX, eqn/neqn, the 
native formats of tools such as Mathematica, Maple, and OpenOffice.org 
Math, standards such as ISO 12083, and not doing anything at all. I am 
considering all options.


> > Could you point me to further information on this? I'm interested in 
> > investigating how much support there is for editing MathML content, in 
> > particular, since it's not a very human-friendly format.
> 
> Actually MathML is more "human friendly" than some think. (Here, for 
> example, we maintain a corpus that includes half a million or so MathML 
> fragments, mainly using Emacs rather than a specific math editor.) 
> Mathematics by its nature has always required a lot more markup than 
> plain text, and that is true whether it's TeX or MathML or OOMML, or 
> OpenMath.

Sure, but there are definitely orders of magnitude of difference between 
the verbosity of the different formats. (I hand-wrote all the MathML in my 
MathML+XHTML paper at University seven years ago, also in Emacs.)


> There are, however, several specific math editors that emit MathML (see 
> the WG's implementation page), but MathML support is also available in 
> more general software. In particular, the leading computer algebra 
> packages Mathematica and Maple will both export as MathML, and both 
> Microsoft Word 2007 and OpenOffice have MathML support. (Microsoft Word 
> converts to MathML on cut-and-paste; OpenOffice stores maths in MathML, 
> its native format for mathematics in ODF.) All these systems have 
> graphical formula editors and linearised input syntaxes for mathematics, 
> so the author need not know MathML markup.

Interesting stuff. Do these packages also natively support MathML import? 
(To clarify, you are talking about native support, right? Not support 
after installing third-party plugins.)


> > Cool, that's very encouraging. Any knowledge you have about that would be 
> > great. Is there any documentation on common MathML errors? Is there any 
> > documentation on what elements could be implied? Is there any reason 
> > digits couldn't imply <mn>, for example, and punctuation couldn't imply 
> > <mo>? Any help here would be greatly appreciated.
> 
> I think the assumption here was that in an HTML context one might want 
> to give up some of the rules coming from XML parsing (attribute quoting, 
> perhaps some element closing, etc.). I think it would be a mistake to try 
> to insert character-level tokenisation and parsing to imply token 
> elements such as mn and mi. The strength of a format like MathML is that 
> such tokenisation is explicit (and one of the problems in converting from, 
> say, TeX, where these things are not explicit, is that different systems 
> have different heuristics).
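
As a small illustration of the point about heuristics: two plausible rules for implying <mi> from a run of letters give different trees for the same input. Both functions below are hypothetical and exist only for this example; per_letter mimics the TeX convention of treating each letter in math mode as a separate symbol, while maximal_run treats each run of letters as one identifier.

```python
import re

# Two hypothetical heuristics for implying <mi> tokens from letters.
# Neither comes from any spec; they only show that such rules can disagree.

def per_letter(text: str) -> list[str]:
    """TeX-like: every letter is its own identifier, so 'sin' is s*i*n."""
    return [f"<mi>{ch}</mi>" for ch in text if ch.isalpha()]

def maximal_run(text: str) -> list[str]:
    """Greedy: each maximal run of ASCII letters is a single identifier."""
    return [f"<mi>{run}</mi>" for run in re.findall(r"[A-Za-z]+", text)]

src = "sin x"
print(per_letter(src))   # ['<mi>s</mi>', '<mi>i</mi>', '<mi>n</mi>', '<mi>x</mi>']
print(maximal_run(src))  # ['<mi>sin</mi>', '<mi>x</mi>']
```

Whichever rule a converter picks, the other reading of the same source is lost, which is why explicit token elements matter.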

To clarify, any implication rules would be very explicit in the spec, and 
the result would be unambiguous. The question is just whether required 
tags could be omitted in the syntax.

For example, it seems like this:

   <math> 3 + n = 6 </math>

...could be unambiguously turned into:

   <math><mrow><mn>3</mn><mo>+</mo><mi>n</mi><mo>=</mo><mn>6</mn></mrow></math>

What problems would this introduce?
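
To make the question concrete, here is a minimal Python sketch of one hypothetical set of implication rules: a run of digits (with an optional decimal part) implies <mn>, a run of letters implies <mi>, any other non-space character implies <mo>, and the whole body is wrapped in a single <mrow>. These rules are assumptions for illustration only, not anything from the MathML spec; the letter-run rule in particular is exactly the kind of choice a spec would have to pin down.

```python
import re
from html import escape

# Hypothetical implication rules (assumptions, not from any spec):
#   digits with optional decimal part -> <mn>
#   run of ASCII letters              -> <mi>
#   any other non-space character     -> <mo>
TOKEN = re.compile(r"(?P<mn>\d+(?:\.\d+)?)|(?P<mi>[A-Za-z]+)|(?P<mo>\S)")

def imply_tokens(body: str) -> str:
    """Turn the text content of <math> into explicit token elements."""
    parts = []
    for m in TOKEN.finditer(body):     # whitespace matches nothing, so it is skipped
        tag = m.lastgroup              # name of the alternative that matched
        parts.append(f"<{tag}>{escape(m.group())}</{tag}>")
    return "<math><mrow>" + "".join(parts) + "</mrow></math>"

print(imply_tokens(" 3 + n = 6 "))
# <math><mrow><mn>3</mn><mo>+</mo><mi>n</mi><mo>=</mo><mn>6</mn></mrow></math>
```

Note that a character such as "<" has to be escaped when it becomes an <mo>, so the rules also interact with how the surrounding HTML is tokenised.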


> > MathML is a very big language, with just shy of 190 unique elements in 
> > MathML2 (HTML4, including all the deprecated elements, has but 91). Could 
> > we get away with making that simpler for HTML, e.g. by not including 
> > support for Content markup in the text/html variant?
> 
> I think you should aim for the support level of Mozilla: basically
> just supporting presentation MathML (which brings the element count
> down to a handful of structural forms), but supporting <semantics> by
> rendering its first child and skipping over any annotation-xml
> children (which get a display property of none). So annotation-xml
> ought to be able to take any well-formed XML as content, but the only
> requirement for HTML5 would be to parse to the end of it, not to
> display content MathML natively. (Native rendering of content MathML3
> would be nice, but I think in the real world it's not going to happen
> everywhere.)
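
The rendering rule described above is simple enough to sketch. This minimal Python example, assuming well-formed XML input and Python's standard ElementTree parser, splits a <semantics> element into the child a presentation-only renderer would display (the first one) and the annotation children that are parsed but never shown. The function name split_semantics is made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MML = "{http://www.w3.org/1998/Math/MathML}"

def split_semantics(semantics: ET.Element) -> tuple[Optional[ET.Element], list[ET.Element]]:
    """Return (child to render, children to skip) for a <semantics> element.
    Only the first child is displayed; annotation/annotation-xml children
    are parsed to their end but not rendered."""
    children = list(semantics)
    if not children:
        return None, []
    return children[0], children[1:]

doc = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>
    <annotation-xml encoding="MathML-Content">
      <apply><plus/><ci>x</ci><cn>1</cn></apply>
    </annotation-xml>
  </semantics>
</math>"""

root = ET.fromstring(doc)
shown, skipped = split_semantics(root.find(MML + "semantics"))
print("".join(shown.itertext()))   # x+1
```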

<semantics> and <annotation-xml> are nice in theory, I agree, but are they 
really necessary? While I understand that math experts today might use 
them, it seems highly unlikely that the mass market would ever bother.

(There are problems that make the idea of any element taking "well-formed 
XML" in text/html basically unworkable.)


> One thing we could do to make this easier for you is, in mathml3, more 
> formally separate the grammars of presentation and content mathml so 
> they are usable separately.

That would be really useful.
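
For illustration, a separated grammar would let a text/html parser or checker ask a simple question of any fragment: does it use only presentation markup? A minimal Python sketch, assuming well-formed input and using deliberately partial element lists drawn from MathML 2 (illustrative only, not a complete or authoritative classification):

```python
import xml.etree.ElementTree as ET

# Partial MathML 2 element lists, for illustration only.
PRESENTATION = {"mi", "mn", "mo", "mtext", "mspace", "ms", "mrow", "mfrac",
                "msqrt", "mroot", "mstyle", "merror", "mpadded", "mphantom",
                "mfenced", "menclose", "msub", "msup", "msubsup", "munder",
                "mover", "munderover", "mmultiscripts", "mtable", "mtr",
                "mtd", "maction"}
CONTENT = {"apply", "ci", "cn", "csymbol", "bvar", "lambda", "plus", "minus",
           "times", "divide", "power", "eq", "sin", "cos"}
SHARED = {"math", "semantics", "annotation", "annotation-xml"}

def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]

def grammars_used(fragment: str) -> set[str]:
    """Report which grammars ('presentation', 'content', 'other') a fragment uses."""
    used = set()
    for el in ET.fromstring(fragment).iter():
        name = local_name(el.tag)
        if name in SHARED:
            continue
        if name in PRESENTATION:
            used.add("presentation")
        elif name in CONTENT:
            used.add("content")
        else:
            used.add("other")
    return used

print(grammars_used("<math><mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow></math>"))
# {'presentation'}
```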

Something else that would be useful is a summary of the MathML schema. I 
couldn't find anything human-readable in the MathML specs, and the DTD is 
not optimised for casual reading. Is there anything like that available?


> > One of the use cases is the mixing of graphics and form controls into 
> > equations. Is it possible to extend MathML to allow specific HTML5 
> > phrasing-level elements (like <em>, <img>, <input>, also maybe the 
> > <svg> element) wherever the <mglyph> element is currently allowed, or 
> > something along those lines?
> 
> It's possible technically, of course, but I think it's fair to say that 
> there isn't total consensus on whether it's a good idea. There are, 
> though, two aspects to that question.
> 
> In a purely MathML context, should MathML be opened up to allow any 
> foreign markup there?
> 
> Or, if in "pure" MathML that is not allowed, should HTML+MathML allow 
> nested HTML (and DocBook+MathML allow nested DocBook, and, as came up 
> controversially recently, should OOXML+MathML allow nested OOXML)?

It would be interesting to hear how the MathML group would like the 
problems of graphics in equations and form controls in equations to be 
solved.


> The MathML2 spec said basically that if you nested other elements it 
> wasn't MathML, but that if you did it anyway a system might not generate 
> an error and might render it.

That kind of wishy-washy rule isn't going to fly for HTML5. :-)
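
By contrast, a definite rule gives a checker a definite answer. A minimal Python sketch, assuming XML input and a hypothetical strict rule (not taken from either spec) under which every element inside <math> that is not in the MathML namespace gets reported:

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def foreign_descendants(math_xml: str) -> list[str]:
    """Hypothetical strict rule: list every element inside <math> that is
    not in the MathML namespace, so a checker can report each one."""
    found = []
    for el in ET.fromstring(math_xml).iter():
        # ElementTree writes namespaced tags as '{namespace}local'.
        ns = el.tag[1:].split("}", 1)[0] if el.tag.startswith("{") else ""
        if ns != MATHML_NS:
            found.append(el.tag)
    return found

frag = ('<math xmlns="http://www.w3.org/1998/Math/MathML" '
        'xmlns:h="http://www.w3.org/1999/xhtml">'
        '<mrow><mi>x</mi><mo>=</mo><h:input/></mrow></math>')
print(foreign_descendants(frag))   # ['{http://www.w3.org/1999/xhtml}input']
```

Whether such elements should be errors, rendered, or both is the design question; the point is only that the spec has to pick one.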


> An architectural specification such as CDF should specify how different 
> formats can be mixed.

That seems unnecessary; HTML and MathML together should be defined in 
enough detail that no other spec is required to define how they work 
together, IMHO.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Saturday, 29 March 2008 20:21:45 UTC