- From: Bruce Miller <bruce.miller@nist.gov>
- Date: Mon, 31 Mar 2008 11:43:21 -0400
- To: Henri Sivonen <hsivonen@iki.fi>
- Cc: David Carlisle <davidc@nag.co.uk>, ian@hixie.ch, public-html@w3.org, www-math@w3.org
Henri Sivonen wrote:
>
> On Mar 31, 2008, at 11:28, David Carlisle wrote:
>> The DOM models the internal memory structure of a browser. What passes
>> between applications is typically the serial form. That's the essence
>> of the definition of a markup language, that it defines a common
>> language that can be shared between people or applications.
>
> We can ask browsers to use the XML serialization for clipboard export on
> platforms that have a pre-existing deployed XML-based clipboard flavor for
> MathML. That will have to be a reserialization of the DOM anyway, so the
> syntax from which the DOM was built no longer matters.
>
>> There is a big difference between, say, dropping quotes around attributes
>> that can be automatically put back in for any tree (without any specific
>> language knowledge), and parsing a string of unmarked-up text to infer
>> some tree structure.
>
> The right way to do either is to run an HTML5 parser.

Can someone please fill in some of the gaps here?
I get the feeling there's a stage (or stages) where "Magic Happens"...

The proposal seems to be something like: an HTML5 page with
MathML-ish stuff in it. The math in the _text_ of the page
 (1) emphatically does not have the MathML namespace,
 (2) may have omitted end tags,
 (3) doesn't have empty elements marked as <tag/>,
 (4) may have attribute values that aren't quoted,
 (5) may be limited to exclude <semantics> and named entities,
 (6) and may, in the extreme case, even omit tags for
     token elements (<mo>, <mi>, <mn>).
Did I miss anything?

Now, that math is clearly not the serialization of Classic MathML,
nor would it be allowable to put Classic MathML in the HTML5.
Correct so far?

OTOH, even in the more extreme case, there's no reason the DOM
in the browser created by the HTML5 parser would be any different
from the DOM that would have been created by an XML parser parsing
Classic MathML. Correct? Would this actually be a _requirement_
in the HTML5 spec?
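
To make the "same DOM" question concrete, here is a minimal sketch in Python. It is illustrative only: Python's html.parser stands in for a tolerant parser (it is not an HTML5 parser and does not infer omitted end tags), the sample formula is made up, and the LooseMathBuilder, parse_loose and shape names are hypothetical. It covers just points (1) and (4) above, a missing namespace and an unquoted attribute value, and assumes the parser is told to place every element in the MathML namespace.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


class LooseMathBuilder(HTMLParser):
    """Builds an ElementTree from tolerant markup (hypothetical helper).

    Assumption: every element is placed in the MathML namespace,
    since the loose text carries no namespace declaration at all.
    """

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []

    def handle_starttag(self, tag, attrs):
        elem = ET.Element(f"{{{MATHML_NS}}}{tag}",
                          {name: value or "" for name, value in attrs})
        if self.stack:
            self.stack[-1].append(elem)
        else:
            self.root = elem
        self.stack.append(elem)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Only token content is kept; inter-element whitespace is dropped.
        if self.stack and data.strip():
            top = self.stack[-1]
            top.text = (top.text or "") + data.strip()


def parse_loose(text):
    builder = LooseMathBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def shape(elem):
    """Reduce a tree to plain tuples so two trees can be compared."""
    return (elem.tag, sorted(elem.attrib.items()),
            (elem.text or "").strip(), [shape(child) for child in elem])


# No namespace, unquoted attribute value.
loose = "<math><mfrac><mi mathvariant=bold>x</mi><mn>2</mn></mfrac></math>"
# The same formula as namespaced, fully quoted XML.
classic = ('<math xmlns="http://www.w3.org/1998/Math/MathML"><mfrac>'
           '<mi mathvariant="bold">x</mi><mn>2</mn></mfrac></math>')

loose_tree = parse_loose(loose)
classic_tree = ET.fromstring(classic)
same_tree = shape(loose_tree) == shape(classic_tree)

# Re-serialize the loosely parsed tree as namespaced XML.
ET.register_namespace("", MATHML_NS)
reserialized = ET.tostring(loose_tree, encoding="unicode")
```

Under those assumptions both inputs reduce to the same tree, and the tree built from the loose text can be written back out as namespaced XML. Whether a real HTML5 parser would be _required_ to produce an identical DOM is exactly the open question.
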
Clearly, such a DOM could be serialized as either Classic MathML
or HTML5-MathML.

Now, it gets interesting: I'd like to cut that formula and use it
in a computer algebra system, or graphing calculator, or....
I need Classic MathML, and the browser could reconstruct it from the DOM....
Fine, but will it be a _requirement_ that a browser provide that?
Or is it anticipated that every MathML-importing tool will integrate
an HTML5 parser? Or am I expected to paste to some tmp buffer
and run a 3rd party converter to convert to Classic form?

Alternatively, suppose I'm writing an HTML5 web page and
want to steal the math from another page. Will the browser
also be required to offer me an HTML5 serialization of the math?
Or is it anticipated that all HTML or text editors would provide
a tool or XSL to HTML5-serialize the XML? Or, again,
am I expected to use a 3rd party tool?

The above issues could be dealt with by putting requirements on browsers,
but similar questions apply if I've obtained Classic MathML from some
system and want to include it in an HTML5 page. Except that here
I can't rely on the browser.

The common theme here is that it is all too easy to say
that there is an algorithm for converting between the
serializations, though that is certainly true for many of the
proposed "simplifications" of MathML. However, unless there is
a mandate that these conversions be available at some
critical junctures, I very much fear that this will result in
two effectively disconnected pools of math data.

Requiring every MathML importer to include an HTML5 parser,
and every MathML exporter to include an HTML5 serializer,
just seems like a quadratic version of the old joke:
"Now you've got _two_ problems".

-- 
bruce.miller@nist.gov
http://math.nist.gov/~BMiller/
Received on Monday, 31 March 2008 15:44:30 UTC