Re: Exploring new vocabularies for HTML from Bruce Miller on 2008-03-31 (www-math@w3.org from March 2008)

From: Bruce Miller <bruce.miller@nist.gov>
Date: Mon, 31 Mar 2008 11:43:21 -0400
To: Henri Sivonen <hsivonen@iki.fi>
Cc: David Carlisle <davidc@nag.co.uk>, ian@hixie.ch, public-html@w3.org, www-math@w3.org
Message-id: <47F10699.40308@nist.gov>
Henri Sivonen wrote:
> 
> On Mar 31, 2008, at 11:28, David Carlisle wrote:
>> The DOM models the internal memory structure of a browser. What passes
>> between applications is typically the serial form. That's the essence
>> of the definition of a markup language, that it defines a common
>> language that can be shared between people or applications.
> 
> We can ask browsers to use the XML serialization for clipboard export on 
> platforms that have a pre-existing deployed XML-based clipboard flavor for 
> MathML. That will have to be a reserialization of the DOM anyway, so the 
> syntax from which the DOM was built no longer matters.
> 
>> There is a big difference between, say, dropping quotes around attributes
>> that can be automatically put back in for any tree (without any specific
>> language knowledge), and parsing a string of unmarked-up text to infer
>> some tree structure.
> 
> The right way to do either is to run an HTML5 parser.

Can someone please fill in some of the gaps, here?
I get the feeling there's a stage (or stages) where "Magic Happens"...

The proposal seems to be something like:
an HTML5 page with MathML-ish stuff in it.
The math in the _text_ of the page (1) emphatically
does not have the MathML namespace, (2) may have omitted
end tags, (3) doesn't have empty elements marked as <tag/>,
(4) may have attribute values that aren't quoted,
(5) may be limited to exclude <semantics> and named entities,
(6) and may, in the extreme case, even omit tags for token
elements (<mo>,<mi>,<mn>).
Did I miss anything?

Now, that math is clearly not the serialization of 
Classic MathML, nor would it be allowable to put
Classic MathML in the HTML5 page.
Correct so far?

OTOH, even in the more extreme case, there's no
reason the DOM in the browser created by the HTML5
parser would be any different than the DOM that
would have been created by an XML parser parsing
Classic MathML.
Correct?
Would this actually be a _requirement_ in the HTML5 spec?

Clearly, such a DOM could be serialized as
either Classic MathML or HTML5-MathML.
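
A minimal sketch of that point, assuming Python's standard
xml.etree.ElementTree stands in for the browser DOM. The function
names (to_classic, to_html_style) and the exact "HTML5-style" output
are hypothetical illustrations only, not the syntax of any actual
proposal: the sketch just drops the namespace, leaves simple attribute
values unquoted, and never writes <tag/>.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Classic (XML) MathML for x^2 followed by a space; used as sample input.
CLASSIC = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<msup><mi>x</mi><mn>2</mn></msup><mspace width="1em"/></math>'
)


def local(name):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return name.rsplit("}", 1)[-1]


def to_classic(root):
    """Serialize the tree as namespaced XML (the 'Classic MathML' form)."""
    # Assumption: the MathML namespace is written as the default namespace.
    ET.register_namespace("", MATHML_NS)
    return ET.tostring(root, encoding="unicode")


def to_html_style(elem):
    """Serialize the same tree in a hypothetical HTML5-like form:
    no namespace, unquoted attribute values where safe, no <tag/>."""
    parts = ["<" + local(elem.tag)]
    for key, value in elem.attrib.items():
        if re.fullmatch(r"[A-Za-z0-9_.-]+", value):
            parts.append(f" {local(key)}={value}")
        else:
            parts.append(f" {local(key)}={quoteattr(value)}")
    parts.append(">")
    parts.append(escape(elem.text or ""))
    for child in elem:
        parts.append(to_html_style(child))
        parts.append(escape(child.tail or ""))
    parts.append(f"</{local(elem.tag)}>")
    return "".join(parts)


if __name__ == "__main__":
    tree = ET.fromstring(CLASSIC)   # one in-memory tree ...
    print(to_classic(tree))         # ... written out as Classic MathML
    print(to_html_style(tree))      # ... or in the HTML5-like form
```

Going from the tree to either text form is the easy direction; reading
the HTML5-like form back in needs an HTML5 parser, which the standard
library used here does not provide.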

Now, it gets interesting:
I'd like to cut that formula and use it
in a computer algebra system, or graphing calculator,
or....  I need Classic MathML and the browser could
reconstruct it from the DOM....
Fine, but will that be a _requirement_ that a browser
provide that?
Or, is it anticipated that every MathML importing
tool integrate an HTML5 parser?
Or am I expected to paste to some tmp buffer, and
run a 3rd party converter to convert to Classic form?

Alternatively, suppose I'm writing an HTML5 web page
and want to steal the math from another page.
Will the browser also be required to offer me an
HTML5 serialization of the math?
Or, is it anticipated that all HTML or text editors
would provide a tool or XSL to HTML5-serialize the XML?
Or, again, am I expected to use a 3rd party tool?

The above issues could be dealt with by putting
requirements on browsers, but similar questions
apply if I've obtained Classic MathML from some
system and want to include it in an HTML5 page.
Except that here I can't rely on the browser.

The common theme here is that it is all too easy,
though certainly true for many of the proposed
"simplifications" of MathML, to say that there is
an algorithm for converting between the serializations.
However, unless there is a mandate to require
these conversions to be available at some critical
junctures, I very much fear that this will result
in two effectively disconnected pools of math data.

Requiring every MathML importer to include an
HTML5 parser, and every MathML exporter to
include an HTML5 serializer just seems like
a quadratic version of the old joke:
  "Now you've got _two_ problems".

-- 
bruce.miller@nist.gov
http://math.nist.gov/~BMiller/
Received on Monday, 31 March 2008 15:44:30 UTC