Re: HTML/MathML integration from Neil Soiffer on 2008-04-01 (public-html@w3.org from April 2008)

From: Neil Soiffer <Neils@dessci.com>
Date: Tue, 1 Apr 2008 11:22:13 -0700
To: "Jim Jewett" <jimjjewett@gmail.com>
Cc: "David Carlisle" <davidc@nag.co.uk>, public-html@w3.org, www-math@w3.org
Message-ID: <d98bce170804011122t1d100c20t45cfcd72912f3b76@mail.gmail.com>
I think that there are two meanings of "syntax" being used, and that is
causing confusion.  In an earlier email, I tried (unsuccessfully) to clear this
up by saying that there is "user syntax" and "parser/implementor syntax".
The former is what you tell users they should type.  The latter is what the
browsers actually implement.  The latter is complicated and messy and deals
with all sorts of "weird" cases like

<h1> A heading <xxx> that has an ignored tag </xxx>  <p> Paragraphs end
headings

I'm guessing at the above (unknown elements display their contents; <p> ends
<h1>) -- hopefully this won't be confusing if I'm wrong.
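
To make the user/parser distinction concrete, here is a minimal sketch in
Python (my choice of language, nothing in this thread depends on it).  It
feeds the heading fragment above to the standard library's lenient HTML
tokenizer and to its strict XML parser.  The names SAMPLE, TagLogger,
html_events and xml_accepts are made up for the illustration.  Note that
html.parser only reports tags; it does not build a tree, so it says nothing
about whether <p> really ends <h1>.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The fragment from the example above, joined onto one line.
SAMPLE = ("<h1> A heading <xxx> that has an ignored tag </xxx>  "
          "<p> Paragraphs end headings")


class TagLogger(HTMLParser):
    """Records start and end tags exactly as the HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(source):
    """Tokenize `source` leniently; unknown or unclosed tags raise no error."""
    logger = TagLogger()
    logger.feed(source)
    logger.close()
    return logger.events


def xml_accepts(source):
    """Return True only if `source` is well-formed XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(html_events(SAMPLE))
print(xml_accepts(SAMPLE))
```

The HTML side reports the unknown <xxx> like any other tag and never
complains about the missing </h1> and </p>, while the XML side rejects the
whole fragment.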

In user documentation, you would never say:

To create a level 1 heading, use the <h1> tag.  Feel free to throw in any
unknown html element in the heading, it will be ignored.  You only need to
close the <h1> tag when ...  Etc, etc.

Instead you would say

Use <h1> to create a level 1 heading.  Use </h1> to end the heading.



The fact that you can write all sorts of different things and get an h1 to
display/parse as desired is just a way to confuse the user.

Getting back to MathML, producing complicated rules that say "this is what
happens when you omit an end tag in this situation or leave off tagging in
this situation" is a really bad way to present MathML to users.  Obviously,
in the HTML model, the parser implementors need to be told what should
happen in those cases, but is it helpful to anyone but an HTML expert?

Ian has already assured people that "classic MathML" will work in HTML5.  He
said using a namespace prefix won't (Ian, can you elaborate on why you don't
just ignore the "xxx" in <xxx:mi>?), but otherwise, the MathML spec, user
tutorials, and software can remain unchanged.

The open question is what should happen when the MathML has an error in the
XML parser sense.  David and I both lean toward the "don't try to guess,
because you'll likely get it wrong and I won't know" school.  We think the "repair"
should be to wrap the offending part in an mtext (if it was missing a tag),
do what it takes to build a valid MathML DOM, and wrap the whole thing in
<merror> so that the user is informed something went wrong.
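
A rough sketch of that repair, in Python with the standard library's
xml.etree.ElementTree (my choice for illustration; repair_math is a made-up
name, not anything from the MathML or HTML5 specs).  It is coarser than what
is described above: instead of locating just the offending part, it puts the
entire unparseable source into a single mtext inside the merror.  Namespaces
are ignored to keep it short.

```python
import xml.etree.ElementTree as ET


def repair_math(source):
    """Return an element tree for `source`, flagging input that is not well-formed.

    Well-formed input is returned as parsed.  Anything else becomes
    <math><merror><mtext>...raw source...</mtext></merror></math>, so the
    reader sees that something went wrong instead of a silent guess.
    """
    try:
        return ET.fromstring(source)
    except ET.ParseError:
        math = ET.Element("math")
        merror = ET.SubElement(math, "merror")
        mtext = ET.SubElement(merror, "mtext")
        mtext.text = source  # keep the author's text visible, unchanged
        return math


# Missing </mn>, so this is an error in the XML parser sense.
broken = "<math><mn>146<mo>,</mo><mn>382</mn></math>"
print(ET.tostring(repair_math(broken), encoding="unicode"))
```

The result is always a valid tree that a renderer can display, and the
author's original text survives inside the error so it can be found and
fixed.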

Others obviously think HTML5 can do a good job guessing intentions and an
merror is not appropriate.  Unless the simplifications get to the point
where they clearly amount to a user-friendly syntax that is easily described,
they really should not be part of the description of the language for the user.
And the complicated rules for dealing with really messed up MathML add
nothing but confusion to users. They do need to be described for parser
writers, but like a DTD, they really shouldn't be meant for general
consumption.

Neil Soiffer
Senior Scientist
Design Science, Inc.
www.dessci.com
~ Makers of Equation Editor, MathType, MathPlayer and MathFlow ~



On Tue, Apr 1, 2008 at 10:10 AM, Jim Jewett <jimjjewett@gmail.com> wrote:

>
> On 4/1/08, David Carlisle <davidc@nag.co.uk> wrote:
>
> >  > These people won't suddenly fix their whole toolchain to support
> >  > math.
> >  >  The conscientious will just avoid MathML because of the cost of
> >  > making the rest of the page valid.
>
> > In what way will the use of MathML impact on the requirements of the
> >  rest of the page?
>
> Imagine a minimal integration ruleset.
>
> "<mathml " ==> switch to XML parser until you get to "</mathml>"
>
> This also implies that
>    mathml can't embed other mathml, not even indirectly through SVG.
>    you can't use any html within the mathml, unless it is xhtml.
>
> So what happens when someone *does* write
>
> <mathml><div>...
>
> or mis-spells the </mathml>
>
> or ...
>
>
> >  >   Most people will go ahead and use
> >  > it in a simplified, invalid manner, which the various browsers will
> >  > correct in different adhoc ways.  That is the worst of both worlds.
>
> > Sorry, I don't understand this point at all. There may be some
> >  discussion about how far the syntax of "mathml-in-html" may differ from
> >  that of "mathml-in-xml" but surely it's a basic premise of the html5
> >  design that whatever syntax is finally agreed on, it will have a
> >  specified mapping to a DOM, and won't require ad hoc browser fixes?
>
> Not when it is written correctly.
>
> But it won't be written correctly.  That is part of the culture of
> html.  And browsers will try to fix it up anyhow.  That is part of the
> culture of html browsers.
>
> So the real questions are:
>
> How can we reduce the number of errors?
> How can we make the errors predictable enough that the fixups are
> consistent?
>
> To use concrete examples from Jacques Distler via Sam Ruby,
>
> When people *do* write the "invalid":
>    <math>146,382</math>
>
> should it mean:
>    <mhtml><mn>146,382</mn></mhtml>
> or:
>    <mhtml><mn>146</mn><mo>,</mo><mn>382</mn></mhtml>
> or something else?
>
> The decision needs to be specified.  And in html, the short "invalid"
> form should probably be acceptable, representing the simpler case.
> (You can always be explicit to get the other case.)  If mathml needs
> to keep it invalid (to ensure quality), then maybe the two languages
> aren't ready for tight integration.
>
> Note that this won't stop MathML from integrating with SVG or docbooks
> -- it is HTML that is the odd man out.
>
> -jJ
>
>
Received on Tuesday, 1 April 2008 18:22:59 UTC