Re: Exploring new vocabularies for HTML from Neil Soiffer on 2008-03-30 (www-math@w3.org from March 2008)

From: Neil Soiffer <Neils@dessci.com>
Date: Sun, 30 Mar 2008 16:19:52 -0700
To: "David Carlisle" <davidc@nag.co.uk>
Cc: ian@hixie.ch, public-html@w3.org, www-math@w3.org
Message-ID: <d98bce170803301619v67d4dbadpd46fa2a10f52a8af@mail.gmail.com>
To me, it seems like there are two issues that should be disentangled about
what MathML in HTML5 looks like:  the "linearization" and what ends up in
the DOM.

As a content producer, whether a programmer producing it or as a hand
author, what I care about is the linearization.  I don't care about the DOM
-- in fact, the MathML may never be used in a web page.  For this, I need a
document that tells me what I should write.

As someone who renders MathML in a browser or as a programmer who wants to
query or manipulate the MathML in a browser, I care about the DOM.  I don't
care about the linearization -- in fact, the math may have been created by
programmatically manipulating the DOM and never had a linearization.

I know that I speak for the MathML WG when I state that interoperability of
MathML is extremely important.  Because MathML is used in many contexts
outside of browsers, the linearization and its interoperability are crucial.
This is why I and the MathML WG feel that having a single published syntax
for MathML is important.

In XHTML and many other contexts (eg, a publisher who gets a document with
MathML in it), adherence to a strict DTD or schema is an important step in
validating the input.  In these contexts, "repairing" the linearization to
produce a DOM is inappropriate.  This is where HTML 5 differs from the other
contexts in which MathML must live. I think it is also where we seemingly
are at odds, but perhaps we are just confusing the syntax and repair.
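
To make the "strict" side of that distinction concrete, here is a minimal
Python sketch of a check that rejects input instead of repairing it.  It is
not DTD validation (Python's standard library does not validate against a
DTD); it tests only the one rule discussed in this thread, that a math
element may not contain character data directly.  The names strict_check and
stray_text are made up for illustration, and namespaces are handled only by
comparing local names.

```python
import xml.etree.ElementTree as ET


def stray_text(elem):
    """Return character data sitting directly inside elem (not inside a child)."""
    pieces = [elem.text or ""]
    pieces += [child.tail or "" for child in elem]
    return "".join(pieces).strip()


def strict_check(source):
    """Hypothetical strict pipeline: report an error, never repair."""
    root = ET.fromstring(source)  # raises ParseError if not well-formed XML
    local_name = root.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
    if local_name == "math" and stray_text(root):
        raise ValueError(
            "character data directly inside <math>: %r" % stray_text(root)
        )
    return root
```

A publisher's toolchain would presumably stop at the error; an HTML5 parser,
by contrast, has to produce some DOM anyway, which is where repair comes in.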

HTML5 needs to specify what happens when presented with an arbitrary
sequence of characters that follows "<math", or maybe some even more
complicated set of conditions (eg, "xxx </math>" or even "<mfrac> a b
</mfrac>").  Leaving aside when repair kicks in, I think it is important to
distinguish between what authors are told is legal and what repair mechanism
is used.  I think this goes to the heart of the disagreements about syntax
on this list.  If everyone agreed that MathML, as presented in the W3C spec,
is the math syntax to use and to tell people to use, then the question becomes
how one fixes up illegal syntax.  The semantic difference that I think is
important is that applications that generate MathML and people who directly
write math use MathML as per the W3C spec; browser implementors read some
specification as to how to map illegal MathML into some legal MathML DOM.

From this point of view, I don't think you'll see a strenuous objection from
David or myself as to what is legal for repair.  The question then turns
into what the set of priorities for repair is.

David suggested one extreme, which is to wrap the math with an
"merror/mtext" and alert the user to the problem so they can determine a
proper fix.  MathPlayer in IE does something like this and I have found it
extremely useful when I have hand authored/tweaked something.
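
As a rough illustration of that minimal repair, the following Python sketch
wraps the offending character data in merror/mtext without interpreting it.
The function name repair_as_merror is hypothetical, and the sketch assumes
the repair step is handed the raw text found inside the math element; it
says nothing about how a real HTML5 parser would detect that situation.

```python
import xml.etree.ElementTree as ET


def repair_as_merror(raw_text):
    """Wrap uninterpreted character data as <math><merror><mtext>...</mtext></merror></math>."""
    math = ET.Element("math")
    merror = ET.SubElement(math, "merror")
    mtext = ET.SubElement(merror, "mtext")
    mtext.text = raw_text.strip()  # keep the author's characters as-is
    return math


print(ET.tostring(repair_as_merror("1+2"), encoding="unicode"))
# <math><merror><mtext>1+2</mtext></merror></math>
```

The point of this design is that the author's text survives untouched and
visibly flagged, so the author can find and fix the source.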

I think Ian and a few others feel that an implementation should go to much
greater lengths to fix up at least some "errors" and should do lexical
analysis and parsing so that "<math> 2x+y=3 </math>" turns into MathML.
I'm not sure how far they would go, since producing "proper" MathML would
mean such a parser would need to infer the invisible multiplication operator
between the 2 and x and would put mrows around the "2x" and "2x+y" -- that
requires building in knowledge of the precedence and associativity of all of
the Unicode characters.
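
To show how far a purely lexical pass gets, here is a small Python sketch.
It is only an assumption about what "lexical analysis" might mean here, not
anything proposed on this list: digits become mn, single letters become mi,
and everything else becomes mo.  The names tokenize and add_invisible_times
are invented for the example.

```python
import re

# One token per match: a number, a single letter, or any other non-space character.
TOKEN = re.compile(r"(\d+(?:\.\d+)?)|([A-Za-z])|(\S)")

INVISIBLE_TIMES = "\u2062"  # U+2062 INVISIBLE TIMES


def tokenize(text):
    """Split character data into flat (element name, content) pairs."""
    tokens = []
    for number, letter, other in TOKEN.findall(text):
        if number:
            tokens.append(("mn", number))
        elif letter:
            tokens.append(("mi", letter))
        else:
            tokens.append(("mo", other))
    return tokens


def add_invisible_times(tokens):
    """Naive guess: a number followed directly by an identifier is a product."""
    out = []
    for tok in tokens:
        if out and out[-1][0] == "mn" and tok[0] == "mi":
            out.append(("mo", INVISIBLE_TIMES))
        out.append(tok)
    return out


print(tokenize("2x+y=3"))
# [('mn', '2'), ('mi', 'x'), ('mo', '+'), ('mi', 'y'), ('mo', '='), ('mn', '3')]
```

Even with the invisible-times guess added, the result is a flat token list.
Grouping it into nested mrows would still need an operator precedence table,
which is the part the paragraph above warns about.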

Is my proposal of dividing the issue into syntax and repair acceptable to
Ian and others?  If so, do you (all) agree that MathML's linearization as
specified in the MathML spec is the syntax to use, or is there some other
proposal for what is the "base" syntax for math?

Neil Soiffer
Senior Scientist
Design Science, Inc.
www.dessci.com
~ Makers of Equation Editor, MathType, MathPlayer and MathFlow ~



On Sun, Mar 30, 2008 at 3:55 AM, David Carlisle <davidc@nag.co.uk> wrote:

>
>
> > Correct me if I'm wrong, but this:
> >
> >    <math> 3 </math>
> >
> > ...is invalid MathML markup. <math> elements can't contain numbers
> > directly. (Incidentally, I determined this from the DTD, but I can't find
> > anything in MathML2 that defines the processing for this error. What
> > should that render as, assuming the correct namespaces?)
>
> The DTD is normative, so one possibility is that a validating XML
> parser is used, in which case the above never gets as far as a renderer.
> If a system uses a non-validating parser, then it is up to that system
> to report the error in whatever way is natural. (Mozilla, for example,
> deals with that and other errors in a way it finds natural, which is to
> say it silently lets the text fall through, but without any
> typographic fix-up.)
>
>
> Unlike the html case where you can try to specify full application
> behaviour even in error situations, mathml is intended primarily to be
> hosted by some other language (most mathematical expressions live in
> some wider context) and the application behaviour of xyz+mathml has to
> be mainly influenced by the application behaviour of the host language
> xyz.
>
> So basically the current situation is that the above isn't MathML, so if
> you give it to a MathML (only) system it will generate an error, but if
> you give it to a system that defines some language (such as html+mathml)
> that isn't defined by the mathml spec, it may do something else such as
> silently ignore the error.
>
> In an HTML5 context you are not going to want (the equivalent of) a
> validity error on parsing which kills the entire document, that is
> clear. But the fixup should only be that an implied merror (or mtext,
> perhaps) is inserted:
>
> <math>1+2</math>
>
> could perhaps parse as (preferably)
>
> <math><merror><mtext>1+2</mtext></merror></math>
>
> rendering typically as 1+2 in a red border
>
> or perhaps we could consider whether it should parse as
>
> <math><mtext>1+2</mtext></math>
>
> rendering as 1+2 with no mathematical spacing refinements.
>
> But html5 should definitely not try to turn math into some kind of
> private html microformat that implies character-by-character
> tokenization and parsing of the character data resulting in
>
> <math><mn>1</mn><mo>+</mo><mn>2</mn></math>
>
> As Neil said, if you go that route why not add wiki syntax to html so
> that authors don't need to use <h*> markup for headings but can just use
> some ascii punctuation syntax?  There is actually a need for a linear
> syntax for mathematics usable in wikis and the like, but it should be
> considered in that context not this one.
>
> David
>
Received on Sunday, 30 March 2008 23:20:32 UTC