- From: Neil Soiffer <Neils@dessci.com>
- Date: Sun, 30 Mar 2008 16:19:52 -0700
- To: "David Carlisle" <davidc@nag.co.uk>
- Cc: ian@hixie.ch, public-html@w3.org, www-math@w3.org
- Message-ID: <d98bce170803301619v67d4dbadpd46fa2a10f52a8af@mail.gmail.com>
To me, it seems like there are two issues that should be disentangled about what MathML in HTML5 looks like: the "linearization" and what ends up in the DOM.

As a content producer, whether a programmer producing it or a hand author, what I care about is the linearization. I don't care about the DOM -- in fact, the MathML may never be used in a web page. For this, I need a document that tells me what I should write.

As someone who renders MathML in a browser, or as a programmer who wants to query or manipulate the MathML in a browser, I care about the DOM. I don't care about the linearization -- in fact, the math may have been created by programmatically manipulating the DOM and never had a linearization.

I know that I speak for the MathML WG when I state that interoperability of MathML is extremely important. Because MathML is used in many contexts outside of browsers, the linearization and its interoperability are crucial. This is why I and the MathML WG feel that having a single published syntax for MathML is important. In XHTML and many other contexts (e.g., a publisher who receives a document with MathML in it), adherence to a strict DTD or schema is an important step in validating the input. In these contexts, "repairing" the linearization to produce a DOM is inappropriate.

This is where HTML5 differs from the other contexts in which MathML must live. I think it is also where we seemingly are at odds, but perhaps we are just conflating syntax and repair. HTML5 needs to specify what happens when presented with an arbitrary sequence of characters that follows "<math", or perhaps an even more complicated set of conditions (e.g., "xxx </math>" or even "<mfrac> a b </mfrac>"). Leaving aside when repair kicks in, I think it is important to distinguish between what authors are told is legal and what repair mechanism is used. I think this goes to the heart of the disagreements about syntax on this list.
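A minimal sketch of that syntax/repair split, in Python. It assumes un-namespaced markup and a hypothetical, heavily simplified allowlist of children for <math> (the real content model comes from the MathML DTD/schema, and only the direct children of <math> are checked here). The function names is_valid and repair are illustrative, not taken from any specification. is_valid stands in for the author-facing syntax rule. repair is a separate, browser-side step that, rather than tokenizing the text, wraps non-conforming content in merror/mtext, the fixup David Carlisle suggests in the quoted message below.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified allowlist of presentation elements that
# may appear directly inside <math>. The normative content model is the
# MathML DTD/schema; this set is only for illustration.
ALLOWED_CHILDREN = {"mi", "mn", "mo", "mtext", "mrow", "mfrac",
                    "msqrt", "msup", "msub", "merror"}


def is_valid(math: ET.Element) -> bool:
    """Syntax check (what authors are told is legal): <math> may contain
    only allowed elements, never bare character data. Only direct children
    are inspected in this sketch."""
    if math.text and math.text.strip():
        return False  # e.g. <math> 3 </math> or <math>1+2</math>
    for child in math:
        if child.tag not in ALLOWED_CHILDREN:
            return False
        if child.tail and child.tail.strip():
            return False  # stray text between child elements
    return True


def repair(math: ET.Element) -> ET.Element:
    """Repair step (what a browser does with illegal input): leave valid
    markup alone; otherwise wrap all of its text in <merror><mtext>
    instead of trying to tokenize and parse it."""
    if is_valid(math):
        return math
    text = "".join(math.itertext()).strip()
    fixed = ET.Element("math")
    merror = ET.SubElement(fixed, "merror")
    mtext = ET.SubElement(merror, "mtext")
    mtext.text = text
    return fixed


if __name__ == "__main__":
    bad = ET.fromstring("<math>1+2</math>")
    print(ET.tostring(repair(bad), encoding="unicode"))
```

For example, repair(ET.fromstring("<math>1+2</math>")) serializes as <math><merror><mtext>1+2</mtext></merror></math>, while already-valid input such as <math><mn>3</mn></math> is returned unchanged. Keeping the two functions separate mirrors the point above: the validity rule can stay exactly what the MathML spec says, while the repair policy is a separate decision that a browser-oriented specification can make.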
If everyone agreed that MathML, as presented in the W3C spec, is the math syntax to use and to tell people to use, then the question becomes how one fixes up illegal syntax. The semantic difference that I think is important is that applications that generate MathML, and people who write math directly, use MathML as per the W3C spec; browser implementors read some specification of how to map illegal MathML into some legal MathML DOM. From this point of view, I don't think you'll see a strenuous objection from David or myself as to what is legal for repair. The question then turns into what the priorities for repair should be.

David suggested one extreme, which is to wrap the math in an merror/mtext and alert the user to the problem so they can determine a proper fix. MathPlayer in IE does something like this, and I have found it extremely useful when I have hand-authored or tweaked something. I think Ian and a few others feel that an implementation should go to much greater lengths to fix up at least some "errors" and should do lexical analysis and parsing so that "<math> 2x+y=3 </math>" turns into MathML. I'm not sure how far they would go, since producing "proper" MathML would mean such a parser would need to infer the invisible multiplication operator between the 2 and the x and would put mrows around the "2x" and "2x+y" -- that requires building in knowledge of the precedence and associativity of all of the Unicode characters.

Is my proposal of dividing the issue into syntax and repair acceptable to Ian and others? If so, do you (all) agree that MathML's linearization as specified in the MathML spec is the syntax to use, or is there some other proposal for what the "base" syntax for math should be?

Neil Soiffer
Senior Scientist
Design Science, Inc.
www.dessci.com
~ Makers of Equation Editor, MathType, MathPlayer and MathFlow ~

On Sun, Mar 30, 2008 at 3:55 AM, David Carlisle <davidc@nag.co.uk> wrote:
>
> > Correct me if I'm wrong, but this:
> >
> >   <math> 3 </math>
> >
> > ...is invalid MathML markup. <math> elements can't contain numbers
> > directly. (Incidentally, I determined this from the DTD, but I can't find
> > anything in MathML2 that defines the processing for this error. What
> > should that render as, assuming the correct namespaces?)
>
> The DTD is normative, so one possibility is that a validating XML
> parser is used, in which case the above never gets as far as a renderer.
> If a system uses a non-validating parser, then it is up to that system
> to report the error in whatever way is natural. (Mozilla, for example,
> deals with that and other errors in a way it finds natural, which is to
> say it silently just lets the text fall through, but without any
> typographic fix-up.)
>
> Unlike the HTML case, where you can try to specify full application
> behaviour even in error situations, MathML is intended primarily to be
> hosted by some other language (most mathematical expressions live in
> some wider context), and the application behaviour of xyz+mathml has to
> be mainly influenced by the application behaviour of the host language
> xyz.
>
> So basically the current situation is that the above isn't MathML, so if
> you give it to a MathML(-only) system it will generate an error, but if
> you give it to a system that defines some language (such as html+mathml)
> that isn't defined by the MathML spec, it may do something else, such as
> silently ignore the error.
>
> In an HTML5 context you are not going to want (the equivalent of) a
> validity error on parsing which kills the entire document, that is
> clear.
> But the fixup should only be that an implied merror (or mtext,
> perhaps) is inserted:
>
>   <math>1+2</math>
>
> could perhaps parse as (preferably)
>
>   <math><merror><mtext>1+2</mtext></merror></math>
>
> rendering typically as 1+2 in a red border,
>
> or perhaps we could consider whether it should parse as
>
>   <math><mtext>1+2</mtext></math>
>
> rendering as 1+2 with no mathematical spacing refinements.
>
> But html5 should definitely not try to turn math into some kind of
> private html microformat that implies character-by-character
> tokenization and parsing of the character data, resulting in
>
>   <math><mn>1</mn><mo>+</mo><mn>2</mn></math>
>
> As Neil said, if you go that route, why not add wiki syntax to html so
> that authors don't need to use <h*> markup for headings but can just use
> some ASCII punctuation syntax? There is actually a need for a linear
> syntax for mathematics usable in wikis and the like, but it should be
> considered in that context, not this one.
>
> David
Received on Sunday, 30 March 2008 23:20:29 UTC