- From: James Graham <jg307@cam.ac.uk>
- Date: Mon, 31 Mar 2008 10:39:32 +0100
- To: David Carlisle <davidc@nag.co.uk>
- CC: hsivonen@iki.fi, public-html@w3.org, www-math@w3.org
David Carlisle wrote:
>> The right way to do either is to run an HTML5 parser.
>
> I don't see how that is likely to happen while the "html parser" is
> simply that, with so many hard coded rules for html elements.
> If the parsing was abstracted away from html and then some schema
> language was used to specify html5 in terms of that abstraction,
> perhaps other languages could at least consider whether they wanted to
> offer lax "html-style" parsing in addition to xml. This is essentially
> how John Cowan's tag soup works. Now it may be that you've looked at
> existing behaviour and decided the only way to model that is build in
> special rules everywhere, if that's the case, so be it, but that
> severely limits the usefulness of such a parser in a non-html context.

I'm really uncertain why you think that running an HTML parser to
construct an in-memory representation of the HTML, in the same
in-memory format as that used for XML, is the wrong way to import HTML
content into an application that currently imports only XML.

>> We can ask browsers to use the XML serialization for clipboard export
>> on platforms that have pre-existing deployed XML-based clipboard
>> flavor for MathML
>
> yes and you would also need to ask all editing systems not to generate
> <math>1+2=3</math> so that what they produce could be used as mathml
> without having to pass it to a browser and cut it out. The simplest way
> to ensure that editors don't produce such corruption is not to imply
> that it is legal in the first place. It offers very little benefit to
> anyone, and massive opportunities for incompatibility with the past and
> corruption of data (where the system does not imply the element
> structure the author expected) in the future.

The supposed benefit is not to MathML editors but to authors using text
editors. I have tried writing MathML-in-XHTML using only a text editor
and the experience was painful to say the least.
I found that the verbosity made it difficult to enter and then
difficult to fix when I had made a mistake. The sensible solution might
have been to use something like itex2MML to keep the source equations
in human-readable form, but that would have involved keeping two
separate representations of the document, with all the problems that
causes.

In my experience the verbosity of MathML is a serious impediment to
authoring. However, I'm not sure that introducing a whole slew of rules
for tag inference is the right approach. I think authors have a hard
time understanding where tags can be inferred and I think, with the
exception of tbody (which I think is actually a case of authors not
understanding the table model enough to realise a tbody is needed), the
tag inference of HTML 4 is used only by the most expert authors. Any
system that allows authors to write half of their content in a
tag-inferred form but requires the other half to be written out fully,
according to the limitations of the inference scheme, is going to be
very difficult to grasp in full.

An alternative to the tag inference idea would be to make (optional)
use of the wiki-serialization of MathML previously discussed.
Specifically we could allow either a <math> subtree containing normal
MathML but with text/html compatible error handling, or a <wikimath>
(strawman name) element that took only the human-editable serialization
and converted it to a MathML DOM tree in-memory. This would make
predicting the DOM tree for style and scripting harder but would make
editing easy enough to more than make up for those problems.

> David

-- 
"Mixed up signals
Bullet train
People snuffed out in the brutal rain"
--Conner Oberst
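
To make the <wikimath> suggestion above a little more concrete, here is
a minimal sketch of turning a human-editable string into an in-memory
MathML tree. Everything in it is an assumption made for illustration:
the toy grammar (numbers, single-letter identifiers, everything else as
an operator) is not the wiki-serialization discussed on the list, the
function name wikimath_to_mathml is hypothetical, and the MathML
namespace that a real parser would assign is left out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy grammar, not the wiki-serialization discussed on the list:
# a number, a single-letter identifier, or any other non-space character.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z])|(\S))")


def wikimath_to_mathml(source: str) -> ET.Element:
    """Convert a flat expression such as '1+2=3' into a MathML element tree.

    Numbers become <mn>, letters become <mi>, and any other character
    becomes <mo>. No namespace is attached to the elements in this sketch.
    """
    math = ET.Element("math")
    row = ET.SubElement(math, "mrow")
    source = source.strip()
    pos = 0
    while pos < len(source):
        # After strip(), some non-space character always remains, so this matches.
        match = TOKEN.match(source, pos)
        number, identifier, operator = match.groups()
        if number is not None:
            ET.SubElement(row, "mn").text = number
        elif identifier is not None:
            ET.SubElement(row, "mi").text = identifier
        else:
            ET.SubElement(row, "mo").text = operator
        pos = match.end()
    return math


if __name__ == "__main__":
    tree = wikimath_to_mathml("1+2=3")
    print(ET.tostring(tree, encoding="unicode"))
```

The author types only "1+2=3" and the explicit mn/mo structure exists
only in the tree, which is the trade-off described above: easier
editing, at the cost of a DOM that is harder to predict from the source.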
Received on Monday, 31 March 2008 09:40:27 UTC