- From: <juanrgonzaleza@canonicalscience.com>
- Date: Fri, 26 May 2006 06:58:41 -0700 (PDT)
I have read this program with great interest, and I would recommend reconsidering the role of mathematical markup in HTML5. But first let me explain my position.

Initially, I believed that web authoring was the "Save as" command in MS Word. Next I began to work with a real HTML tool and discovered the XML world right after. Then, seeing all the hype around the exciting new technology, I decided to build a pure XML website: XHTML for text, CML for chemistry, MathML for maths, SVG for graphics, XSLT for programming, XSL-FO for style... Big mistake!

Because of the difficulty of implementing young technologies, cross-browser (in)compatibility, and so on, I then focused on XHTML+MathML and used CSS as a first styling language (because browsers do not support FO today, I said). More errors! Afterwards I learned DOM and JavaScript, because something as simple as a drop-down menu cannot be done in XSLT (it is not meant for dynamic pages). I have now learned XSL-FO, and even if it were implemented in browsers tomorrow I would *not* use FO. It is ugly and inefficient!

I have similar thoughts about SVG. I tried it and quickly abandoned it. I watch with terror how people have criticized the "old" table-layout pages, gigantic and filled with presentational <center>, <b>, and <font>, whereas now it is "in" to serve clients a giant SVG file with lots of presentational tags simulating tables, paragraphs...

More headaches came from the XHTML part, especially incompatibilities with browsers and search engines, the nightmare of MIME types, and others. Finally I abandoned it...

But the biggest error was trying to use MathML. MathML is full of incorrect design choices and technical holes! Even some MathML authors recognize that content MathML was not "well thought out" because of a lack of agreement in the committee.

The failure of HTML math was not due to a lack of interest in mathematics, or because HTML cannot represent math.
The failure happened because the design of HTML-Math managed to combine the worst of TeX with the worst of SGML, and so it was rejected. The W3C did a poor job with HTML-Math, and also with MathML 1.x and the latest 2.0.

**************************

Some problems with MathML:

1) Insanely complicated and inefficient. In some cases I have measured 15 times more bandwidth and server storage when using MathML than with alternatives.

2) Not fully compatible with other basic technologies such as CSS and DOM. I find it interesting that only now, after ten years and many specifications, the MathML WG begins to ask what should be changed in MathML to make it CSS friendly! The MathML WG has also clearly stated that no backward-incompatible changes will be made in the future MathML 3.0. End of story.

The position paper for HTML 5 says

<blockquote> Web application technologies should be based on technologies authors are familiar with, including HTML, CSS, DOM, and JavaScript. </blockquote>

Well, MathML is not really based on those. But we can render math using just HTML and CSS, and we can use JS and DOM in the same way we use them for text in HTML or CSS. See XML-MAIDEN [http://www.geocities.com/csssite/index.xml] for ideas, samples, etc.

And it adds:

<blockquote> Basic Web application features should be implementable using behaviors, scripting, and style sheets in IE6 today so that authors have a clear migration path. Any solution that cannot be used with the current high-market-share user agent without the need for binary plug-ins is highly unlikely to be successful. </blockquote>

Well, MathML violates precisely that. George is preparing a cross-browser CSS that also works in MSIE. MSIE does not provide native support for MathML because of the difficulty of unifying it with the rest of the DOM and the rendering engine; they prefer an external plug-in, much as the Opera developers rejected native MathML support rather than break the browser, whereas Firefox uses a separate built-in module for similar reasons.
The points "Users should not be exposed to authoring errors" and "Device-specific profiling should be avoided" are also violated by MathML. For example, rendering of MathML in Firefox is based on specific fonts that must be downloaded and installed. This has met with disapproval. One of the problems with this approach is that once the new STIX fonts are available I can use them in HTML, and also in CSS rendering of math, but I cannot use them in Firefox: the MathML module would have to be rewritten and the full engine recompiled, obliging users to download and install a new version of the browser just for new fonts!!!

3) Incompatible with other markup models. For example, superscripts are encoded differently in XHTML than in MathML, and you would use the style="" attribute to change the font or colour of a token in XHTML but would use the <mstyle> tag in MathML, etc. In XML-MAIDEN you use <sub>, <sup>, and style="" in the same way as in HTML.

4) The default printing of MathML is not good, and people are returning to TeX for that!

5) Accessibility is very deficient in most cases, because people are using neither invisible operators nor the correct number of <mrow>. Accessibility is better with the old HTML+GIF+ALT models! Aural renderers of HTML could easily be adapted to HTML math. Only content MathML 2.0 is designed for accessibility (in theory), but support in current browsers is zero. Moreover, the situation is even worse than that! Many sites claiming theoretical accessibility (e.g. Distler's blog) are serving (ds)^2 as <mi>d</mi><msup><mi>s</mi><mn>2</mn></msup>, which reads as d(s^2), i.e. 2s ds!!!

6) There are problems with the default rendering of entities and with the use of invisible operators. Accessible code renders badly on screen, whereas visually correct code is inaccessible. This could be corrected with HTML math and proper use of CSS for selecting the rendering (e.g. italic vs. roman). Take the case of x = 10 m.
In HTML I could use <var>x</var> = 10 m, or even <span class="unit">m</span> (an approach similar to other markup models for scientific units, such as STMML), and I could add CSS rules if needed. How is this encoded in MathML? The W3C technical note on units says

<blockquote> Unit symbols are written in roman (upright) type, are not altered in the plural, are not followed by a period except at the end of a sentence, and no space is left between a prefix and a unit symbol. This is accomplished in MathML by using the mi element. Single character symbols must be qualified by setting the mathvariant attribute to normal as otherwise they would be italicized. For example, <mi mathvariant='normal'>m</mi> </blockquote>

Yes, I find this use of mathvariant as odd as you would find a hypothetical <span textvariant='italic'>x</span> instead of <var>x</var>.

7) The possibility of automated searches of math remains largely a myth. I can search for E=mc2 in Google today when the formula is encoded in HTML 4, and it works reasonably well, but how would I search for the formula in a MathML search engine?

  <mi>E</mi> <mo>=</mo> <msup> <mrow> <mi>m</mi> <mi>c</mi> </mrow> <mn>2</mn> </msup>

or maybe

  <mi>E</mi> <mo>=</mo> <mrow> <mi>m</mi> <msup> <mi>c</mi> <mn>2</mn> </msup> </mrow>

or maybe

  <mrow> <mi>E</mi> <mo>=</mo> <mrow> <mi>m</mi> <mo>&InvisibleTimes;</mo> <msup> <mi>c</mi> <mn>2</mn> </msup> </mrow> </mrow>

or maybe

  <mi>E</mi> <mo>=</mo> <mi>m</mi> <msup> <mi>c</mi> <mn>2</mn> </msup>

or maybe

  <mi>E</mi> <mo>=</mo> <mi>m</mi> <mi>c</mi> <msup> <mrow/> <mn>2</mn> </msup>

or maybe

  <mi>E</mi> <mo>=</mo> <msup> <mi>mc</mi> <mn>2</mn> </msup>

or maybe

  <mi>E</mi> <mo>=</mo> <mi>m</mi> <msup> <mrow> <mi>c</mi> </mrow> <mn>2</mn> </msup>

...

I have seen almost all of these encodings generated by real presentation MathML tools. And note that this is a simple E=mc2!!

8) Visual rendering is not incremental as it is in CSS. This can cause problems with large documents or even with server failures.
I find it curious that the W3C puts so much emphasis on abandoning the non-incremental rendering of old HTML presentational table layouts in favour of CSS layouts, while forcing the use of a non-incremental presentational markup in MathML. Some mathematical documents take on the order of 10 minutes to render in Firefox.

9) MathML rendering does not adapt to user preferences as CSS does.

10) The advantages of using a "standard" vanish when one observes the infinite malleability of MathML code. For example, people are simulating tensors with nested msup, msub, msubsup, and tricky mrows instead of using <mmultiscripts> and <none/>. The hypothetical advantages of standardization are then lost. In HTML math one would reuse <sup> and <sub>, and maybe one other tag, for a full representation of *any* tensorial structure.

11) Presentation MathML (p-MathML) is not good enough for rendering math in browsers. Luca Padovani writes,

<blockquote> A quick analysis of the MathML markup reveals that there is no way to preserve the structure of the formula and still have a "correct" rendering at the same time. </blockquote>

12) The use of presentational markup is contrary to common sense. I write <h1> in HTML and then state once, and only once, in a CSS file how the heading should be rendered in my document. That CSS, when stored externally, can be called by billions of other HTML documents. In MathML you are forced to repeat the presentation in each formula in each document, to use mstyle... The use of a presentational language for mathematics reminds me of the old days of the <font>, <b>, <i>, and <center> tags. The small impact of MathML on the web reminds me of the failure of XSL-FO to conquer the web.

Instead of specific presentation MathML markup complemented with lots of <mstyle> tags, I would prefer semantic or structural markup. There is a kind of general confusion here. MathML authors believe that <apply><divide/><ci>b</ci><cn>2</cn></apply> is the only content-oriented way (note that we are really encoding <divide></divide> as the first child).
By contrast, <mfrac><mi>b</mi><mn>2</mn></mfrac> is presentational MathML. However, one could copy the standard ISO 12083 and write <frac><num>b</num><den>2</den></frac>. This is not presentational markup. It is structural. You are encoding a fraction and its structural elements, numerator and denominator. This is similar to the splitting of HTML documents into the structural <head> and <body>. The structural elements <frac>, <num>, and <den> are then rendered via CSS rules. By modifying the CSS rules you modify the presentation (heights, line style, font sizes, colours, etcetera), just as you modify the presentation of headings.

**************************************

Proposals (from less to more radical):

A) Remove the following text from the specification: "Authors are encouraged to use MathML for marking up mathematics", because authors should use more concise, powerful, and solid markup for mathematics.

B) Add a special math attribute that can be used in structural markup. For example, math="num" would be equivalent to class="num" but with a dedicated math= attribute, so you can still use the class attribute for other tasks. This could be solved if space attributes (2.2.7) are implemented.

C) A more complete approach is to provide a set of structural and/or semantic tags for use with HTML5. This would close the circle, since HTML was designed as a small, light-weight, non-proprietary, easy-to-use document format for the publication and distribution of scientific documents. Few new tags are needed, because <sub>, <sup>, <var>, and <table> can be reused. The number of new tags and their usage would need to be debated in light of the different proposals available (XML-MAIDEN, ISO 12083...), but here are some illustrative examples: <math>, <frac>, <num>, <den>, <root>, <scripts>, and maybe two or three more tags would be sufficient. Note that <frac> can be reused in text for inline fractions, which are usually simulated as <sup>2</sup>/<sub>5</sub>.
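
The structural approach of proposals B) and C) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name to_html_spans and the choice of <span> as the carrier element are hypothetical and not part of any proposal, and Python is used only because it is easy to run. The sketch rewrites each structural tag as an element that existing browsers already understand, keeping the structural name in the class attribute, in the spirit of math="num" being equivalent to class="num".

```python
import xml.etree.ElementTree as ET

# Structural tags named in proposal C; any other tag is left untouched.
STRUCTURAL_TAGS = {"math", "frac", "num", "den", "root", "scripts"}


def to_html_spans(source: str) -> str:
    """Rewrite structural math tags as <span class="..."> elements.

    Hypothetical helper: the structural name moves into the class
    attribute so that an ordinary CSS stylesheet can select on it.
    """
    root = ET.fromstring(source)
    for element in root.iter():
        if element.tag in STRUCTURAL_TAGS:
            element.set("class", element.tag)  # keep the structural name
            element.tag = "span"               # element browsers already know
    return ET.tostring(root, encoding="unicode")


print(to_html_spans("<frac><num>b</num><den>2</den></frac>"))
# <span class="frac"><span class="num">b</span><span class="den">2</span></span>
```

The output carries no presentational information at all; heights, rule thickness, and fonts would come from CSS rules written once for the frac, num, and den classes, and reused tags such as <var> pass through unchanged.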
Note that <scripts> could also be used in text mode for typography (composed diacritics, for example).

Implementation would be cheap, because the model is backward compatible with existing CSS, HTML, and DOM technologies. Moreover, visual rendering is almost solved, and one would simply implement one of the available CSS stylesheets (e.g. George's) by default in browsers (much as <h1> is already rendered by default).

Juan R.

Center for CANONICAL |SCIENCE)
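
As a postscript to point 7, the search problem can be checked with a small script. This is a minimal sketch under stated assumptions: the helper names parse, structure, and flat_text are hypothetical, the <math> wrapper is added only so that each fragment parses as XML, and comparing serialized trees is a stand-in for whatever a real MathML search engine would do. It takes the seven encodings of E=mc2 listed above and compares them both as trees and as plain visible text.

```python
import xml.etree.ElementTree as ET

INVISIBLE_TIMES = "\u2062"  # the character behind &InvisibleTimes;

# The seven presentation MathML encodings of E = mc^2 from point 7.
VARIANTS = [
    "<mi>E</mi><mo>=</mo><msup><mrow><mi>m</mi><mi>c</mi></mrow><mn>2</mn></msup>",
    "<mi>E</mi><mo>=</mo><mrow><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup></mrow>",
    "<mrow><mi>E</mi><mo>=</mo><mrow><mi>m</mi><mo>\u2062</mo>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>",
    "<mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>",
    "<mi>E</mi><mo>=</mo><mi>m</mi><mi>c</mi><msup><mrow/><mn>2</mn></msup>",
    "<mi>E</mi><mo>=</mo><msup><mi>mc</mi><mn>2</mn></msup>",
    "<mi>E</mi><mo>=</mo><mi>m</mi><msup><mrow><mi>c</mi></mrow><mn>2</mn></msup>",
]


def parse(fragment):
    """Wrap a fragment in a <math> root (an assumption) so it parses as XML."""
    return ET.fromstring("<math>" + fragment + "</math>")


def structure(fragment):
    """Serialize the element tree; fragments match only if their trees match."""
    return ET.tostring(parse(fragment), encoding="unicode")


def flat_text(fragment):
    """Keep only visible characters, roughly what a plain-text index sees."""
    text = "".join(parse(fragment).itertext())
    return "".join(c for c in text if not c.isspace() and c != INVISIBLE_TIMES)


print(len({structure(v) for v in VARIANTS}))  # 7
print({flat_text(v) for v in VARIANTS})       # {'E=mc2'}
```

All seven fragments are distinct as markup, so an exact structural query for one of them misses the other six, while the flattened text is the same string in every case, which is roughly why the plain HTML 4 search works.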
Received on Friday, 26 May 2006 06:58:41 UTC