- From: <juanrgonzaleza@canonicalscience.com>
- Date: Sun, 4 Jun 2006 06:23:57 -0700 (PDT)

James Graham wrote:

> I could go on but
> at least in academic fields, LaTeX is either the only format accepted
> for publication or the preferred format.

In mathematics and theoretical physics, sure; in the rest of science? I doubt it. In chemistry, for example, LaTeX is not the preferred format.

> Note also that very very few people have the slightest interest in the
> publishing process itself. They simply wish to achieve high quality
> results at a minimum of effort. This means that they will not be
> prepared to invest any time in learning a new language, particularly
> one that is not already widely accepted (chicken and egg problem) or is
> harder to use than the language they are familiar with.

People learned to use typewriters, then computers, then word processors, TeX and LaTeX, email, HTML... The key is that you learn a new tool when it is useful and solves problems. TeX-LaTeX solves only a small subset of real-life problems, and that is why it is not popular outside some academic communities. The only really strong point of TeX-LaTeX systems is mathematical typesetting; text, graphics, diagrams, and other items are better handled by different systems and approaches.

> You may think I
> am overstating this but I disagree - bear in mind that a significant
> fraction of astronomical (chosen merely because it is the field I know
> best) software is written in Fortran 77. For many of these people
> almost 30 years of language design has never happened.

If Fortran 77 fulfills their needs, they have no reason to change; if it does not, they will adopt Fortran 90, or C++, or Java, or Maple, or anything else. There are old academics still using ordinary mail to communicate with colleagues. Is this an argument against email? When designing a new communication model, would we design it around a subset of people who love ordinary mail?
> So, in general the people likely to be publishing mathematical content
> to the internet have _no_ interest in writing their content in any
> format other than LaTeX and especially not to a verbose format of the
> type that fits the XML data model.

I am always perplexed by the double standard of TeX people. They harshly criticize the mathematical typesetting of programs such as MS Word, and they like to use the word "unprofessional" when ranking many non-TeX systems. However, most web pages generated from TeX-LaTeX systems are really unprofessional, even within that small subset of static and boring academic web pages.

People have abandoned TeX-LaTeX in favor of better approaches in many places. Some weeks ago I received a draft manuscript prepared by a mathematician, which will probably be published soon in the MSOR journal. He does not use TeX or LaTeX because of their limitations, and writes:

<blockquote>
Mathematicians have been served well by TeX and LaTeX for their mathematical typesetting. Too well, perhaps. At least, if an dedicated TeXnician of the last ten years has a chance to \relax and look about himself he will see that the rest of the world has moved on in several incompatible ways to the cosy world of TeX.
</blockquote>

> This is why the web is liberally
> sprinkled with the ugly gif output of latex2html. If we want this
> situation to change, the _only_ solution is to allow LaTeX as a
> document creation format.

For the creation of unprofessional web pages or electronic documents? Okay. Anyone can create low-quality web pages using "Save As" in MS Word, but if you want professional web pages then MS Word is not the right tool. Similar thoughts apply to TeX-LaTeX.

As an exercise, let me comment on the itex output in one of your pages. I will not review your web page "I'll go and play with words and pictures", and I will say nothing about the quality of the rest of its web design, nor about its typesetting.
You begin from an itex source (a dialect of LaTeX) and then present the generated MathML output. Then you claim:

<blockquote>
It's pretty clear which version is easier to enter, read and maintain.
</blockquote>

Well. It is clear that itex is easier to enter and read than MathML. But if you use this as an argument in favor of itex, then let me say that ASCIIMath is even easier to enter and read. Therefore, if ease of entry really mattered, one would discard itex and the other TeX-LaTeX approaches.

However, itex is not easier to maintain. If you are looking for a basic, unprofessional encoding of mathematical formulae, then itex is okay, but if you are looking for a professional encoding of formulae, itex is not good enough: it will oblige you to learn CSS, XSL-FO, and presentation MathML for fine-tuning, and maybe DOM, JavaScript, or content MathML (or even OpenMath) if you want to add interactivity and semantics to your encoding. At the same time, itex is useful neither for the rest of the web page content (images, links, menus, text, metadata) nor for the preparation of electronic scientific documents. This will oblige you to learn HTML or other systems.

Take the MathML code you generated with your Linux/x86 binary:

<math xmlns='http://www.w3.org/1998/Math/MathML' display='block'>
  <msub> <mo>∮</mo> <mtext>loop</mtext> </msub>
  <mstyle fontweight="bold"> <mrow> <mi>H</mi> </mrow> </mstyle>
  <mo>⋅</mo>
  <mrow>
    <mi>d</mi>
    <mstyle fontweight="bold"> <mrow> <mi>l</mi> </mrow> </mstyle>
  </mrow>
  <mo>=</mo>
  <msub> <mi>I</mi> <mrow> <mtext>free</mtext> </mrow> </msub>
  <mo>+</mo>
  <msub> <mo>∫</mo> <mtext>surface</mtext> </msub>
  <mfrac>
    <mrow>
      <mo>∂</mo>
      <mstyle fontweight="bold"> <mrow> <mi>D</mi> </mrow> </mstyle>
    </mrow>
    <mrow> <mo>∂</mo> <mi>t</mi> </mrow>
  </mfrac>
  <mo>⋅</mo>
  <mi>d</mi>
  <mstyle fontweight="bold"> <mrow> <mi>s</mi> </mrow></mstyle>
</math>

The first trouble is that the structure of the MathML code is very wrong. TeX-like systems are token-based systems designed for fixed layouts.
Web and electronic publications are different, and good structure matters. For example, good structure helps when breaking large formulae in liquid layouts, and it is essential when copying and pasting fragments or manipulating substructures with specialized tools. In fact, TeX's failure to structure mathematics correctly is one reason it was rejected as a basis for MathML, as it was for every other (SGML, XML...) mathematical or scientific markup.

The second trouble is the use of MathML entities. This can cause problems in data interchange if the receiving tool or document cannot access the DTD entity declarations.

The third trouble is the modification of the tree structure by the addition of an <mstyle> tag. Presentational markup should be as unintrusive as possible; this is one of the reasons the old <font> tag of HTML was replaced by the style attribute on HTML elements. A related difficulty is the use of the fontweight attribute, which is presentational and therefore to be discouraged. Using bold in math is as unfortunate as using <b> instead of <strong> in HTML. I would have no problem accepting that unprofessional markup if the web page were a school assignment written by a 15-year-old student (just as I would not oblige a student to hand me a 1200 dpi document printed with LaTeX). However, I would oblige an academic to encode using the type="vector" attribute. Ah! And do not forget that fontweight is deprecated in MathML 2.0.

The use of a special mstyle tag is annoying, but I find the redundant mrow around the H token even more perplexing. I find the code as silly as a mathematician would find the expression x((x-1)) when you mean x(x-1); moreover, there are well-known technical difficulties with redundant mrows (at least in Gecko engines). The Mozilla organization itself recommends avoiding any unnecessary mrow, whitespace, or markup.
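The redundant-mrow cleanup criticized above is mechanical enough to automate. Below is a minimal sketch using Python's standard xml.etree.ElementTree; the function name `collapse_redundant_mrows` and the `SAMPLE` fragment (cut down from the itex output quoted above) are hypothetical illustrations, not part of itex or of any MathML tool. It relies on the MathML 2.0 rule that an attribute-less mrow with exactly one argument is to be treated as equivalent to that argument, and it leaves multi-child mrows (such as the numerator of the fraction) alone.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
MROW = f"{{{MATHML_NS}}}mrow"  # ElementTree's "{namespace}local" tag form

# Serialize MathML elements with a default namespace instead of "ns0:".
ET.register_namespace("", MATHML_NS)


def collapse_redundant_mrows(parent: ET.Element) -> None:
    """Replace each attribute-less <mrow> that wraps exactly one child element
    (and no text of its own) with that child, working bottom-up."""
    for index, child in enumerate(list(parent)):
        collapse_redundant_mrows(child)  # simplify descendants first
        if (child.tag == MROW and not child.attrib and len(child) == 1
                and not (child.text or "").strip()):
            only = child[0]
            only.tail = child.tail  # keep whatever followed the old <mrow>
            parent.remove(child)
            parent.insert(index, only)


# Hypothetical sample: the H term and the I_free term of the itex output.
SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mstyle fontweight="bold"><mrow><mi>H</mi></mrow></mstyle>'
    '<msub><mi>I</mi><mrow><mtext>free</mtext></mrow></msub>'
    '</math>'
)

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    collapse_redundant_mrows(root)
    print(ET.tostring(root, encoding="unicode"))
```

Working on the parsed tree rather than with text search and replace keeps the fix structural: an mrow is removed only when the tree shows it has a single child, so an mrow that actually groups several terms is never touched.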
The itex-generated markup

<mstyle fontweight="bold"> <mrow> <mi>H</mi> </mrow> </mstyle>

would be encoded as

<mi type="vector">H</mi>

This double error is also present in other parts of the code.

Another disappointment is the encoding of the differential. The differential is encoded as a simple variable d. There are special entities defined in the MathML DTD and also special Unicode characters, and the truth is that those special characters were designed with accessibility in mind. Still, if for some reason the author does not want to use the special differential character, one can easily see that the differential is not a variable or identifier but an operator. Therefore, <mo>d</mo> is more accurate. The same error appears in the other integral. Again I find a redundant <mrow> around the "free" text fragment. As we can see, the code is very deficient even ignoring accessibility issues.

Note that vector quantities are rendered in a bold italic font. Many authors and some journals prefer a roman font for vectors. Imagine you have 5 electronic documents containing 10 equations each. Either you learn MathML (and then you are obliged to study three or four languages even for the simplest tasks) and modify the 50 equations by hand, or you modify the itex source. Since the itex source is presentational, you would change each \mathbf in the 50 equations (even using a macro or an automated search and replace, the task wastes time). Next you would parse the source again to generate new MathML markup, which would then be uploaded. I do not call that maintainable (and that is why most academic publishers do not use TeX-like code in their publishing/archiving systems). I would use a more solid HTML-Math approach and a standard external CSS stylesheet. I could change the rendering of millions of pages on my site with a simple change to a CSS rule. Moreover, there are additional problems when trying to add dynamic behavior or links to the code when using itex.
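The double error (a bold mstyle plus a redundant mrow around a single identifier) and the differential-as-variable encoding can also be repaired mechanically. The following is a minimal sketch, again assuming Python's xml.etree.ElementTree; the helper names `q`, `unwrap_bold_identifiers` and `mark_differentials` and the `SAMPLE` fragment are hypothetical. By default it writes the `type="vector"` attribute proposed above; as far as I know this is not an attribute MathML 2.0 defines on presentation `<mi>`, so a validator may reject it. Passing `attr=("mathvariant", "bold")` instead produces the MathML 2.0 replacement for the deprecated fontweight. The differential rule is a blunt heuristic that would also re-tag a variable genuinely named d.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("", NS)


def q(local: str) -> str:
    """Return ElementTree's namespaced tag name for a MathML element."""
    return f"{{{NS}}}{local}"


def unwrap_bold_identifiers(parent: ET.Element, attr=("type", "vector")) -> None:
    """Replace <mstyle fontweight="bold"> wrapping a single <mi> (possibly
    inside redundant single-child <mrow>s) with that <mi> carrying `attr`.
    The default type="vector" is the encoding proposed in the message, an
    assumption rather than a MathML 2.0 presentation attribute."""
    for index, child in enumerate(list(parent)):
        unwrap_bold_identifiers(child, attr)
        # Only touch an mstyle whose sole attribute is fontweight="bold".
        if child.tag != q("mstyle") or child.attrib != {"fontweight": "bold"}:
            continue
        inner = child[0] if len(child) == 1 else None
        # Step through any attribute-less single-child mrow wrappers.
        while (inner is not None and inner.tag == q("mrow")
               and not inner.attrib and len(inner) == 1):
            inner = inner[0]
        if inner is not None and inner.tag == q("mi"):
            inner.set(*attr)
            inner.tail = child.tail
            parent.remove(child)
            parent.insert(index, inner)


def mark_differentials(root: ET.Element) -> None:
    """Re-tag every <mi>d</mi> as <mo>d</mo> (heuristic: any identifier d)."""
    for el in list(root.iter(q("mi"))):
        if (el.text or "").strip() == "d":
            el.tag = q("mo")


# Hypothetical sample: the H . dl part of the itex output quoted above.
SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mstyle fontweight="bold"><mrow><mi>H</mi></mrow></mstyle>'
    '<mo>\u22c5</mo>'
    '<mrow><mi>d</mi><mstyle fontweight="bold"><mrow><mi>l</mi></mrow></mstyle></mrow>'
    '</math>'
)

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    unwrap_bold_identifiers(root)
    mark_differentials(root)
    print(ET.tostring(root, encoding="unicode"))
```

On the sample, H and l end up as `<mi>` elements carrying the chosen attribute and the d becomes an `<mo>`. Which attribute to write is exactly the semantic-versus-presentational question debated here, so the sketch leaves it as a parameter.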
How would one encode this example in HTML-Math? Well, that may be debated here, but one working possibility could be the following (I use MathML entities for convenience; they could be replaced by Unicode characters):

<df>
∮<sub>loop</sub> <var class="vc">H</var>⋅<var class="df">d</var><var class="vc">l</var> = <var>I</var><sub>free</sub> + ∫<sub>surface</sub> <frac> <num>∂<var class="vc">D</var></num> <den>∂<var>t</var></den> </frac>⋅<var class="df">d</var><var class="vc">s</var>
</df>

This can of course be changed and improved in many ways. Note that it carries more information than the original itex source translated to MathML (for example, I am encoding differentials). If the main goal is simplicity of markup and one just wants to reproduce the itex results, then one could try something like

<df>
∮<sub>loop</sub> <var><b>H</b></var>⋅d<var><b>l</b></var> = <var>I</var><sub>free</sub> + ∫<sub>surface</sub> <frac> <num>∂<var><b>D</b></var></num> <den>∂<var>t</var></den> </frac>⋅d<var><b>s</b></var>
</df>

or, simpler still (more itex-like),

<df>
∮<sub>loop</sub> <b>H</b>⋅d<b>l</b> = <i>I</i><sub>free</sub> + ∫<sub>surface</sub> <frac> <num>∂<b>D</b></num> <den>∂<i>t</i></den> </frac>⋅d<b>s</b>
</df>

If you compare this last version (which contains the same information as an itex source), you can see that HTML-Math is not much more complex than itex. Look at the following pairs:

∮<sub>loop</sub>
\oint_\text{loop}

Note that the MathML entity is longer than the TeX command. Using the Unicode character, the verbosity of HTML-Math is the same.

<b>H</b>
\mathbf{H}

d<b>l</b>
{d\mathbf{l}}

<i>I</i><sub>free</sub>
I_\text{free}

∫<sub>surface</sub>
\int_\text{surface}

Here, too, the main difference comes from my use of the MathML entity. Verbosity is very close to that of itex when using the Unicode character.

<frac><num>∂<b>D</b></num><den>∂<i>t</i></den></frac>
\frac{\partial \mathbf{D}}{\partial t}

This is by far more verbose in HTML-Math. The MathML entities add verbosity over the \partial command.
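One crude way to check the pairwise verbosity comparison is simply to count characters. The short Python sketch below assumes character count as the measure (a simplification that ignores typing effort and readability) and, as suggested in the text, substitutes the Unicode characters U+222E, U+222B and U+2202 for the entity references. `PAIRS` and `lengths` are hypothetical names. On these counts the fraction pair shows the largest gap, 53 characters of HTML-Math against 38 of itex.

```python
# Each pair: (HTML-Math using Unicode characters, itex source), transcribed
# from the comparison in the message; entity references are replaced by
# U+222E (contour integral), U+222B (integral) and U+2202 (partial).
PAIRS = [
    ("\u222e<sub>loop</sub>", r"\oint_\text{loop}"),
    ("<b>H</b>", r"\mathbf{H}"),
    ("d<b>l</b>", r"{d\mathbf{l}}"),
    ("<i>I</i><sub>free</sub>", r"I_\text{free}"),
    ("\u222b<sub>surface</sub>", r"\int_\text{surface}"),
    ("<frac><num>\u2202<b>D</b></num><den>\u2202<i>t</i></den></frac>",
     r"\frac{\partial \mathbf{D}}{\partial t}"),
]


def lengths(pairs):
    """Return (html_chars, itex_chars) per pair, counting Unicode code points."""
    return [(len(html), len(itex)) for html, itex in pairs]


if __name__ == "__main__":
    for (html, itex), (n_html, n_itex) in zip(PAIRS, lengths(PAIRS)):
        print(f"{n_html:3d} vs {n_itex:3d}   {itex}")
    totals = [sum(col) for col in zip(*lengths(PAIRS))]
    print("total (HTML-Math, itex):", totals)
```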
Using the Unicode character:

<frac> <num>∂<b>D</b></num> <den>∂<i>t</i></den> </frac>

Of course there is some room for improvement. For example, HTML lets you omit end tags where there is no possibility of confusion. The same option was available in ISO 12083 (SGML) math. Therefore,

<frac> <num>∂<b>D</b> <den>∂<i>t</i> </frac>

or

<frac>∂<b>D</b><den>∂<i>t</i></frac>

were valid markup. In any case, only with fractions do you find significant extra verbosity over itex, and you avoid all the disadvantages of using a non-web markup.

Now imagine that you want to add some behavioral properties to HTML-Math, fine-tune the visual rendering, change the visual style of a numerator for didactic reasons, and so on. You will find further difficulties when a TeX-LaTeX-itex source has to be translated to HTML/XML. And what if I send a document? Would I send the source? The final HTML? Both? LaTeX-like converters generate basic web pages with an unprofessional (and boring) look.

I see no reason to limit the capabilities of a web markup in order to satisfy a subset of academics who do not want to spend time learning better markup languages. Just as HTML was not designed with LaTeX as a "document creation format" in mind but was derived from the solid and sophisticated SGML, I think that HTML-Math cannot be based on LaTeX but should be based on SGML math (ISO 12083), on XML-MAIDEN, or on a similar approach. I think this is so obvious that no time should be devoted to discussing it. Of course, anyone is free to develop a LaTeX to HTML-Math translator if they wish, just as anyone is free to use tables for layout; simply note the limitations of the approach.

> If, or whatever reason MathML is a poor
> target language for TeX->foo converters then maybe we should talk about
> improving the situation. But authors _will_not_ learn anything other
> than LaTeX.
They will learn if they need to solve problems that LaTeX does not solve, just as they learn Java for programming, learn email for communicating, learn Adobe Reader for printing PDFs, or have learned Mathematica when they needed to compute a 1000-term sum in a partition function.

> I should say that, as far as I can tell, using LaTeX as the input
> language isn't the accessibility disaster that you make out.

You? Have you noticed that LaTeX was ignored by Maple, Mathematica, ISO 12083, EuroMath, MathML, OpenMath...?

You cite, for example, ASTER. It is true that ASTER was a breakthrough some time ago, but it cannot be considered the last word on the topic. Its obvious limitations are the absence of any kind of navigation support for math formulae, and the fact that with complex math expressions the audio output becomes very hard to follow effectively. Since it relies on LaTeX input, it can be ambiguous. How would ASTER read the following TeX fragments? f(x-1), a(x), dy, Df, Df, Df, b^2, x^2, x^2.

> directly in an XML language, the verbosity of the output language is
> almost irrelevant

Is it? Then why was the MathML WG so worried about the verbosity of fine-grained parallel MathML markup that they provided an alternative _less verbose_ encoding?

Juan R.

Center for CANONICAL |SCIENCE)

Received on Sunday, 4 June 2006 06:23:57 UTC