RE: pages with MathML from juanrgonzaleza@canonicalscience.com on 2006-04-15 (www-math@w3.org from April 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Sat, 15 Apr 2006 08:21:06 -0700 (PDT)
To: <www-math@w3.org>
Message-ID: <3310.217.124.69.238.1145114466.squirrel@webmail.canonicalscience.com>
Romeo Anghelache wrote:
>
> juanrgonzaleza@canonicalscience.com wrote:
>> Romeo Anghelache,
>>
>> It is rather surprising that one can claim that HERMES is generating
>> semantic content, when articles generated from HERMES looks like
>>
>> --------------------- REAL CODE
>> <…>
>> <p>
>> </p>
>> <h3>2001-07-09</h3>
>> <p>
>> </p>
>> <p class="abstract">
>> <p>
>> <span class="fn"> </span><span class="fb">Abstract </span><span
>> class="fn">We review the present status of black hole thermodynamics. Our
> ....
>
>> some unresolved open issues. </span>
>> </p>
>> </p>
>> <p>
>> <…>
>>
>> -----------------------------------------------------------
>>
>> Is the use of empty paragraphs for simulating layouts, headings of level 3
>> for encoding dates, and others points you mean by "semantic"?
>>
>
> you didn't read the user manual, a single page, at  http://hermes.roua.org/

Let me quote the heading of the single page you are citing:

<quote>
Hermes - a semantic XML+MathML+Unicode e-publishing/self-archiving tool
for LaTeX authored scientific articles
</quote>

last updated on Friday, March 31, 2006 by Romeo Anghelache.

I ask again: is the above code what you mean by "semantic"?

> The document generated by Hermes is a raw XML file (the reference
> document, or the library document). What you are talking about here is
> the result of a stylesheet transformation, a stylesheet that I wrote
> just to put the things on screen.
> The only semantics on the screen I'm concerned about is the looks of it,
> and the fact that you can copy/paste the math in your math application.
> The h3 you're complaining about may have been class="date", but it has
> its unintentional usefulness: it catches the know-it-all guys.

And do you consider it correct to encode authors or dates as level 3
headings? It is rather incompatible with the W3C's focus over the last
decade. The W3C has made a big effort to recommend separating content from
presentation. The use of presentational tags such as <i> instead of <em>
has been discouraged for years. Encoding authors or dates as level 3
headings is poorer still!

You are claiming *now* that the incorrect code is not HERMES code, just
the output of a stylesheet that you (personally?) wrote. OK, let us
suppose that I was completely wrong and failed to understand.

1) The incorrect encoding is still there, and even if it is not an error
of HERMES, just of the "personal stylesheet", the final code being served
to end users (including people with disabilities) is still wrong.

2) One of the reasons I notify authors when I am criticizing/reviewing
their work is so that they can correct me if I am wrong. You did not reply
then, and now you are replying harshly here.

3) Was I wrong?

Let me cite another page, titled *Hermes at work*

[http://hermes.aei.mpg.de/]

and then quote a bit of it

<quote>
This page lists results (or links to results) of Hermes assisted
conversion of scientific articles/books from the (La)TeX world to the XML
world, it is an online storage facility kindly offered by Max Planck
Institute for Gravitational Physics.
</quote>

Next, I click on the first Living Review (the Rovelli paper)

[http://hermes.aei.mpg.de/1998/1/article.xhtml]

and since I am curious, I view the source code once my browser opens the
document, and from the <head> section I extract:


<meta name="generator" content="Hermes, version 0.9.4 2005-11-19, license
GNU GPL, description http://hermes.roua.org/"/>


Therefore, I can perfectly well say on Canonical Science Today

[http://canonicalscience.blogspot.com/2006/02/choosing-notationsyntax-for-canonmath.html]

that the document, where layout is done with <p></p>, authors or dates are
encoded as <h3>, and the structure of the abstract is debatable, ***was
generated by HERMES***. In fact, I even stated the version of the HERMES
software that generated the documents I was reviewing:

<quote>
[See The Thermodynamics of Black Holes by Robert M. Wald in Living Reviews
in Relativity generated with Hermes, version 0.9.4 2005-11-19]
</quote>

Therefore, if there was an error, it was not on my part. Either that
obtuse code was generated by Hermes or the metadata of the document I
cited is wrong.

> just to put the things on screen.

>> Do you name “semantic” the next encoding generated by HERMES
>>
>> <h3><a href="http://surubi.fis.uncor.edu/reula">Oscar A. Reula</a></h3>?
>>
>> Uff! Author encoded as heading of the document!
>>
>
> no I don't. my guess is you've just heard a voice.

I do not think so. I simply read in *Hermes at work*

[http://hermes.aei.mpg.de/]

This page lists results (or links to results) of <link>Hermes</link>
assisted conversion [...]

Then I followed the link to [http://hermes.roua.org/] and I can read at
the top of the page

<quote>
Hermes - a semantic XML+MathML+Unicode e-publishing/self-archiving tool
for LaTeX authored scientific articles
</quote>

I sincerely think that anyone would reach the same conclusion I reached:
that Hermes, a self-proclaimed semantic e-publishing/self-archiving tool,
is encoding authors or dates as level 3 headings and using <p></p> for
layout.

Both Hermes pages I cited were last updated by one Romeo Anghelache
(i.e. you).


>> Moreover, the mathematical code presents in the articles generated by
>> Hermes are not verifying accessibility, structure is far from good, and
>> several equations are rendered via “tricks”.
>>
>> For example, in “Hyperbolic methods for Einstein’s Equations”
>>
>> [http://hermes.aei.mpg.de/1998/3/article.xhtml]
>>
>> one reads (before equation 2):
>>
>> \epsilon _{abcd} is the Levi-Civita tensor corresponding to the physical
>> metric
>>
>> The underlying math is not encoded via tensors but
>>
>> <math xmlns="http://www.w3.org/1998/Math/MathML">
>> <msub>
>>  <mrow>
>>   <mi>&epsilon;</mi>
>>  </mrow>
>>  <mrow>
>>   <mi>a</mi>
>>   <mi>b</mi>
>>   <mi>c</mi>
>>   <mi>d</mi>
>>  </mrow>
>> </msub>
>> </math>
>>
>
> again, you didn't read the user manual.
> at least have the courtesy to read and understand the minimal info
> before bugging this list with off-topic comments.
>
> I'll spell it to you again,
> quote from http://hermes.roua.org/ :
>
> Of MathML, only MathML-presentation is generated if Hermes is used to
> translate legacy LaTeX files (here, by legacy LaTeX files I mean sources
> which were not edited with semantic vocabularies in mind) without manual
> intervention on the source.
>
> unquote

Well, you demand courtesy, but you are assuming (twice) that I didn’t read
the manual, claiming my comments are off-topic, and adding further
personal attacks.

I am ignoring any personal attack from you. This thread is about pages
containing MathML, and I am proving with real examples that the MathML
code being served in those pages is wrong (and is therefore on-topic).

1) I never said that HERMES was generating content MathML.

2) The MathML code being generated and served on the Internet is still
wrong, is not accessible, and the structure of the math is incorrect.

3) Presentation MathML 2.0 has a specific tag for encoding tensor indices
(mmultiscripts). You are just visually simulating tensors via an msub tag.
That is no better than using old HTML for visually simulating tensors, or
using <center><b> for simulating headings...

4) Any possible advantage of using MathML (accessibility, structural
markup, etc.) is lost in practice in real-world examples such as those I
am citing here.

5) The generated code simulates tensors instead of encoding them. Any
practical advantage of using standards vanishes when everyone encodes
math however he wants|prefers|can.

For example, the advantage of using the standard <mfrac> for fractions is
*lost* if one person uses <mfrac> for a/b, another uses <mfrac> with
redundant <mrow>s, another simulates fractions using mtable, and another
uses a mixture of XHTML and MathML. One example of the last could be

<span class="num">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="inline">
<mi>a</mi></math></span><span class="den">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="inline">
<mi>b</mi></math>
</span>


The above points may reflect your vision of mathematical markup. But, and
maybe you are missing this important point, nobody here is saying that you
are a bad programmer. We are simply saying that the MathML code being
served on the Internet is ugly and that, in practice, the theoretical
advantages of using standard mathematical XML markup (structure, copy and
paste, accessibility, standardisation...) are lost.

Please do not worry if we are rejecting the HERMES project and MathML.
They do not fit our needs!

>
>> <span class="fi"> </span><span class="fn">is the Levi-Civita tensor
>> corresponding to the physical metric, </span>
>>
>> Sorry, but I cannot call that "good code", because the Tensor is being
>> rendered via a ***visual*** forcing of subscripts instead via multiscript
>> tag of MathML 2.0
>
> very well, implement a tool which does it (google for Levi-Civita, find
> out it's a tensor, and the first or n-th symbol, or group of symbols,
> should be interpreted as a tensor).
> but you already proved MathML sucks all-together, why bother?
>
> I didn't ask you to call it "good code", really.

Aha! Then the objective is approximate rendering of formulae, plus
Google... Any markup (including “tricky”, incorrect, etc.) may be
permitted, is that it? I wonder whether that was the goal of the MathML
WG, but I think it was not, because then they would not have introduced so
many tags.

>>
>> And what about the redundancy of MathML ½ in equation 2? and what about
>> the "terrorific" code of equation 3?
>>
>> Do you name “semantic” content to encoding of “integral on s” like
>>
>> <mo>&#8747;</mo><mi>d</mi><mi>s</mi>?
>>
>> (equation 10 of [http://hermes.aei.mpg.de/1998/1/article.xhtml])
>>
>
> Ok. The only wrong thing here is <mi>d</mi>. Got it?
> No? Uff. It's mathml presentation, and d is an operator so it should be
> surrounded by <mo>d</mo>.
> This can be fixed, thanks for the unintentional pointing to a Hermes bug
> that I knew already.

Sorry to say this, but you are very wrong on these topics.

The integral would be encoded in Presentation MathML as something very
close to

<mrow>
 <mrow>
  <mo>&#8747;</mo>
  <mrow><mo>&DifferentialD;</mo><mi>s</mi></mrow>
 </mrow>
 <mrow>(Integrand here)</mrow>
</mrow>

Both the MathML code generated by HERMES

<mo>&#8747;</mo><mi>d</mi><mi>s</mi>

and your recent proposal

<mo>&#8747;</mo><mo>d</mo><mi>s</mi>

are simply verbose copies of old HTML (or similar)

<span>&#8747;d<i>s</i></span>


The accessibility, audio rendering, and structural encoding of the kind of
output you are encouraging are wrong.

The MathML WG explicitly said why one should use <mo> tags for that:

<quote>
automatic semantic interpretation of MathML presentation elements
is made easier by the explicit specification of such operators.
</quote>.

In practice, that XHTML article is misusing MathML rather than using it.

>
>> Do you consider correct the l_Planck of equation (24)? Do you know for
>> what was <mtext> designed?
>
> the l_Plank? I don't see any l_Plank there. check your spelling.
> and yes, I know "for what was <mtext> designed" I even use it, but did
> you? Where? (please don't answer)

I do not need to check my spelling, thanks. I wrote Planck, so it is not
surprising that you did not find any "Plank". I did not do so in my
previous message, but now I will write out the MathML fragment generating
the last part of equation (24) in the HERMES output

[http://hermes.aei.mpg.de/1998/1/article.xhtml]

  <msubsup>
   <mrow>
    <mi>l</mi>
   </mrow>
   <mrow>
    <mi>P</mi>
    <mi>l</mi>
    <mi>a</mi>
    <mi>n</mi>
    <mi>c</mi>
    <mi>k</mi>
   </mrow>
   <mrow>
    <mn>2</mn>
   </mrow>
  </msubsup>

which can be compared with

  <msubsup>
   <mi>l</mi>
   <mtext>Planck</mtext>
   <mn>2</mn>
  </msubsup>

The encoding, structure, and both visual and aural (accessibility)
rendering of the HERMES-generated fragment are wrong. Moreover, there are
redundant <mrow>s in the HERMES-generated code that [i] add more verbosity
to already verbose code, [ii] complicate the DOM and increase memory
requirements, and [iii] could cause unexpected errors and bugs in the
interchange of data between applications (I will write about that in the
future).
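
To make point [ii] concrete, here is a minimal sketch in Python that
counts the element nodes a DOM-like parser builds for each of the two
fragments above. The names `count_elements`, `HERMES_PLANCK` and
`MTEXT_PLANCK` are made up for this illustration, and the standard-library
ElementTree parser stands in for whatever DOM a real browser or math
application would use.

```python
import xml.etree.ElementTree as ET

# The HERMES fragment for the last part of equation (24), as quoted above.
HERMES_PLANCK = """
<msubsup>
 <mrow><mi>l</mi></mrow>
 <mrow><mi>P</mi><mi>l</mi><mi>a</mi><mi>n</mi><mi>c</mi><mi>k</mi></mrow>
 <mrow><mn>2</mn></mrow>
</msubsup>
"""

# The alternative encoding with the word "Planck" kept as a single token.
MTEXT_PLANCK = """
<msubsup>
 <mi>l</mi>
 <mtext>Planck</mtext>
 <mn>2</mn>
</msubsup>
"""


def count_elements(markup):
    """Count every element node (root included) in an XML fragment."""
    root = ET.fromstring(markup.strip())
    return sum(1 for _ in root.iter())


hermes_nodes = count_elements(HERMES_PLANCK)
mtext_nodes = count_elements(MTEXT_PLANCK)
print("HERMES fragment:", hermes_nodes, "element nodes")
print("mtext fragment: ", mtext_nodes, "element nodes")
```

The count says nothing about rendering quality; it only shows how many
more nodes a consumer has to walk for the same formula.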

>>
>> And what about the equation (25) of
>>
>> [http://hermes.aei.mpg.de/2005/2/article.xhtml]?
>>
>> The Gamma *there* is a tensor, but is encoded as subscript ab and
>> superscript j with several redundant mrows.
>>
>
> Again: Mathml-presentation doesn't know about tensors. So you're again
> confusing things badly.
> The redundant <mrows> proved themselves necessary when the automatic
> conversion of legacy LaTeX files is an issue. You'll discover that when
> you'll convert articles yourself with the canonical science.

Of course redundant mrows are not needed! I am aware they are there
because the software you designed is doing bad things. It is a trivial
task to introduce an optimization layer after the conversion step that
searches for mrows with a single child and restructures the code. It could
even be done in a few lines of XSLT.

Maybe in a next version of HERMES ;-)
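
As a rough illustration of such an optimization layer, here is a minimal
sketch. It is written in Python with the standard-library ElementTree
rather than in XSLT, and it is not part of HERMES or of any existing tool;
the function names `local_name` and `collapse_single_child_mrows` are
invented for this example. It assumes that an <mrow> with exactly one
element child and no attributes can be replaced by that child without
changing the meaning.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def collapse_single_child_mrows(element):
    """Replace each descendant <mrow> that wraps one child with that child.

    The element passed in is never replaced itself, only its descendants.
    """
    for index, child in enumerate(list(element)):
        # Clean up the subtree first, so nested wrappers collapse bottom-up.
        collapse_single_child_mrows(child)
        if local_name(child.tag) == "mrow" and len(child) == 1 and not child.attrib:
            inner = child[0]
            inner.tail = child.tail  # keep the whitespace that followed the mrow
            element[index] = inner
    return element


# The HERMES output for the Gamma symbol of equation (25), quoted in this thread.
hermes = (
    "<msubsup>"
    "<mrow><mo>&#915;</mo></mrow>"
    "<mrow><mi>a</mi><mi>b</mi></mrow>"
    "<mrow><mi>j</mi></mrow>"
    "</msubsup>"
)
cleaned = collapse_single_child_mrows(ET.fromstring(hermes))
print(ET.tostring(cleaned, encoding="unicode"))
```

The <mrow> grouping the two subscripts a and b is kept, because it has two
children and is therefore not redundant.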

About your statement "Mathml-presentation doesn't know about tensors", I
will just quote the MathML 2.0 specification itself. Section 3.4.7 is
titled "Prescripts and Tensor Indices (mmultiscripts)".

The HERMES generated code

<msubsup>
 <mrow>
  <mo>&#915;</mo>
 </mrow>
 <mrow>
  <mi>a</mi>
  <mi>b</mi>
 </mrow>
 <mrow>
  <mi>j</mi>
 </mrow>
</msubsup>

is a tricky (visual) rendering, instead of the MathML tensor encoding

<mmultiscripts>
<mo>&#915;</mo>
<mi>a</mi>
<mi>j</mi>
<mi>b</mi>
<none/>
</mmultiscripts>

The structure, semantics, and accessibility are wrong in the
HERMES-generated fragment. This is the reason the MathML folks introduced
special markup for tensors in MathML, even though it is trivial to notice
that tensors and prescripts can be _simulated_ via combinations of <msup>,
<msub>, <msubsup>, and <mrow>.
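
To show what the dedicated markup buys a consuming application, here is a
minimal sketch in Python that reads the index slots back out of the
<mmultiscripts> fragment above. It relies only on the child order defined
for mmultiscripts (base, then subscript/superscript pairs, then an
optional <mprescripts/> followed by prescript pairs). The function name
`script_pairs` is invented for this example, and the fragment is assumed
to be un-namespaced, as quoted in this message.

```python
import xml.etree.ElementTree as ET


def script_pairs(mmultiscripts):
    """Return (postscripts, prescripts) as lists of (sub, sup) pairs.

    An empty slot encoded as <none/> is reported as None.
    """
    postscripts, prescripts = [], []
    target = postscripts
    pending = []
    for child in list(mmultiscripts)[1:]:  # the first child is the base
        if child.tag == "mprescripts":
            target, pending = prescripts, []  # later pairs are prescripts
            continue
        pending.append(None if child.tag == "none" else child.text)
        if len(pending) == 2:
            target.append(tuple(pending))
            pending = []
    return postscripts, prescripts


FRAGMENT = """
<mmultiscripts>
<mo>&#915;</mo>
<mi>a</mi>
<mi>j</mi>
<mi>b</mi>
<none/>
</mmultiscripts>
"""

post, pre = script_pairs(ET.fromstring(FRAGMENT.strip()))
print(post)  # [('a', 'j'), ('b', None)]
```

Each index keeps its own slot, so a tool can tell that j sits above a and
that b has no superscript. In the <msubsup> version the two subscripts are
only a grouped row "ab".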


In any case, this thread is not about how excellent or bad a programmer
you are, or what the limitations of LaTeX as an input syntax are, or how
people type their source code. This thread is not about any of that!

As its name suggests, this thread is about "Pages with MathML", and its
objective is to illustrate (via real code such as that generated by
HERMES) how MathML is being used in practice and how all its theoretical
advantages over previous markups are being lost.

>> Is that you call good semantic content?
>>
>
> Yes.

Uff!

>
> MathML-presentation is a layer of semantics, albeit minimal, but solves
> a lot of issues with publishing math on the web, some of them being:
> - converting the whole Living Reviews into XML+GIFs for mathematics,
> takes about 24 hours; converting it into XML+MathML-Presentation takes
> 10 minutes;

And converting the whole Living Reviews into sculpted stone may take more
than 24 hours, but converting it to other formats may take fewer minutes
still. And using another approach (such as XML-MAIDEN) the conversion
takes 0 minutes.

> - the resulting size is in favor to XML + MathML, especially when you
> have a lot of math;

Compared with GIFs or stones? Sure! But the size of the HERMES MathML

<msubsup>
 <mrow>
  <mo>&Gamma;</mo>
 </mrow>
 <mrow>
  <mi>a</mi>
  <mi>b</mi>
 </mrow>
 <mrow>
  <mi>j</mi>
 </mrow>
</msubsup>

is larger than the correct encoding per the MathML specification

<mmultiscripts>
<mo>&Gamma;</mo>
<mi>a</mi>
<mi>j</mi>
<mi>b</mi>
<none/>
</mmultiscripts>

and the latter is in turn larger than what other available mathematical
approaches produce: ISO 12083, XML-MAIDEN, or others.
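
The first size comparison can be checked with a minimal sketch in Python
that measures both MathML fragments after removing indentation and line
breaks, so that only the markup itself is counted. The helper name
`compact_size` is invented for this example, and the numeric reference
&#915; is used for Gamma in both fragments so that the two are measured on
equal terms. No figures are given for ISO 12083 or XML-MAIDEN because no
equivalent fragments are quoted in this message.

```python
def compact_size(markup):
    """Byte length (UTF-8) of the markup without indentation or newlines."""
    compact = "".join(line.strip() for line in markup.splitlines())
    return len(compact.encode("utf-8"))


# HERMES output for the Gamma symbol of equation (25).
HERMES_GAMMA = """
<msubsup>
 <mrow>
  <mo>&#915;</mo>
 </mrow>
 <mrow>
  <mi>a</mi>
  <mi>b</mi>
 </mrow>
 <mrow>
  <mi>j</mi>
 </mrow>
</msubsup>
"""

# The same symbol encoded with the tensor-index markup.
MULTISCRIPTS_GAMMA = """
<mmultiscripts>
<mo>&#915;</mo>
<mi>a</mi>
<mi>j</mi>
<mi>b</mi>
<none/>
</mmultiscripts>
"""

hermes_bytes = compact_size(HERMES_GAMMA)
multiscripts_bytes = compact_size(MULTISCRIPTS_GAMMA)
print("HERMES msubsup:", hermes_bytes, "bytes")
print("mmultiscripts: ", multiscripts_bytes, "bytes")
```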

If you simply want a visual simulation of tensors from lightweight input,
and you are not worried about incorrect structure, aural rendering, or
semantics, then you can use ASCIIMath. Enter the ASCII input syntax

Gamma_(ab)^j

or the TeX/LaTeX one

\Gamma_{ab}^j

and the output will be

<msubsup>
<mo>&#915;</mo>
<mrow>
<mi>a</mi>
<mi>b</mi>
</mrow>
<mi>j</mi>
</msubsup>

without the redundant <mrow>s of the HERMES output ;-)

One of the goals of CanonML is encoding top-research math, and that cannot
be done in MathML due to its unusual verbosity (size); hence I abandoned
the previous CanonMath input syntax.

> - copy/pasting in your math application is a huge step forward from the
> GIF based math, or current math rendered in PDF.

Also available in other approaches! Moreover, the correct MathML output
(using <mmultiscripts>) is friendlier to copy/paste than the
HERMES-generated one. How can I select Gamma and j, or just the indices a
and j? Yes, the grouping is not correct in the HERMES-generated output,
because that is not a real tensor encoded via the MathML tensor tags. The
HERMES output is just a simulation with a superscript j and a grouped
subscript ab.

> These advantages make it worth the trouble of converting them even if
> there are temorary bugs left, or temporary incoveniences (which should
> be pointed out, thanks for all it's worth).
>

Temporary?

>> And what about the metric equation just after the section 2.1? This is one
>> of my favourites: accesibility, structure, "semantics", encoding, and
>> rendering are all wrong.
>>
>
> your comments are all wrong, more or less.
>
>> One find a line element ds^2. If my math is correct ds^2 = (ds)^2 but the
>> code appear in the journal article generated via HERMES is
>>
>> <mi>d</mi>
>> <msup>
>>  <mrow>
>>   <mi>s</mi>
>>  </mrow>
>>  <mrow>
>>   <mn>2</mn>
>>  </mrow>
>> </msup>
>>
>> That is, d{s}^2 (or 2s ds), which is VERY different from (ds)^2 is
>> supposed to be encoded via your "semantic" approach.
>>
>
> this is the TeX source of that expression: ds^{2}.

And the code is still complete nonsense. So you confirm to me that *in
theory* MathML was oversold as semantic, content-oriented, structural,
first-quality in both printing and rendering, searchable, and accessible
to people with disabilities, but in *practice* the HERMES output

<mi>d</mi>
<msup>
<mrow>
<mi>s</mi>
</mrow>
<mrow>
<mn>2</mn>
</mrow>
</msup>

being served on the Internet is just an ultra-verbose version of the
older and more correct HTML

<span>ds</span><sup>2</sup>

or of the ISO 12083 mathematical markup

<subform>ds</subform><sup>2</sup>.

> Let www-math know when you'll implement a tool which will be writing a
> different MathML-presentation from that source. But wait, that will be
> wrong, I can tell you that already.
>
>
>> and all that even ignoring that one would type the differential using the
>> MathML entity instead of identifier "d".
>>
>
> missing the point again. I'm tired of repeating myself:
> there's no reliable way to infer what "d" means (identifier or operator)
> unless the author marks it up accordingly.

And since I never said there was a way, you are inventing that.

I simply showed the MathML code that is being generated and served in the
real world under an aura of being cool...

Any promise of structural correctness, accessibility, and so on is lost in
the long run. In many ways, MathML is doing worse than those old
alternatives.

Take the case of the New York Journal of Mathematics (March, 2006)

[http://math.albany.edu/math/demos/nyj/]

They explain the reasons why XHTML+MathML is (they believe) preferable for
online use. The third point reads:

“XHTML+MathML is a recommendation of the World Wide Web Consortium that
complies with the standard Guidelines for Accessibility.”

I wonder what “accessibility” is provided by code such as that from HERMES

<mi>d</mi>
<msup>
<mrow>
<mi>s</mi>
</mrow>
<mrow>
<mn>2</mn>
</mrow>
</msup>

for encoding (to say!) “the square of the differential of s”.

Even the accessibility of the old HTML+GIF+ALT model is better than that
of the above real MathML code published in 2005!

Without doubt, George’s XML-MAIDEN is far better than what I am seeing in
the real world. I hope that CanonML can be better still, since I am using
his knowledge as a base for further development!

[snip]


Juan R.

Center for CANONICAL |SCIENCE)
Received on Saturday, 15 April 2006 15:21:21 UTC