Re: Formal query about WG role and MathML-FAQ from juanrgonzaleza@canonicalscience.com on 2006-03-15 (www-math@w3.org from March 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Wed, 15 Mar 2006 03:56:31 -0800 (PST)
To: <www-math@w3.org>
Message-ID: <3119.217.124.88.213.1142423791.squirrel@webmail.canonicalscience.com>
I think I have detected some confusion about a topic I consider of the
utmost importance. For that reason, I would like to point out that two
obvious (related but independent) points of the CanonMath research are
being mixed up by others:

1) the need to go beyond the usually available tools (TeX, MathML,
ASCIIMath, etc.) simply because they fail to provide solutions to some
real-life problems.

2) the final implementation of the notation/syntax chosen. For example,

<CanonMath>a <fraction/> b</CanonMath>

vs

<CanonMath>a  &fraction; b</CanonMath>

vs

<CanonMath>a \fraction b</CanonMath>

vs

<CanonMath>a <fract/> b</CanonMath>

vs

<CanonMath>a <f/> b</CanonMath>

vs

...

Therefore, it makes no sense that some authors who reject the possibility
of using mixed markup and empty tags were suggesting that I use a subset
of TeX or ASCIIMath or Mathematica or the like. Point 1) above prevents me
from using TeX or ASCIIMath.

The main discrepancy here was about the meaning of mixed vs pure content
and textual vs XML markup, which is related to 2) but is not related to
1).

Of course, perhaps I am completely wrong here, but I simply do not know of
any other way to really improve the horrible input of MathML markup
without using infix notation and mixed markup in the XML way.


((( ABOUT INFIX NOTATION )))

I find it interesting that infix mixed notation such as

<CanonMath>a<fraction/>b</CanonMath>

is meeting such strong rejection when in HTML all of us write

<p>this is some <b>important</b> text</p>

*instead* of the redundant

<p><span>this is some</span><b>important</b><span>text</span></p>

The former case is usually named "document oriented" markup, whereas the
latter is "data oriented". MathML markup is designed to be "data
oriented", and that is the basis of its extreme verbosity. I am attempting
to design a "document oriented" mathematical markup.

The tree structure for the fragment

<p>this is some <b>important</b> text</p>

is topologically the same as that for empty tags

<p>this is some <b/> text</p>

(yes, using empty <b/> elements makes no sense, but that is not the point.
If you still feel disturbed, another example I could propose is the use of
the HTML <br/> tag inside paragraphs!), which looks like a fraction in
infix notation

<CanonMath>a<fraction/>b</CanonMath>
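
A minimal sketch of this comparison, assuming Python's standard
xml.etree.ElementTree parser. The helper name describe is hypothetical and
only lists what sits directly under the root element; it is an
illustration, not part of any CanonMath tool.

```python
import xml.etree.ElementTree as ET

def describe(markup):
    """List the root's direct content as (kind, value) pairs, in document order."""
    root = ET.fromstring(markup)
    parts = []
    if root.text:
        parts.append(("text", root.text))
    for child in root:
        parts.append(("element", child.tag))
        # ElementTree stores the text that follows a child in child.tail.
        if child.tail:
            parts.append(("text", child.tail))
    return parts

infix = describe("<CanonMath>a<fraction/>b</CanonMath>")
html_like = describe("<p>this is some <b/> text</p>")

# Both fragments have the same shape: text, empty element, text.
same_shape = [kind for kind, _ in infix] == [kind for kind, _ in html_like]
```

In both cases the parser sees a text run, an empty element, and another
text run, so a processor can treat the empty element as an infix marker.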


((( ABOUT MIXED CONTENT )))

In everyday practice, textual markup is used in XML data. In fact, markup such as

<author>
<surname>Carlisle</surname>
<firstname>David</firstname>
</author>

is as valid, structured, and good for an XSLT programmer as is

<author>Carlisle, David</author>

whereas the latter is better from a user's point of view. The first name
and surname are extracted from the <author> content via XPath rules in
real-life implementations of XML technology. Therefore, the
surname-firstname markup is both redundant and unfriendly to human
authoring. Moreover, the latter markup is as CSS compatible as the first
example above when one uses CSS within XSLT. Some authors prefer to use an
alternative markup like

<author>Carlisle<separator/>David</author>.

which is also valid. In XSLT manuals one finds many examples of that kind.
Apparently some people think that one is forced to work with pure data
markup instead of with easy mixed content that is later transformed to
MathML via XSLT, a script, or similar.
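
As a rough sketch of that kind of later transformation, the three <author>
forms above can be read by one small routine. This assumes Python's
standard xml.etree.ElementTree rather than XSLT/XPath, and split_author is
a hypothetical name chosen for illustration.

```python
import xml.etree.ElementTree as ET

def split_author(markup):
    """Return (surname, firstname) from any of the three <author> styles."""
    author = ET.fromstring(markup)
    # Style 1: explicit <surname> and <firstname> children.
    surname = author.findtext("surname")
    if surname is not None:
        return surname.strip(), (author.findtext("firstname") or "").strip()
    # Style 2: mixed content with an empty <separator/> between the names.
    sep = author.find("separator")
    if sep is not None:
        return (author.text or "").strip(), (sep.tail or "").strip()
    # Style 3: plain text, assumed to be "Surname, Firstname".
    surname, _, firstname = (author.text or "").partition(",")
    return surname.strip(), firstname.strip()

structured = split_author(
    "<author><surname>Carlisle</surname><firstname>David</firstname></author>"
)
mixed = split_author("<author>Carlisle<separator/>David</author>")
plain = split_author("<author>Carlisle, David</author>")
```

All three calls yield the same pair, which is the sense in which the
shorter forms carry the same information for a downstream processor.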

I do not see a great difference between this proposal

<CanonMath>a<fraction/>b</CanonMath>

and standard (X)HTML

<p>This a line of text<br/>and this another</p>

If one claims the first should be tagged as

<CanonMath><mi>a</mi><fraction/><mi>b</mi></CanonMath>

then one should also recommend something like

<p><ti>This a line of text</ti><br/><ti>and this another</ti></p>

for the future XHTML 2.0 specification, where <ti> means "textual
identifier", a hypothetical counterpart of the MathML <mi> tag.

No?


((( SOME THOUGHTS ABOUT SCHEMA )))

I also find surprising the appeal to Unicode or textual markup in
preference to empty tags. Apparently, some people feel disturbed by
something like

<tag>A<leftrightarrow/>B</tag>

and would prefer the use of (textual)

<tag>A \leftrightarrow B</tag>

or (Unicode)

<tag>A 'U02194' B</tag>

I also find disturbing the appeal to character entities, which are an old
technique inherited from the SGML world via DTDs. Curiously, the W3C
Schema specification shows special characters being introduced via empty
tags!

From the Schema specification itself (XML Schema Part 0: Primer Second Edition):

<?xml version="1.0" ?>
<purchaseOrder xmlns="http://www.example.com/PO1"
               xmlns:c="http://www.example.com/characterElements"
               orderDate="1999-10-20">
  <!-- etc. -->
    <city>Montr<c:eacute/>al</city>
  <!-- etc. -->
</purchaseOrder>

And nobody in the Schema WG worried about the tree structure

<city>
Montr
<c:eacute/>
al
</city>

introduced by the use of <c:eacute/> instead of the DTD &eacute;. Then why
worry so much about the possibility of markup like the following?

<CanonMath>
a
<division/>
2
</CanonMath>

Something like

<CanonMath>
A +
<beta/>
= 23
</CanonMath>

looks natural, since the Schema entity <beta/> is the evolution of the
current MathML DTD &beta;.
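
A minimal sketch of how a processor might resolve such empty "character
elements", assuming Python's standard xml.etree.ElementTree. The
CHAR_ELEMENTS table and the flatten helper are hypothetical and exist only
for illustration; a real vocabulary would define its own list.

```python
import xml.etree.ElementTree as ET

# Hypothetical table mapping empty character elements to Unicode characters.
CHAR_ELEMENTS = {"eacute": "\u00e9", "ntilde": "\u00f1", "beta": "\u03b2"}

def local_name(tag):
    """Drop a '{namespace}' prefix, if present, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def flatten(elem):
    """Return elem's text with known empty character elements replaced."""
    out = [elem.text or ""]
    for child in elem:
        name = local_name(child.tag)
        if len(child) == 0 and not child.text and name in CHAR_ELEMENTS:
            out.append(CHAR_ELEMENTS[name])
        else:
            out.append(flatten(child))
        # Text after the child belongs to the parent's content.
        out.append(child.tail or "")
    return "".join(out)

city = ET.fromstring(
    '<city xmlns:c="http://www.example.com/characterElements">'
    "Montr<c:eacute/>al</city>"
)
city_text = flatten(city)

formula = ET.fromstring("<CanonMath>A + <beta/> = 23</CanonMath>")
formula_text = " ".join(flatten(formula).split())
```

The same walk handles the namespaced <c:eacute/> from the Primer example
and the plain <beta/> from the formula.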


((( WHY CAN YOU AUTHOR XHTML BUT NOT MATHML? )))

Think for a moment about how we usually work with (X)HTML. Below I write
some typical markup

<p>This is some text I do not need break into small tokens</p>
<p>Next, I am writing a <em>Spanish</em> word is typed via a Schema
entity: Ni<ntilde/>o</p>

I do not need additional spans before and after the emphasized element or
around the Schema entity. My computer knows about that!

Next, a hypothetical data-oriented markup version (for convenience I write
only the first paragraph above)

<p><c>T</c><c>h</c><v>i</v><c>s</c><space/><v>i</v><c>s</c><space/>
<c>s</c><v>o</v><c>m</c><v>e</v><space/><c>t</c><v>e</v><c>x</c><c>t</c>
<space/><v>I</v><space/><c>d</c><v>o</v><space/><c>n</c><v>o</v>
<c>t</c><space/><c>n</c><v>e</v><v>e</v><c>d</c><space/><c>b</c>
<c>r</c><v>e</v><v>a</v><c>k</c><space/><v>i</v><c>n</c><c>t</c><v>o</v>
<space/><c>s</c><c>m</c><v>a</v><c>l</c><c>l</c><space/><c>t</c><v>o</v>
<c>k</c><v>e</v><c>n</c><c>s</c></p>

Compare both! I do not need to tell my computer (which is very
intelligent) explicitly what each character in the paragraph is.
Therefore, the MathML data markup

<math>
<mi>E</mi>
<mo>=</mo>
<mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>
</math>

can be ***typed*** by humans as

<math>E = m<msup>c 2</msup></math>

without the need to introduce redundant markup, e.g. stating that 2 is a
number! Now, how do we introduce superscripts in HTML? This way

<p> this is superindex<sup>2</sup></p>

Therefore, we should prefer the unification of all XML technology. It
makes no sense that the MathML specification defines one way to write
superscripts and XHTML another. It makes no sense that the latest XHTML
deprecated the use of style tags and even attributes, whereas MathML uses
<mstyle> and color attributes, etc.

Moreover, using infix notation to ease the writing, the (X)HTML-style
markup becomes

<math>E = m c<sup/>2</math>

The m of the msup tag is also redundant, since the computer knows when
<sup/> tags are inside <math> tags and when they are in a paragraph of
text. The final XML input is very close to the natural

<TeX>E = m c^2</TeX>

Curiously, some people are very disturbed by the mixing of text with empty
tags such as <sup/>. I repeat that such mixing is defined by W3C Schema as
the natural evolution from DTD entities.

Just as one can use a plugin to transform the above TeX to MathML, one can
use a plugin, XSLT, or JavaScript to transform CanonMath to MathML. But
CanonMath would introduce *nine* advantages (listed in Canonical Science
today) that are not present in content and presentation MathML, TeX/LaTeX,
Itex, or ASCIIMath.

Usually things such as H would be automatically parsed into <mi>H</mi>.
And yes, I have also thought about the possibility of parsing into
<mo>H</mo>, which could be done via some input tag such as <op>H</op> or
similar. The difference is that the latter would be used only on _some
occasions_ for fine-tuning. Note that one is obliged to use <mo>, <mi>,
and <mn> even in trivial cases such as numbers, "=", etc. when authoring
in MathML.
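
A rough sketch of such automatic parsing, assuming Python's standard
library. This is not the actual CanonMath implementation: the TOKEN regular
expression, the INFIX table, and the names leaf and to_mathml are
hypothetical, and the sketch assumes every infix empty tag has exactly one
operand on each side.

```python
import re
import xml.etree.ElementTree as ET

# Assumed token shapes: numbers, runs of letters, or any other single symbol.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|\S")
# Assumed mapping from infix empty tags to MathML container elements.
INFIX = {"sup": "msup", "fraction": "mfrac"}

def leaf(token):
    """Wrap one text token in <mn>, <mi> or <mo>, guessed from its shape."""
    if token[0].isdigit():
        tag = "mn"
    elif token.isalpha():
        tag = "mi"
    else:
        tag = "mo"
    node = ET.Element(tag)
    node.text = token
    return node

def to_mathml(markup):
    """Convert flat infix mixed content into presentation MathML."""
    src = ET.fromstring(markup)
    # Flatten the mixed content into leaves and infix container names.
    stream = [leaf(t) for t in TOKEN.findall(src.text or "")]
    for child in src:
        stream.append(INFIX[child.tag])
        stream.extend(leaf(t) for t in TOKEN.findall(child.tail or ""))
    # Each infix marker binds the node before it and the node after it.
    items = iter(stream)
    nodes = []
    for item in items:
        if isinstance(item, str):
            parent = ET.Element(item)
            parent.append(nodes.pop())
            parent.append(next(items))
            nodes.append(parent)
        else:
            nodes.append(item)
    out = ET.Element("math")
    out.extend(nodes)
    return ET.tostring(out, encoding="unicode")

energy = to_mathml("<math>E = m c<sup/>2</math>")
ratio = to_mathml("<CanonMath>a<fraction/>b</CanonMath>")
```

Here the letters, the number, and "=" are classified without any explicit
<mi>, <mn>, or <mo> in the input; an override tag such as <op> would need
an extra case in leaf.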


Juan R.

Center for CANONICAL |SCIENCE)
Received on Wednesday, 15 March 2006 11:56:49 UTC