Re: Basic requirements for mathematical-scientific language from juanrgonzaleza@canonicalscience.com on 2006-04-20 (www-math@w3.org from April 2006)

From: <juanrgonzaleza@canonicalscience.com>
Date: Thu, 20 Apr 2006 10:12:04 -0700 (PDT)
To: <www-math@w3.org>
Message-ID: <3025.217.124.88.238.1145553124.squirrel@webmail.canonicalscience.com>
Charles Lyons wrote:
>

[snip]

> In my opinion, every programming/scripting/descriptive language has its
> advantages and its limitations.

Completely agree!

> XML is great because it provides a platform-independent easy way to
> write and process documents, and those documents are human-readable,
> requiring no other software to write them.

Not so easy and not so readable; I am not sure how to reply to the last
remark.

> The disadvantage is always its verbosity

There exist more disadvantages.

[snip]

> This seems a bit weird, but my point is that XML is not suited to every
> application - neither is MathML... MathML is designed to be a
> human-readable, semantically rich language which is extensible [okay,
> you can disagree with me here, but don't argue the point further]. If
> you have applications that require specific constructs, then you can
> optimise those constructs, but then that language is unlikely to be
> portable. It is extremely difficult (if not impossible) to develop a
> language which is extensible, semantically rich and compact.

Since you do not want me to argue, I will just add that I recognize the
difficulties of the project. However, “sorry, it is extremely difficult to
develop a language which is extensible, semantically rich and compact, so
you may use this, even if you cannot” sounds unacceptable to me.

> So, in your example, you quote a 7GB piece of data; fine, why not
> condense that down into binary format and save file space? If you know
> exactly how your formula/construct is always going to be structured,
> and especially if it is only numerical, then why not optimise it
> completely rather than trying to use an all-purpose language like
> MathML?

Well, I was already talking about optimised file size. No, I do not know
exactly how a formula/construct is always going to be structured; that
varies a lot in computational science. I do not know what you mean by
“numerical”. As for your question: in that case the MathML goal [1.2.4
Design Goals of MathML]

<blockquote>Encode mathematical material suitable for teaching
and scientific communication at all levels.</blockquote>

would have to be forgotten.

> Back to MathML, and I had the same verbosity problem writing MathML
> documents. Most of my work is now written in MathML, but I don't do the
> XML by hand - that would take forever and give rise to many mistakes.
> Instead, I designed my own software converter which takes a more
> condensed syntax and converts it to MathML Content, to which an XSLT
> sheet is applied to convert it to Presentation. For example:
>
> Int[phi*Dot[grad[psi],Vector[n]],V]
>
> which gives you the integral over the volume V of the scalar phi
> multiplied by the dot (scalar) product of the gradient of scalar
> function psi with the vector n. The equivalent MathML syntax is
> incredibly long, much longer than what I just typed, but there are
> limitations to my language as well: it's convenient for me to use, and
> it produces MathML which is great, BUT every function has to be
> programmed to convert it into correct MathML - so Dot gets turned into
> an <apply> with a <scalarproduct /> child etc. The limitation is that I
> have to program new functions when MathML doesn't already have them[1],
> but even so this syntax is a real time saver when writing documents,
> and in my opinion it beats TeX just because MathML has a semantics
> content side as well.

An interesting Mathematica-like approach. I began an input syntax program
(CanonMath)

[http://canonicalscience.blogspot.com/2006/02/choosing-notationsyntax-for-canonmath.html]

but I abandoned it recently because of the problems and limitations of
MathML and even of XML. Moreover, I also need science-oriented markup;
that is, markup able to differentiate between psi, psi, and psi.
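
To make the discussion concrete, here is a minimal Python sketch of the
kind of converter Charles describes: a condensed bracket syntax turned
into Content MathML. It is not his actual software, which is not shown in
this thread. The function names (tokenize, Parser, emit,
to_content_mathml), the OPERATORS table, and the choice of
<domainofapplication> for the integration volume and <ci type="vector">
for Vector[n] are all assumptions made for illustration.

```python
import re

# Hypothetical table: condensed function name -> Content MathML operator.
# As in the quoted description, every new function must be added by hand.
OPERATORS = {"Dot": "scalarproduct", "grad": "grad"}

TOKEN = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|[\[\],*])")


def tokenize(text):
    """Split the condensed syntax into identifiers and punctuation."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser for products of names and Name[arg, ...]."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def product(self):
        factors = [self.factor()]
        while self.peek() == "*":
            self.take("*")
            factors.append(self.factor())
        return factors[0] if len(factors) == 1 else ("times", factors)

    def factor(self):
        name = self.take()
        if not name[0].isalpha():
            raise ValueError(f"expected a name, got {name!r}")
        if self.peek() != "[":
            return ("id", name)
        self.take("[")
        args = [self.product()]
        while self.peek() == ",":
            self.take(",")
            args.append(self.product())
        self.take("]")
        return ("call", name, args)


def emit(node):
    """Serialize the parsed tree as Content MathML (one line, no namespace)."""
    if node[0] == "id":
        return f"<ci>{node[1]}</ci>"
    if node[0] == "times":
        return "<apply><times/>" + "".join(emit(f) for f in node[1]) + "</apply>"
    name, args = node[1], node[2]
    if name == "Vector" and len(args) == 1 and args[0][0] == "id":
        return f'<ci type="vector">{args[0][1]}</ci>'
    if name == "Int" and len(args) == 2:
        # Assumption: the second argument is the domain of integration.
        return ("<apply><int/><domainofapplication>" + emit(args[1])
                + "</domainofapplication>" + emit(args[0]) + "</apply>")
    if name in OPERATORS:
        return (f"<apply><{OPERATORS[name]}/>"
                + "".join(emit(a) for a in args) + "</apply>")
    raise ValueError(f"no MathML mapping programmed for {name!r}")


def to_content_mathml(text):
    parser = Parser(tokenize(text))
    tree = parser.product()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return emit(tree)


if __name__ == "__main__":
    print(to_content_mathml("Int[phi*Dot[grad[psi],Vector[n]],V]"))
```

Any function missing from the table raises an error, which matches the
limitation Charles mentions: each new function has to be programmed before
it can be converted.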

> Anyway, as I've heard said on this list time-and-time again, if you
> don't find MathML does/can do what you need, create something else to
> satisfy you - and if you can make it interoperable with MathML then
> great, but if your aim is to produce a more condensed but less flexible
> language, then so be it.

That is the reason I launched this thread. People can learn from my ideas
and also from my errors. Some requirements (if considered good enough)
could be introduced in a future MathML 3.0; others obviously cannot,
because they do not fit in the current MathML 2.0 structure.

It would also be great if people offered ideas and comments. CanonML is
backward compatible, and MathML can now be seen as a subset of CanonML;
therefore, most MathML folks and software companies should be happy.

Of course, by rewriting a piece of MathML code in a CanonML document, we
inherit all the problems and limitations of MathML; we just gain somewhat
less verbosity (about half the encoding size).

> Having an interest in all things computery as I do, I would also like
> to mention that you say "MathML is unnaturally verbose and redundant" -
> actually, have you thought about your operating system? If you're
> running Windows and you're trying to do a large calculation, it will
> run slower than if you used the microprocessor directly, because
> Windows takes many clock-cycles for its own background processes. If
> you really want to run a fast calculation, you'd want to program
> directly in assembly or even binary instruction sets and use a
> processor that way... But the code for that is far less elegant and far
> more "unnaturally verbose and redundant" than any higher-level language
> (C/C++/Java/.NET) and yet gives far superior results.

I think most people understand what I meant by verbose and redundant.
Verbosity and redundancy are the reasons that the same code in MathML can
be of the order of 11 times bigger (Neil Soiffer) than when the *same
information* is encoded in Mathematica. Maybe I should explain what I mean
by “unnaturally”: I mean that there are no technological reasons for it to
be that way. I have provided examples of that on this list.
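
A size ratio of this kind is easy to measure. The following Python sketch
compares the byte length of two toy expressions in a compact linear syntax
and in Presentation MathML. The sample expressions and the helper name
size_ratio are assumptions chosen for illustration; this does not
reproduce the measurement attributed to Neil Soiffer, and the ratio
depends strongly on the expression chosen.

```python
def size_ratio(verbose, compact):
    """Return how many times larger the verbose encoding is, in UTF-8 bytes."""
    return len(verbose.encode("utf-8")) / len(compact.encode("utf-8"))


# Toy samples (assumed for illustration): compact form -> Presentation MathML.
SAMPLES = {
    "x^2": "<msup><mi>x</mi><mn>2</mn></msup>",
    "a+b": "<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>",
}

if __name__ == "__main__":
    for compact, mathml in SAMPLES.items():
        print(f"{compact!r}: MathML is {size_ratio(mathml, compact):.1f}x larger")
```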

The Windows comparison is amazing. Low-level programming languages offer
better results in time (not in size) because they are close to the
processor pipelines. That, of course, is _not_ the case for MathML (which
needs additional layers to be understood by a browser, for example).
Indeed, the problems with the MathML design are the cause of the very low
efficiency of the native MathML support in Mozilla Firefox. Have you
opened three big documents containing a lot of MathML code at the same
time? I can prepare and drink a coffee before they have opened.

It was precisely the _verbosity and difficulty_ of low-level programming
languages that led to the introduction of high-level programming
languages many decades ago.

Therefore one may conclude that with MathML we are really getting the
worst of both worlds: it is incredibly verbose and yet less efficient...

> The moral is: it isn't always the most compact syntax that is the best.
> Note also that any software using XML has to run an XML parser and
> probably a DOM model in memory as well, using more resources in the
> process. Again, XML is not the best choice when performing large/long
> calculations.

But you compared two different things at different levels and, yes, the
most compact syntax is not always the best. This is the reason that in my
previous example I *carefully* said that the encoding of the integral
would be _of the order_ of the TeX code I wrote. I expect more verbosity
than ASCIIMath or TeX in several cases (not all). In fact, I have already
explained in the past why I rejected the ultracompact ASCIIMath syntax.

Moreover, my previous message read “The good and concise is twice good!”

Note that I wrote *good* first; that should give some hint about what my
design goals are.

It is well known that XML is not the best choice for large/long tasks.
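
The cost of keeping a whole document tree in memory, which Charles
mentions, can be sketched with Python's standard xml.etree.ElementTree
module. The first function below builds the complete tree before doing
anything; the second streams the document and empties each element once it
has been handled. The function names and the synthetic sample document are
assumptions for illustration only; this is not a benchmark of any MathML
tool.

```python
import io
import xml.etree.ElementTree as ET


def count_elements_tree(xml_bytes):
    """Parse the whole document into an in-memory tree, then count elements."""
    root = ET.fromstring(xml_bytes)
    return sum(1 for _ in root.iter())


def count_elements_streaming(xml_bytes):
    """Count elements while parsing incrementally."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        count += 1
        # Drop the element's text, attributes and children once it has been
        # counted; the emptied element itself stays attached to its parent.
        elem.clear()
    return count


def make_sample(n_identifiers):
    """Build a synthetic document: one <mrow> holding n <mi> elements."""
    return b"<math><mrow>" + b"<mi>x</mi>" * n_identifiers + b"</mrow></math>"


if __name__ == "__main__":
    sample = make_sample(1000)
    print(count_elements_tree(sample), count_elements_streaming(sample))
```

Both functions return the same count; the difference is only in how much
of the document has to be held in memory at once.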

[snip]

>
> Regards,
> Charles.

[snip]


Juan R.

Center for CANONICAL |SCIENCE)
Received on Thursday, 20 April 2006 17:12:17 UTC