semantic markup for math

Thomas Breuel (tmb@best.com)
Thu, 18 Jul 1996 15:50:31 -0700


Message-Id: <199607182250.PAA08088@shellx.best.com>
To: www-html@w3.org

|I think what you mean is that the little context
|definition i made up on short notice doesn't cover the semantics you
|need for that formula.

That's the point: using semantic markup for math requires the semantics
of formulas to be defined for each and every field of mathematics that
wants to publish on the web.  You seem to think that that is easy.
I think it's exceptionally difficult.

The argument that MINSE is extensible doesn't help: if "semantic markup"
for math has any utility at all, it requires that people agree on
the semantic markup, not make it up on the fly.

|That's akin to complaining that we force poets to write their poems
|themselves.  Authors will author; they will not have to "typeset"
|unless they want to have precise typesetting control.  Author says
|"integral", you get an integral; author does not have to say "draw
|a stretchy integral symbol from this symbol font, centered on this
|line, vertical height matching this box, if in graphics mode; or
|say "integral with respect to" etc., if in speech mode; or..."

Authors and poets usually don't do semantic markup at all.  They often
use pen and paper, which is all about layout and has no structural
information whatsoever.  Furthermore, authors of mathematics are often
very particular about the rendering of their formulas: they don't want
just some representation of a formula, they want the particular
representation, bold-facing, spacing, and subscripting that is common
in their subfield, at their university, or used by their teachers.

By adding semantic markup, you are placing a completely new set of
requirements and burdens on authors.

|> there is a strong risk that browser vendors
|> won't implement it and that users won't or even can't use it.
|
|Browser vendors haven't taken a stance yet.  And i honestly
|can't imagine an HTML author who is incapable of writing
|
|    <se> 'sin(2*A) = 2*'sin(A)*'cos(A) </se>
|
|when she wants to express a trig identity.

By "can't" I mean that the semantic primitives for the user's
mathematical specialty are missing, not that the user is too stupid
to figure out how to write them.

|> Unlike MINSE, I actually can typeset even
|> the formula on the cover of my freshman year math textbook with it.
|
|Like i said, i don't think this is true.  What's the formula?

\int_{\partial D^+} \boldsymbol{\omega} = \int_{D^+} d\boldsymbol{\omega}

|Subject: Look: real math!
|Go visit http://www.lfw.org/math/ams-example.html

That's an interesting example, for two reasons.  First of all, it
isn't really semantic markup, but a strange mix of semantic markup
and layout.  For example, you couldn't automatically tell from the
notation which variables are scalars and which are vectors, you
couldn't automatically translate the integrals into other common
notations, and you couldn't translate the formulas into FORTRAN
notation either (something that Macsyma's and Mathematica's structural
notations do give you).  Looking at your other web pages, I had
assumed that you simply hadn't finished fleshing out the details.  If
you present this as an example of "semantic markup for real math", I
can only say that your semantic markup isn't very semantic after all;
it is rather a variant of LaTeX notation that uses different quote
characters, and function-call notation instead of infix notation in
some places.
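To make concrete what a genuinely structural notation provides, here is a minimal Python sketch. The node classes and the functions `to_latex` and `to_fortran` are hypothetical illustrations, not MINSE's, Macsyma's, or Mathematica's actual formats. Each node records which operation it is rather than how it looks, so the same tree (here the trig identity from the MINSE example above) can be emitted both for display and as a FORTRAN expression.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical node types for a tiny structural representation of formulas.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Call:
    func: str    # function name, e.g. "sin" or "cos"
    arg: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Eq:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Num, Call, Mul, Eq]


def to_latex(e: Expr) -> str:
    """Display form: products by juxtaposition, functions as \\sin, \\cos."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Call):
        return f"\\{e.func}({to_latex(e.arg)})"
    if isinstance(e, Mul):
        return f"{to_latex(e.left)} {to_latex(e.right)}"
    if isinstance(e, Eq):
        return f"{to_latex(e.left)} = {to_latex(e.right)}"
    raise TypeError(f"unknown node: {e!r}")


def to_fortran(e: Expr) -> str:
    """Computational form: explicit '*', REAL constants, upper-case intrinsics."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Num):
        return f"{e.value}.0"
    if isinstance(e, Call):
        return f"{e.func.upper()}({to_fortran(e.arg)})"
    if isinstance(e, Mul):
        return f"{to_fortran(e.left)}*{to_fortran(e.right)}"
    if isinstance(e, Eq):
        # FORTRAN has no "identity" statement, so emit a logical comparison.
        return f"{to_fortran(e.left)} .EQ. {to_fortran(e.right)}"
    raise TypeError(f"unknown node: {e!r}")


# sin(2A) = 2 sin(A) cos(A), the trig identity from the MINSE example.
identity = Eq(
    Call("sin", Mul(Num(2), Var("A"))),
    Mul(Num(2), Mul(Call("sin", Var("A")), Call("cos", Var("A")))),
)

print(to_latex(identity))    # \sin(2 A) = 2 \sin(A) \cos(A)
print(to_fortran(identity))  # SIN(2.0*A) .EQ. 2.0*SIN(A)*COS(A)
```

Going the other way, from a presentational string such as `\sin(2 A)` back to this tree, requires guessing (is `2 A` a product?); that missing information is what a structural notation supplies and a layout notation does not.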

Second, like all the examples in MINSE that I was able to find, it
comes from a particular (though relatively common) branch of applied
and engineering mathematics, not what I would call "real math".

|>      -- large amounts of existing, on-line math is not in a
|>         structural representation and cannot be converted
|>         automatically
|
|This is only the case if it doesn't contain sufficient information
|for deployment on the Web in the first place.  If the original
|source contains the information necessary to present it on the Web,
|then it follows that it is unambiguous enough to convert.

I just don't see how you can say that.  LaTeX formulas certainly
contain enough information for "deployment on the web".  They can be
rendered on different output devices, scaled, and linearized and read
out.  Given that eqn formulas can be rendered approximately as ASCII,
I suspect that LaTeX-style formulas can be as well.  If you don't like
the simple linear rendering the markup itself gives you, you can add a
textual alternative.  LaTeX-style formulas are also general-purpose
enough to typeset most of mathematics with a fixed set of primitives.
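As a rough illustration of the claim that LaTeX-style formulas can be linearized and read out, here is a toy Python sketch that turns a version of the formula quoted earlier (written with amsmath's `\boldsymbol`) into a spoken-style word sequence. The token pattern, the `WORDS` table, and `read_aloud` are assumptions made for this one example; they are not modeled on any particular tool, and real speech rendering of mathematics needs many more rules.

```python
import re

# Assumed word table covering only the commands used in this example.
WORDS = {
    r"\int": "integral",
    r"\partial": "boundary of",
    r"\boldsymbol": "bold",
    r"\omega": "omega",
    "_": "over",
    "^": "superscript",
    "=": "equals",
    "+": "plus",
}

# A token is a control word (\int), one special character, one letter,
# or a run of digits. Anything else (spaces, parentheses, ...) is skipped.
TOKEN = re.compile(r"\\[A-Za-z]+|[{}_^=+]|[A-Za-z]|[0-9]+")


def read_aloud(tex: str) -> str:
    """Linearize a LaTeX-style formula into a flat sequence of words."""
    words = []
    for tok in TOKEN.findall(tex):
        if tok in ("{", "}"):
            continue  # grouping braces carry no spoken content in this toy
        words.append(WORDS.get(tok, tok))  # unknown tokens are read as-is
    return " ".join(words)


formula = r"\int_{\partial D^+} \boldsymbol{\omega} = \int_{D^+} d\boldsymbol{\omega}"
print(read_aloud(formula))
```

Reading `_` as "over" suits the domain of an integral but would be wrong for an indexed variable such as `x_i`; an author-supplied textual alternative, as suggested above, is one way to cover such cases.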

The HTML 3.0 math specs seem like a reasonable, pragmatic approach to
typesetting math on the web.  I can pull most of my Yellow books off
the shelf and render and publish the formulas in them using it.  They
are intuitive for anybody who has used LaTeX (or eqn) before.  I don't
see significant additional utility in more semantic markup for web
publishing, and I don't think you have given a compelling argument for
it yet (or a convincing specification).  On the other
hand, defining and deploying semantic markup would be a huge
undertaking, and I fear going down that path would put standardization
of any kind of mathematical markup for web documents on indefinite
hold.

Thomas.

PS: As a mathematics typesetting system, MINSE is quite a nice
piece of work.  But please let's keep that separate from the issue
of what kind of markup for math is best for mathematical publishing
on the web.