report of SGML Math workshop



I've found the report below on the Web server of ISO TC46/SC4/WG6
(owner of the site: Eric van Herwijnen).  For your convenience I've
removed the HTML tags and mailed it to you.

I'm getting a little bit concerned about the lack of progress in
this group and also about the impression Dave gave during the
Illinois workshop.  *Are* we working on something here or are
we not?

Nico

------------------------------------------------------------------------
Notes on SGML Math Workshop

Held as part of the University of Illinois Digital Library Initiative
(UIUC DLI), University of Illinois at Urbana-Champaign

May 1, 1996

Written up by Evan Owens, Electronic Publishing Manager, Journals
Division, University of Chicago Press

The workshop was very well attended; discussion was lively.  The plan
of the meeting was presentations in the morning and early afternoon
followed by an extended period of group discussion.

As I understand it, the reason for calling the workshop was that the
UIUC Digital Library Project was having difficulty getting the SGML
math supplied by the publisher partners to be rendered effectively on
screen by SoftQuad's Panorama, so the immediate goal was to solve that
problem.  The University of Chicago Press is not a participant in the
UIUC DLI project, so I may have misstated some of the background.

PART 1 - PRESENTATIONS
(1) Stephen Wolfram, inventor of Mathematica

The relevance of Mathematica to SGML Math becomes apparent in the
later discussion.  Wolfram hosted lunch and dinner for the attendees.

Wolfram Research has spent 5 years and millions of dollars developing
the typesetting capabilities of Mathematica.  They believe that it can
now typeset math to the level of the best commercial systems; its page
layout capabilities are comparable to WordPerfect or Word and not
quite as good as PageMaker.  Unlike TeX, Mathematica can gracefully
break formulas into lines without manual intervention because it
understands the structure of the math; if it didn't understand the
structure, it couldn't calculate.  It also knows how tightly the
operators bind.
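The claim that structural knowledge is what makes automatic line breaking possible can be made concrete with a small Python sketch. This is not Mathematica's algorithm: the toy expression tree, the precedence table, and the greedy packing rule are all assumptions for illustration. Because the sketch only breaks before the loosest-binding operator at the top level of the formula, a tightly bound subterm such as `a * b` is never split across two lines.

```python
from dataclasses import dataclass
from typing import Union

# Assumed toy precedence table (higher binds tighter); NOT Mathematica's.
PRECEDENCE = {"=": 1, "+": 2, "-": 2, "*": 3}


@dataclass
class Op:
    """A binary operator node in a toy expression tree."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[str, Op]  # a leaf is just a symbol name such as "a"


def render(e: Expr, parent_prec: int = 0) -> str:
    """Render an expression on one line, parenthesizing only when needed.

    Operators are treated as left-associative, so a right operand of equal
    precedence gets parentheses.
    """
    if isinstance(e, str):
        return e
    p = PRECEDENCE[e.op]
    s = f"{render(e.left, p)} {e.op} {render(e.right, p + 1)}"
    return f"({s})" if p < parent_prec else s


def break_lines(e: Expr, width: int) -> list[str]:
    """Split a formula into lines of at most `width` characters where possible.

    Breaks happen only before the loosest-binding operator at the top level,
    so tightly bound subterms (such as a * b) are never split across lines.
    """
    if isinstance(e, str) or len(render(e)) <= width:
        return [render(e)]
    top = PRECEDENCE[e.op]
    # Flatten the top-level chain, e.g. a * b + c * d - e * f, into
    # (operator, operand) pairs; the first operand has no operator.
    pieces: list[tuple[str, str]] = []

    def collect(node: Expr) -> None:
        if isinstance(node, Op) and PRECEDENCE[node.op] == top:
            collect(node.left)
            pieces.append((node.op, render(node.right, top + 1)))
        else:
            pieces.append(("", render(node, top)))

    collect(e)
    # Greedily pack operands onto lines; continuation lines start with
    # the operator, as in conventional displayed math.
    lines: list[str] = []
    current = pieces[0][1]
    for op, text in pieces[1:]:
        candidate = f"{current} {op} {text}"
        if len(candidate) <= width:
            current = candidate
        else:
            lines.append(current)
            current = f"{op} {text}"
    lines.append(current)
    return lines
```

For example, with a width of 13 the sum a * b + c * d + e * f comes out as the two lines `a * b + c * d` and `+ e * f`; re-running it with a different width re-breaks the same structure, which is the behaviour described for screens of varying width.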

Input can come from palettes, the keyboard, or strings, and Mathematica
has full internal Unicode support.  It has an open architecture for
customizing.  All formatting styles come in pairs: a screen style and a
paper style.  It uses monospaced fonts on screen for legibility.

To interpret math, it needs to know unambiguously the role of every
element in the formula.  E.g., is "e" the exponential constant or a
variable named "e"?  Mathematica keeps this additional information in
its internal representation of the math.

Traditional textbook math cannot always be made unambiguous.
Mathematica converts from traditional notation to its internal format
using a large collection of heuristic rules; this doesn't always work.
When editing in traditional notation, one can easily produce something
that can't be converted back; their editing environment will prompt
about this.  It can also handle abstract notation, though it needs to
be given processing rules for it.

Inside Mathematica, everything is a symbolic expression.  Traditional
form math is defined by a set of transformation rules.

Mathematica's spacing and display are much better than TeX's.  It can
export math to GIF, EPS, PICT, TeX, HTML, speech, or to ASCII that can
be reloaded into Mathematica.  Export to/from TeX is done using
transformation rules.  They have a version that takes math input,
sends it to a math server that renders it, and returns GIFs; there is
also an inline application to render math in ActiveX controls and
Netscape plug-ins.

Wolfram Research would like to work with the SGML Math community.
Mathematica notebooks are a markup language; one could map the
structure of an entire notebook (not just the math) into an SGML
DTD.

(2) Dave Raggett, WWW Consortium

From other comments later, it appears that the WWW Consortium math
committee is split and that Dave Raggett's ideas are just one proposal
under consideration.

Raggett:

- Web vendors are not interested in math.
- A special review board of Wolfram, Adobe, and AMS is trying to
  develop an open spec.

His proposal has nothing to do with SGML at all.  (I later asked him
whether it was a fair statement to say that this proposal amounted to
punting, taking math entirely out of HTML; his answer was yes.)

His plan is to use SGML notations and allow documents to specify the
server (URL) where a set of rules (a knowledge base) would be
available to render the math.  The actual HTML would be very simple,
perhaps only three tags (inline formula, display formula, and a tag to
identify where the renderer was to be found).  Inside the tag, any
kind of notation could be used.  His demonstration used Prolog.

This proposal was not well received by the audience; later comments
included "How many years will it take until this is working?" and
"Isn't this just avoiding the issue?"

(3) Roy Pike, Semantic Math

He presented his paper; the text is available elsewhere, so I won't
repeat it here.  He came down hard in support of semantic math as the
solution to all problems.

Comments from the audience seemed to be that this was a good long term
goal but didn't solve the problem at hand: rendering math for screen
display.

It was apparent that semantic math and Mathematica have a lot in
common; Pike had visited Wolfram and has apparently recognized their
common interest: a robust semantic math would be a format that
Mathematica could easily export to.

Another important comment (from me and others) was that it is
unrealistic to expect that we are going to be able to afford to
disambiguate math in a production environment; if we can't get this
from the authors then it isn't going to work.

Pike seems to favor working on the assumption of a perfect world;
unfortunately most of us in publishing live in very imperfect worlds.

(4) Publisher Perspectives: Evan Owens, UCP/AAS

I spoke very briefly on what I see as the important issues:

SGML math won't work unless we have tools to get the math from the
authors and to edit it robustly.  We need authoring and editing
environments or appropriate translation tools.

I also described our project and how we convert math from LaTeX to
SGML to typesetting systems and back.

(5) Publisher Perspectives: AIP

Tim Inglesby and Scott Johnson of the AIP spoke at some length about
their work with SGML and ISO 12083 math.

(6) Paul Grosso, ArborText

Paul proposed that the semantic layer be applied on top of the current
ISO 12083 DTD through attributes; I asked whether this would be
comparable to the ICADD stuff; he said that he was not specifically
thinking of architectural forms but that was a possibility.

There were some reservations about this from the DLI team, since
their search engine searches on tags, not attributes; I pointed out
that one could easily generate special output for searching and that
there were advantages to doing so.

Paul talked about how SGML Open had dealt with tables; he proposed
that SGML Open be used as a forum for working out a minimum subset of
math that all vendors would support.

(7) Paul Topping, Design Science

Their product is MathType, an equation editor that is used in MS Word
and other products, including Corel Ventura.  Their new version is
Unicode based and will support drop-in translators.  They are
currently implementing the translator architecture and have not yet
started work on a two-way ISO 12083/MathType translator.

(8) Murray Malone, Panorama

He didn't say much except "if you identify the problems, we'll fix
them."  He said a lot more later in the discussion.

PART 2 - DISCUSSIONS

The scheduled discussion period turned into a real free-for-all; this
summary will reflect the confused state of the discussion.  I've
grouped some of the discussion logically rather than chronologically
to help sort it out.

Paul Grosso (ArborText) was not present for this discussion, alas.

It was clear that we were going to have some major differences of
opinion.  To break the ice, we started with a somewhat less
controversial topic.

(1) Searching Mathematics

It was argued that complex searching of mathematics was comparable to
the kind of searching of chemistry that is done in Chem Abstracts.
That kind of searching depends on knowledge that would only be
available in semantic math (or Mathematica).  This was seen as an
argument in favor of Roy Pike's proposal.

Side issue: it was argued that it will be Unicode, not SGML entities,
that makes complex searching possible.  (Mathematica is already
Unicode based.)

(2) Legibility of Screen Math

Long discussion of the inherent problems of displaying math or other
complex text on screens.  Mathematica defended their mono-space math
screen fonts.  AIP argued that print quality was necessary and that
Panorama was totally unacceptable.

One of the DLI people had written a dissertation on the screen display
of maths.

It was pointed out that horizontal scrolling is very bad, as is
zooming in and out, and that text that wraps to the width of the
screen is highly desirable.  This entire discussion seemed obvious to
me; I've said from day one that if we were designing a journal for
screen reading, it wouldn't have two narrow columns.

This is a major strength of Mathematica, that it can robustly re-break
equations to various screen widths!

(3) Short Term versus Long Term

There was discussion of some immediate solutions, such as using
applets (Java, OLE, ActiveX) or working out the bugs in Panorama,
versus the long term solution of semantic math.  AIP made the point
that they (and others) have to have a solution now; they have
math-intensive journals that need to go online.

Java applets apparently don't pass font information (e.g., the
baseline) back to the viewer, so they don't work well for inline math.
Work on style sheets for HTML might solve this eventually.

(4) Mathematica's Offer

It was proposed that the plug-in reader version of Mathematica could
be used to render math.  Mathematica's format is published but
proprietary.  Stephen Wolfram offered to allow an ISO standard that
would closely correspond to their work on the semantics of math;
Wolfram Research would agree not to sue.

Mathematica claims that it doesn't take a lot of intervention to move
from traditional math to semantic; their heuristic rules help
considerably with this.

Mathematica has taken a journal produced in TeX, Complex Systems, and
converted it entirely to Mathematica notebooks with active math.

At one point in the discussion, Wolfram offered to just solve the
problem for us: to do the development on a semantic math DTD that
would map in and out of Mathematica's internal format.  The people
from SoftQuad didn't like this at all, as this would effectively cut
them out of the market.

COMMENTARY: of course, that is going to happen anyway as active math
is going to be much more desirable than static math.

(5) Who's doing what with Math

The room was polled as to who is doing what with math of the
organizations present:

- IEE is using embedded TeX
- AMS is entirely TeX
- AAS (us) is using SGML math, but with the AAP DTD
- Beacon Graphics (various projects) uses AAP math DTD
- AIP is using ISO 12083 math post-typesetting



(6) SGML OPEN Model

Murray Malone (SoftQuad) proposed that the model used for tables by
SGML OPEN be followed: that the vendors get together and agree on a
subset of functionality that everyone will support.  The implication
was that this would be visual math, with semantic math as an optional
layer.

(7) DECISIONS AND PLANS

At the end of the discussion it was proposed that the short term goal
be better implementation of the current ISO 12083 math and that the
long term goal be semantic math.

(7A) Short Term

It was proposed that the NSF fund another workshop, this one on
ISO 12083 Math implementation, where we would get together and work
up documentation on proper coding practices.  Murray Malone offered
the SGML OPEN summer meeting in Montreal as a venue for such a
meeting.  There is a risk that such a meeting will degenerate into an
argument with the AIP about how to code ISO 12083 math, since they are
the only people doing it at the moment.  But it might be useful.

(7B) Long Term

There was discussion about whether the meeting should vote to support
Pike's proposal.  The eventual consensus was not to vote one way or
the other, as the proposal was too new.

(8) Comments and Observations

Odd meeting in various ways:

- strong influence of Wolfram Research (lunch, dinner, and lots of
  bodies)
- lack of preparation/communication between SoftQuad, the DLI project,
  and AIP
- large turnout shows real interest in topic, but very few people seem
  willing to actually do SGML math

My personal opinion is that the future of SGML math rests entirely on
the tool makers.  Right now, the only serious SGML math (in the USA)
is being done using the AAP DTD, because that is what ArborText has
implemented.  Without tools to create, edit, and present SGML math, no
one is going to bother with it.  AIP even said that they had to write
a tool to convert ISO 12083 math to TeX so that they could render it
and see whether they were coding it correctly.  My guess is that
Wolfram will decide that an SGML input/output filter isn't any harder
than their existing TeX filter, and they will move ahead.  That will
shape the future discussion; as has been amply demonstrated on the
Internet, "working code wins", not necessarily international
standards.

------------------------------------------------------------------------
Dr. Nico A.F.M. Poppelier
Elsevier Science, APD, ITD               Email: n.poppelier@elsevier.nl.
Molenwerf 1, 1014 AG Amsterdam           Phone: +31-20-4853482.   
The Netherlands                          Fax:   +31-20-4853706.   
------------------------------------------------------------------------
                  The truth, the whole truth, and nothing but the truth.
                                             And maybe some compromises.