Re: semantic markup for math

Daniel W. Connolly writes:
 > On the other hand, LaTeX is no longer state of the art. There
 > has been significant research and development since then.
 > For example:
 > 
 > >  Given that eqn could be rendered approximately as ASCII, I
 > >suspect that LaTeX style formulas can be as well.
 > 
 > Nope. At least not according to Soiffer's research (see
 > http://www.w3.org/team/WWW/MarkUp/Math/ for full citation).
 > He shows that you need the tree-structured representation of
 > a formula to do line-breaking. And I think automated linebreaking
 > is necessary in order to render formulas to ASCII -- or to
 > a resizable window, for that matter.

Yes, depending on how you want to break the formulas, that's clearly
true (I can't get at his paper right now to read it, since my W3C
member password seems to have expired and my IP address hasn't been
registered yet by W3C...).

Several thoughts, though:

 -- Do we really need linebreaks at all?  Some constructs just
    can't be broken across lines no matter what you do.
    Horizontal scrolling or simple vertical cutting may be
    acceptable, if less than pretty.

 -- Do we need semantically meaningful linebreaks?  Text in many
    languages other than English (Japanese, for example) is simply
    broken wherever the line happens to end; the same might be
    acceptable for rendering math in ASCII.

 -- Do we really need _automated_ line breaks for formulas?
    Browsers already can't break many things automatically,
    and an author could help with linebreaking by adding either
    markup for possible linebreaks or markup for groups that
    should not be broken.

 -- Do we really need a complete tree representation of
    a formula just to find one plausible line break?
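To make the last question concrete, here is a minimal Python sketch.  It assumes the formula arrives as a flat list of tokens from a linear (eqn- or LaTeX-like) source and that breaking before a top-level +, - or = is acceptable; both are assumptions for illustration, and this is not Soiffer's method.  A bracket-depth counter is enough to find one plausible break, and no parse tree is built.

```python
# Assumed flat token stream, e.g. ["a", "+", "(", "b", "+", "c", ")"].
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())
BREAK_OPS = {"+", "-", "="}  # assumption: operators safe to break before


def find_break(tokens, width):
    """Return the index of the last top-level operator before which the
    formula can be broken so that the first line fits in `width`
    characters, or None if there is no such point.

    Only a bracket-depth counter is used; no parse tree is built.
    """
    depth = 0
    used = 0      # characters in tokens[:i]
    best = None
    for i, tok in enumerate(tokens):
        if used > width:
            break  # any later break would leave the first line too long
        if depth == 0 and i > 0 and tok in BREAK_OPS:
            best = i  # tokens[:i] fits, so breaking before tokens[i] is safe
        if tok in OPENERS:
            depth += 1
        elif tok in CLOSERS:
            depth -= 1
        used += len(tok)
    return best
```

For a+(b+c)-d at width 7 this picks the break before "-"; at width 6 it falls back to the break before the first "+", and it never breaks inside the parentheses.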

I think the last two questions point to a conservative approach: by
adding optional grouping and/or divisional constructs to something
like HTML 3.0 mathematical markup, authors would get a choice of how
much effort to put into helping the browser render under unusually
tight geometric constraints.  At the same time, existing content,
which lacks those constructs, could still be translated into a usable
form automatically.

Another, complementary, approach is to provide a bit more structural
markup than commonly used, like an easy-to-use invisible
multiplication operator, and (more) optional infix and prefix variants
of common operators.  This could probably create enough opportunities
for safe linebreaks in most formulas, with the other
grouping/divisional constructs letting the author fix up tricky cases,
a situation not unlike the one that exists for textual linebreaking
right now.
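The combination of an invisible multiplication operator and optional author-marked groups could be sketched roughly as follows in Python.  Everything here is hypothetical: the `<itimes>` token standing in for the invisible operator, the use of a Python tuple for a no-break group, and the choice of break points are illustrations, not HTML 3.0 markup.  Note that the sketch does not track brackets at all; the author's group is what keeps (b+c) together.

```python
# Hypothetical markers, not taken from any real markup spec:
ITIMES = "<itimes>"                      # invisible multiplication operator
BREAK_BEFORE = {"+", "-", "=", ITIMES}   # tokens a new line may start with


def width_of(item):
    """Rendered width of a token, or of a no-break group (a tuple)."""
    if isinstance(item, tuple):
        return sum(width_of(t) for t in item)
    return 0 if item == ITIMES else len(item)


def render(item):
    """Plain-text rendering; the invisible operator prints as nothing."""
    if isinstance(item, tuple):
        return "".join(render(t) for t in item)
    return "" if item == ITIMES else item


def wrap(items, width):
    """Greedily wrap a flat formula into lines of at most `width`
    characters where possible.

    Lines may only start at a BREAK_BEFORE token; tuples are author-marked
    groups and are never split.  A piece too wide on its own is left
    overlong (a browser could cut or scroll it instead).
    """
    # 1. Cut the stream into unbreakable pieces, each starting at a break point.
    pieces = []
    for item in items:
        is_break = not isinstance(item, tuple) and item in BREAK_BEFORE
        if pieces and not is_break:
            pieces[-1].append(item)
        else:
            pieces.append([item])

    # 2. Pack the pieces onto lines greedily.
    lines, current, used = [], "", 0
    for piece in pieces:
        w = sum(width_of(t) for t in piece)
        if current and used + w > width:
            lines.append(current)
            current, used = "", 0
            if piece[0] == ITIMES:
                # Show the hidden product explicitly where the line breaks.
                current, used = "*", 1
        current += "".join(render(t) for t in piece)
        used += w
    if current:
        lines.append(current)
    return lines
```

With 2a(b+c)+d marked up as ["2", ITIMES, "a", ITIMES, ("(", "b", "+", "c", ")"), "+", "d"], a width of 6 gives the lines "2a", "*(b+c)" and "+d", while a wide window gets the whole formula on one line.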

Approaching structural markup "from the bottom" in this way, building
on existing practice, seems more promising to me than a radical shift
to structural markup for everything in a completely novel markup
system.

I should note that I do most of my browsing using an ASCII browser
(lynx) and that I care very much about the Web staying a medium that
is accessible by text-based means and accessible to automatic indexers
and other "robots".  But I'm also concerned with getting a standard
for this relatively soon and in a form that is straightforward enough
that people will actually implement and use it and that existing
on-line and off-line documents can be converted to it with minimal
skill and effort.  I don't think those two goals have to be
incompatible.

You made it sound as if there might be other concerns; are those
written up somewhere publicly accessible?

Thomas.

Received on Friday, 19 July 1996 05:20:52 UTC