Re: HTML3 MATH tag

This is close to the draft of an article.  I hope that I have purged
all of the typographical errors.

I now notice that I overlooked W3C's "Arena".  There are probably
other significant things in the history of math on the web since the
dawn of external applications that I should mention.  If so, please
write me.

It is a comment to the list "www-math" with a copy to the list
"emj", and there are a few blind copies.

------------------------------------------------------------------------------

                              Math on the Web
                                      
                           By William F. Hammond
                                      
   [1]Michael Hamm <msh210@nyu.edu> writes to www-math@w3.org:
   
     Do any browsers (esp. any versions of Mozilla or MSIE) read the
     HTML3 MATH tag and the tags that go in it? Which? Thanks.
     
   In a single word, the answer is no.
   
   HTML 3.0 was a [2]1994 W3C draft that never got beyond draft stage and
   was quickly superseded by [3]HTML 3.2 and, later, [4]HTML 4, which
   contain no provision for mathematics. (Well, one may use "<applet>"
   or, better, "<object>"; but that does not really give mathematics
   fully reasonable access to the web.)
   
   After the demise of math in HTML, W3C formed an HTML Math Working
   Group whose work led to the creation of MathML, which is now a
   [5]W3C recommendation with principal rendering implementations
   available currently through (1) [6]WebEq applets under mass market
   browsers, (2) the W3C testbed browser (and point-and-click authoring
   tool) [7]Amaya, and maybe (I am not up to date) [8]IBM's TechExplorer.
   I believe that the source code for Amaya is available for those who
   wish to amend it. (For that matter I believe that all of the relevant
   source code of [9]Mozilla, the public version of [10]Netscape, is
   available now, too. I believe that WebEq and TechExplorer are
   proprietary with temporary free trials.)
   
   While I understand and accept the reason for the exclusion of the HTML
   3.0 math tags from HTML, we have been left with a situation that still
   presents a serious barrier to the efficient flow of (unstyled)
   content-level mathematical information through the web to robots,
   small-screen displays, audio streams, and Braille streams.
   
   For mathematics on the web, there is a sense in which very little
   progress has been made in the 5 years since it became possible to have
   network browsing tools, both under "http" and "gopher", quickly spawn
   external applications based on ``mimetype''.
   
   It is unclear how much improvement will arise as things evolve from
   the dawn of MathML. My guess is that MathML will serve the needs of
   the mathematical, scientific, and engineering communities, while still
   permitting the loss of much of what we understand as ``content'' from
   many resources on the web when that ``content'' is mathematical in
   nature. Of course, provision for these considerations exists in
   MathML. The question is how much attention that provision will
   receive, given that it is more expensive to handle.
   
   For example, I think that it could very well be at least 10 years
   before mathematical content can be searched through major web
   indexing and cataloging sites in any remotely robust way, while a
   great deal more would be possible more cheaply if a few additional
   arrangements were made for dealing crudely but faithfully with
   mathematical content in basic HTML.
   
   The arrival of the ``bazaar'' model of development in the [11]Mozilla
   Project gives one hope that such arrangements will be made.
   
   The early long term plan, as I have understood it, of the MathML group
   was to rely on the implementation in mass market browsers of the type
   of client-side processing that is associated with [12]eXtensible
   Markup Language (XML), and, in particular, a type of XML that might be
   called ``HTML extended by MathML (presentation tags)''.
   
   The idea of XML is to make up your own HTML. The author or publishing
   house makes up a set of tags. Then he, she, or they work very hard to
   create ``rendering information'' about these tags in a ``style sheet''
   language. A web-served XML document contains a reference to the
   corresponding style sheet, which is also available, under a style
   sheet mimetype, on the web. Browsers are supposed to be able to digest
   the style sheet information quickly and then render the XML document
   quickly. (The style sheet information may already be cached.) This is
   the XML dream.
   
   The first rendering efforts with MathML were applet-based and, I
   believe, early MathML planning envisioned the creation of a mimetype
   for ``HTML extended by MathML'' and the creation of an independent
   rendering application (whether plugin or external) with specific
   knowledge of this markup language. W3C's Amaya appears to have ``HTML
   extended by MathML'' as its default language. (I don't know the
   details of Amaya.)
   
   The "<object>" tag approach to MathML probably is more sensible for
   the long run than ``HTML extended by MathML'' if only because MathML
   is so much more granular than HTML. If I think about type-setting
   MathML, I tend to perceive that task as not any easier than that of
   local direct setting of [13]Geoffrey Tobin's DTL (printable ascii
   equivalent of DVI). The point here is that setting MathML is probably
   too much to ask of native rendering by mass market browsers though it
   is certainly in scale for plugins and external apps.
   
   There is still an issue in the eyes of some, on which I am neutral, of
   whether there is, or will be, a widely used style sheet language that
   is rich enough to provide the desired level of rendering of MathML
   presentation tags.
   
   We need all of the good relevant plugins and external apps that the
   community has the energy to provide. Still, because these make more
   demands on the client side (than do ordinary browsers) -- demands that
   are not reasonable in some places and situations that are and will
   continue to be important -- we need to have a way to handle math on
   the web in formats that are very different from paper or "windowing"
   terminal displays without loss of ``content''. This is possible and
   really not that difficult.
   
   Even if one wishes to set aside the need for audio, Braille, indexing,
   and searching streams, envision, for example, going as a visitor to
   look up something on the web in the San Francisco public library. All
   of the windowing stations are tied up. But you find simple terminal
   (vt100) access to the network via the browser "lynx" at a station that
   is available. It may be that the savvy library administrator has that
   station there because he knows that it will give you a way to avoid
   waiting. (In fact, if its processor is fast, that is almost certainly
   true.)
   
   In ``windowing'' situations it is not too much to ask for the
   ``mathematical typewriter emulation'' (MTE) standard in mass market
   browser native rendering as part of native HTML. MTE is just emulation
   of the mathematical typewriter prevalent in all mathematics
   departments during the period 1960-1980. One had lots of symbols (in a
   fixed font), one could underline, one could move the paper for crude
   cursor positioning, one could make something bold by re-striking
   after a slight horizontal displacement. It was crude, but it preserved
   content. Photocopy images of MTE documents were widely circulated as
   informal publications.
   
   MTE is more ``in scale'' with ordinary HTML than is MathML, which is
   much closer to fussy typesetting.
   
   All that needs to be added to basic HTML is:
    1. the horde of character entities that we need (in scalable fonts
       with algorithmic styling for bold, emphasis, and perhaps also
       several forms of alternate-emphasis). Algorithmic styling is
       desirable for efficiency even though it is less beautiful than
       separate fonts; but, for that matter, rendered HTML is already
       less beautiful than TeX rendered by "xdvi".
    2. a simple element "<lg> ... </lg>" (logical group) with attributes
       for horizontal and/or vertical cursor motion, described by a
       numerical multiplier relative to the size of the current font,
       prior to the display of the contents of the element and also with
       attributes for horizontal or vertical stretching, again described
       by a numerical multiplier relative to the size of the current
       font. Client rendering support for stretching should be optional.
       Client rendering support for positioning should be mandatory in
       windowed displays; where that is not appropriate, the protocol
       should be to replace the opentag "<lg>" by the ascii character "{"
       and the closetag "</lg>" by the ``balancing'' character "}". (An
       attribute of the "lg" tag could be used to change the crude
       rendering strings "{" and "}" to other ordinary string values
       including empty ones. Attributes could also be used to furnish
       hints to computer-algebra systems or to furnish the identity of a
       MathML tag from which the current "lg" was fabricated. So MathML
       could be reconstructed. Of course, all of this would be authored
       in generalized LaTeX. :-))
    3. elements "<math>" (paragraph level) and "<displaymath>" (block
       level) in which
          + the new "lg" tag is permitted.
          + all character level things are rendered one at a time with
            inter-word spacing except for the case of strictly
            alphanumeric character level things inside "lg" tags
            containing no whitespace, which will be assumed to be symbols
            that might be given "\mbox" treatment in LaTeX.
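
   The non-windowed fallback described in item 2 can be sketched in a
   few lines of Python. This is only an illustration under stated
   assumptions, not a specification: the proposal does not name the
   attributes of "lg", so the names "open", "close", and "dy" used here
   are hypothetical, as is the function name "lg_to_text". Cursor-motion
   and stretching attributes are simply ignored, which is what a plain
   terminal would have to do.

```python
import re

# Matches an <lg ...> open tag or a </lg> close tag.  The attribute
# syntax name="value" is an assumption made for this sketch.
LG_TAG = re.compile(r'<lg\b([^>]*)>|</lg\s*>', re.IGNORECASE)
ATTR = re.compile(r'([a-zA-Z-]+)\s*=\s*"([^"]*)"')


def lg_to_text(markup):
    """Crude text-mode rendering of "lg" elements.

    Each <lg> becomes "{" and each </lg> becomes "}", unless the
    hypothetical attributes open="..." and close="..." on the open tag
    supply other strings (possibly empty).  All other attributes, such
    as cursor motion, are ignored.
    """
    out = []
    closers = []  # closing strings for the <lg> tags currently open
    pos = 0
    for m in LG_TAG.finditer(markup):
        out.append(markup[pos:m.start()])  # text before this tag
        pos = m.end()
        if m.group(0).startswith('</'):
            if not closers:
                raise ValueError('unbalanced </lg>')
            out.append(closers.pop())
        else:
            attrs = dict(ATTR.findall(m.group(1)))
            out.append(attrs.get('open', '{'))
            closers.append(attrs.get('close', '}'))
    if closers:
        raise ValueError('unclosed <lg>')
    out.append(markup[pos:])  # text after the last tag
    return ''.join(out)


if __name__ == '__main__':
    sample = 'x<lg dy="0.5">2</lg> + <lg open="(" close=")">a+b</lg>'
    print(lg_to_text(sample))
```

   Keeping the closing strings on a stack lets nested "lg" elements each
   carry their own replacement strings, so the braces in the crude
   rendering stay balanced whenever the markup itself is balanced.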
       
   My understanding is that eventually the horde of characters and cursor
   movement will be possible with "w3-mode" in [14]Gnu-Emacs under a
   windowing display. (I do not know about algorithmic styling.)
   
   Inasmuch as there are very few "vt100" terminals extant that are not
   running in displays under local platform windowing systems, it is
   reasonable that the scientific and text-processing communities join in
   an effort to promote a broader collection of characters, cursor
   positioning, and algorithmic styling in enhanced "vt100" terminals.
     _________________________________________________________________
   
   This document was marked up in [15]GELLMU
     _________________________________________________________________
   
   [16]AUTHOR  |  [17]COMMENT   --   Auto-flowed to HTML: Mon Aug 17
   11:03:07 EDT 1998

References

   1. http://pages.nyu.edu/%7Emsh210/
   2. http://www.w3.org/MarkUp/html3/CoverPage.html
   3. http://www.w3.org/TR/REC-html32.html
   4. http://www.w3.org/TR/REC-html40/
   5. http://www.w3.org/Math/
   6. http://www.webeq.com/
   7. http://www.w3.org/Amaya/
   8. http://www.alphaworks.ibm.com/formula/techexplorer
   9. http://www.mozilla.org/
  10. http://www.netscape.com/
  11. http://www.mozilla.org/
  12. http://www.w3.org/XML/
  13. http://www.ee.latrobe.edu.au/%7Egt/tex-soft.html
  14. http://www.gnu.org/
  15. http://math.albany.edu:8000/math/pers/hammond/igl.html
  16. http://math.albany.edu:8000/math/pers/hammond
  17. mailto:hammond@math.albany.edu

------------------------------------------------------------------------------

I would be grateful for corrections and comments.

This text form was auto-flowed from HTML using "lynx -dump".
Other forms of the draft document are available at the URL

        http://www.albany.edu/~hammond/gellmu/webm.html .

                              -- Bill Hammond [17]

Received on Monday, 17 August 1998 12:34:40 UTC