Semantics, Macros, etc


I have finally finished catching up on the various postings here, and
thought I should throw in my two cents' worth.

I notice that a number of questions have come up about the
"presentation list" format in the course of going through Ron's
examples.  However, in light of the ongoing discussion about the
balance between notational and semantic markup, it seems I should try
to clarify the role of the display list spec that Neil and I worked
on in that context first.

It seems to me that in order to move from a general philosophical
consensus about capturing semantic information to a practical language
design, we need to identify and prioritize the kinds of authors we
want to use HTML Math, and the media (in Bruce's sense) they will
render to.

The idea behind the display list specification was simply to try to
formulate the requirements of one particular medium -- high quality
visual rendering.  The "display list" partially described in Bruce's
May letter didn't contain enough information to do high quality
rendering.  In the course of writing a renderer, I had to introduce
various new attributes and schema, so I tried to formalize them and
write them down in order to ensure HTML Math ultimately contained
provisions for passing this information to the renderer.

Based on more recent comments and discussions, I now think that a good
deal of the information required by the visual renderer has roughly
the same status as, for example, expression type information for a
CAS.  It should probably be passed as some form of annotation,
specific to visual rendering, and ignored by everything else.
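
As a rough illustration of what "annotation, specific to one medium,
ignored by everything else" could look like in practice, here is a
minimal Python sketch.  Everything in it is an assumption made for the
example: the Node class, the medium names "visual" and "cas", and the
attribute names are hypothetical, not part of the display list spec or
of any proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical expression tree."""
    symbol: str
    children: list = field(default_factory=list)
    # Annotations are grouped by the medium they are addressed to.
    # The medium names and attribute names here are made up.
    annotations: dict = field(default_factory=dict)


def hints_for(node, medium):
    """Return only the annotations addressed to `medium`.

    Annotations meant for any other medium are simply not looked at,
    and a medium with no annotations gets an empty dict.
    """
    return dict(node.annotations.get(medium, {}))


# An overbar on z, carrying one hint for the visual renderer and one
# piece of type information for a CAS.
z_bar = Node(
    "bar",
    children=[Node("z")],
    annotations={
        "visual": {"linethickness": "thin"},
        "cas": {"type": "complex"},
    },
)

visual_hints = hints_for(z_bar, "visual")
speech_hints = hints_for(z_bar, "speech")
```

The point of the layout is only that a speech renderer, say, never has
to know which visual attributes exist; it asks for its own medium and
gets nothing back.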

A large part of the reason for spelling out a display list spec was
just so that I had something to test the renderer with that was still
relatively compatible with the Wolfram proposal.  Neil and I gave an
SGML style description of it because we could, and because there had
been some discussion of making notational markup directly available to
the author.  However, how much of that stays internal and how much is
available to the author depends on the priorities we choose -- making
it available will weight HTML Math much more heavily toward notation
than semantics.

Obviously, various committee members have different priorities about
authors and media.  For example, the Wolfram contingent is more
concerned with users rendering to CAS, while Ping is more committed to
relatively sophisticated users who are willing to build compounds, and
use relatively non-traditional notation, in exchange for flexibility
and ease of rendering to various media.  Ron, on the other hand, has
got to keep the interests of the hardcore TeX-using research math
community in mind, etc.

Though they are far from set in stone, my personal views go something
like this:

Authoring and Rendering Priorities

1. Secondary and undergraduate level math notation with 
   little or no semantic information for generic visual display.

2. Secondary and undergraduate level math notation with 
   enough semantics for CAS rendering.  To me, it is reasonable
   to expect the author to have a specific CAS in mind.

3. Research mathematics notation with little or no semantics
   for visual display.

4. Research mathematics notation with enough semantics for CAS, math
   software, or speech rendering.  Again, I would be content with a
   document tagged specifically for Maple or Geomview, etc.  However, 
   speech rendering would need to be possible at the same time.

The main consequence of these priorities is that I think HTML Math
should be primarily notation-based, with semantic information added
through some combination of annotations and parser heuristics.  Thus,
I prefer to write something like

	&interpretation{conjugation} &bar; z 

to something like

	&rendering{overbar} &conj; z

My feeling is that the majority of authors in category 1 want
notational markup, and would have trouble with a heavily
semantics-based system such as Ping's.  I also think that the number
of occasions on which even sophisticated users will really want or
need to include semantic information is relatively small.

A second consequence is that heavy users of macros and annotations are
well down on the list, and that people who are likely to get
themselves into trouble with an extremely extensible system are
further up.  Thus my preference is that HTML Math be relatively
complete and terse for users in category 1.  I would hope that users
in this category could get by with defining relatively few macros, and
these could be of the simple "abbreviation with arguments" type.  
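
To make "abbreviation with arguments" concrete, here is a minimal
Python sketch of that kind of macro: pure textual substitution, with
no access to operator dictionaries or substitution rules.  The #1, #2
placeholder syntax is borrowed from TeX purely for illustration, and
the macro name "conj" and the expand function are hypothetical;
nothing here is a proposed HTML Math syntax.

```python
import re


def expand(template, args):
    """Replace #1, #2, ... in `template` with the matching argument.

    This is plain abbreviation: each placeholder is swapped for its
    argument text and nothing else is interpreted.
    """
    def substitute(match):
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise ValueError(f"no argument supplied for #{index}")
        return args[index - 1]

    return re.sub(r"#(\d+)", substitute, template)


# A hypothetical author-defined abbreviation for complex conjugation,
# built on the notational form with a semantic annotation attached.
macros = {"conj": "&interpretation{conjugation} &bar; #1"}

expanded = expand(macros["conj"], ["z"])
```

Here expanded is the string "&interpretation{conjugation} &bar; z",
i.e. the author types the short form and the notation-plus-annotation
form is what the parser sees.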

I don't think it is unreasonable if it is more difficult to define new
"first class citizen" entities and schema, alter operator
dictionaries, or change the substitution rules that create the
presentation list.  It obviously needs to be possible, but it need not
be commonplace.