Re: html markup of previous sample

I won't have time to fully peruse Ron's sample HTML-Markup, or Ping's
comments on it, for at least the next few days (and then I'll be out of
email touch for a week or two), but I thought I should mention a couple
of points that caught my attention when I skimmed Ping's response.
(I have not yet read all the later letters which may also touch on
some of these issues.)

(I apologize that it's been taking me so long lately to read email on
this list. I have been drawn away by other urgent work. I hope that
I'll be able to correct this problem soon.)

First I want to mention again an important point to avoid some possible
misunderstandings: the letter outlining the main parts of my HTML-Math
proposal for Wolfram,


is intended to supersede all prior letters from me or anyone at Wolfram
in the mailing list; also, our proposal does not necessarily include
elements discussed by others in the list unless that letter (or some
subsequent statement from me) specifically says it does. It seems to me
that some of these prior letters may be the cause of some
misunderstandings. For example, there is nothing in our proposal about
transformation rules which try to support "English-like syntax" or
which operate at any stage other than after the parser, on the
expression tree the parser generates. The only transformation rules we
propose are the kind discussed in that letter.

Also I should say that I've finally found time to read most of Ping's
web pages on MINSE, so now I can understand his comparisons with it. I
have to say that it is an impressive and well-described system. It does
differ in several important ways from our (Wolfram's) proposal, which
I'll address mainly when I reply to the letters concerning those
points; for now I should mention the most important distinction, which
is that his system is primarily "semantic" and ours is "notational",
which means (among other things) that the information our systems are
each trying to represent is quite different. Many of Ping's specific
points of comparison are manifestations of this general difference (as
I'm sure he understands).

At 11:16 AM 7/8/96, Ka-Ping Yee wrote [excerpted]:

>4.  In many situations multiple comparisons are written in a chain,
>    which happens here under the "max" compounds with "0 <= i <= T".
>    How does your notation deal with this and the issue of operator
>    associativity?

This is described in my letter; here are the excerpts which pertain
to the case of relational operators:

        The parser groups a term with the adjacent operator which has the higher
        precedence (assuming it is being used in a form which takes an operand
        on that side). If these precedences are equal, it groups the term with
        *both* operators;

        The same feature of grouping a term with both adjacent operators is used
        to allow certain operators to have "flat" or "n-ary" associativity,
        e.g. + and &InvisibleTimes;. This is what causes the source text "4ac"
        (in the example given far above) to parse to a single (mterm ...)
        subexpression containing three subterms (which are mn and mi tokens for
        4, a, and c) separated by two (invisible) operator tokens.

        Sometimes, more than one operator has the same left and right
        precedence; this is true, for example, of relational operators, so that
        sequences of inequalities turn into single subexpressions even when
        (e.g.) both < and <= (or &LessEqual;) are used in the same sequence.

Thus, according to the above (and to the proposal), "0 <= i <= T" parses to

            (mn "0")
            (mo "<=")
            (mi "i")
            (mo "<=")
            (mi "T")

I'm sorry if this was not sufficiently clear from the proposal letter.
I should probably add a specific example involving relational operator chains,
since they are perceived as different from the examples I gave of the same
parser feature, which were matching brackets and n-ary operators.
(Of course, they are *semantically* different, but not in a way which
our proposal, which is mainly notational, attempts to capture.)
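
The grouping rule quoted above can be illustrated with a small sketch.
This is only a minimal illustration in Python, not the parser from the
proposal: it assumes strictly alternating operand and infix-operator
tokens, represents subexpressions as plain lists instead of (mterm ...)
nodes, and uses a made-up precedence table (the names PRECEDENCE and
parse, and all the numeric values, are hypothetical).

```python
# Hypothetical precedence table; the numbers are illustrative only.
# Relational operators share one precedence, as described in the excerpt.
PRECEDENCE = {"<": 10, "<=": 10, "+": 20, "*": 30}


def parse(tokens):
    """Parse alternating operand/operator tokens into nested lists.

    A run of operators with equal precedence is collected into a single
    flat list, so each term between them is grouped with both neighbours.
    """
    pos = 0

    def parse_level(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
            prec = PRECEDENCE[tokens[pos]]
            group = [left]
            # Absorb every operator of this precedence into one flat group.
            while pos < len(tokens) and PRECEDENCE[tokens[pos]] == prec:
                group.append(tokens[pos])
                pos += 1
                # Operands may contain only tighter-binding operators.
                group.append(parse_level(prec + 1))
            left = group
        return left

    return parse_level(0)


chain = parse(["0", "<=", "i", "<=", "T"])
mixed = parse(["0", "<", "i", "<=", "T"])
nested = parse(["a", "+", "b", "*", "c", "+", "d"])
```

With this sketch, parse(["0", "<=", "i", "<=", "T"]) returns one flat
list containing all five tokens, and mixing < with <= in the same chain
gives the same flat shape, because both are assumed to share one
precedence.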

>9.  Ron marked up "absolute value" using &leftvert; and &rightvert;
>    (Ron's point 7).  How is the grouping ability of these symbols
>    declared? ....

By declaring (in the operator dictionary) these special characters
to be (one-character-long) left and right bracket operators.
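
As a rough illustration of what such a declaration could look like,
here is a minimal Python sketch. The dictionary layout, the form names
"left-bracket" and "right-bracket", and the function group_brackets are
all assumptions made for this example; the proposal's actual operator
dictionary format is not shown in this letter.

```python
# Hypothetical operator dictionary: token -> declared form.
OPERATOR_DICTIONARY = {
    "&leftvert;": "left-bracket",
    "&rightvert;": "right-bracket",
    "(": "left-bracket",
    ")": "right-bracket",
}


def group_brackets(tokens):
    """Nest the tokens found between left- and right-bracket operators."""
    stack = [[]]
    for tok in tokens:
        form = OPERATOR_DICTIONARY.get(tok)
        if form == "left-bracket":
            # Open a new group that starts with the bracket itself.
            stack.append([tok])
        elif form == "right-bracket":
            if len(stack) == 1:
                raise ValueError(f"unmatched {tok!r}")
            group = stack.pop()
            group.append(tok)
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return stack[0]


grouped = group_brackets(["&leftvert;", "x", "&rightvert;", "+", "1"])
```

Note that this sketch groups purely by notation: it records that the
two characters bracket "x" but says nothing about absolute value, and
it does not require the closing bracket to be of the same kind as the
opening one.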

>13. Parens are also used for all sorts of meanings in this example
>    (Ron's point 6 in the TeX posting), and i think it's impossible
>    to tell the difference between the interval "(0,&infinity;)" and
>    the pair "(&nu;_1,&nu;_2)" the way Ron has it marked up.  It is
>    also very unclear when the parens indicate function application,
>    as in "&nu;^&epsilon;(x,t)".
>    This is all distinguished in the MINSE markup using different
>    compounds.  Function application is the only case implied just
>    by the parentheses; the compound "openopen" is used for writing
>    an interval open at both ends, alleviating this ambiguity.

This is an example of the general difference between a notation system,
like ours, and a semantic one, like MINSE. The notational system
doesn't attempt to distinguish between the various meanings of the same
operators or identifiers, except in a few important cases which
normally affect the rendering. This has the advantages that the
system's designers (or the authors of "contexts") need not make a list
of all concepts to be discussed, and the authors need not look up the
correct named concept to use; but it has the disadvantages that the
renderer can't choose to render different concepts with the same
conventional notation differently, nor is the very valuable semantic
information represented (in an easily or unambiguously extractable form).

There has been quite a bit of discussion of the relative merit of each
of these approaches, and the present consensus of the HTML-Math group
is that the notational approach is better for HTML-Math. We do,
however, hope to get the "best of both worlds" to some degree,
eventually, by giving the proposal author-extensibility so that authors
have the *option* of defining and/or using constructs which carry
additional (possibly semantic) information. (I'll address the issue of
why I'm sure we can do that well enough when I reply to Ping's letter
about Extensibility. I hope to have time to do that before I go out of
touch for 1-2 weeks, but I'm not sure whether I actually will.)

>------------------------------------------------------------- discussion
>On the whole, i think i'd have to say that the proliferation of
>homonyms in Ron's example makes me rather uncomfortable.  Parens,
>superscripts, and juxtaposition have so many different meanings
>in the HTML markup he posted that -- even if it were possible for
>mapping rules to choose which meaning is intended -- i don't think
>i would just trust the rules to pick the right one every time, and
>guessing exactly how to appease them by manipulating the notation
>would quickly get troublesome.  I would much prefer getting into
>the habit of consistently saying what i mean instead of hoping
>that it gets interpreted right.

Of course, per my comments above, in our proposal for HTML-Math, no
attempt is made to automatically disambiguate these homonyms in an
HTML-Math renderer. If a CAS wants to try to do that when it reads the
HTML-Math, that is up to it. (And when we add author-specified
contexts, that will be in large part to make it possible for authors to
make this job easier for the CAS. But we won't require them to,
in contrast to MINSE or to Roy Pike's proposal.)

>Moreover, what if authors later want to define new meanings for
>juxtaposition or parentheses?  There seems to be no provision for
>this because the juxtaposition itself is used to figure out the

Again, the meaning, in general, is never figured out at all.

It's true that we make a few exceptions to this, e.g. in deciding
whether an implied infix operator should be a times or a function
application, because that so commonly affects the rendering, but it is
not hard for authors to always insert these operators explicitly if
they want to override this automatic decision. (And when our proposal
is made extensible, it will be possible for authors to "change the
rules" for this; but that's beyond the scope of this letter.)

>I think it makes more sense to go the other way, i.e.
>from the meaning to the notation instead of guessing the meaning
>from things like juxtaposition and parentheses.

But going from meaning to notation is also ambiguous, just as going from
notation to meaning is -- there are many possible notations for the same
meaning. Assuming that authors want to influence the notation chosen,
this is another advantage of a notational system. (Of course, I admit
that a semantic system can include some notational information, just as
our notational system includes a bit of semantic information; and I
also admit that authors want to influence not only the notation used,
but the meaning inferred -- at least I hope they do :-). In other
words, I don't claim (or believe) that "our approach is good and the
other is bad", but rather I think that which one to take, and to what
extent, is a matter of judgement about the best tradeoff of various
factors, given the uses that a representation is aimed towards.)

- Bruce

P.S. I think a couple of the group members (Neil and Dave) might recall
that when I first joined this group, I was strongly in favor of an
approach which, though notational, was much more like MINSE than our
current one, in which there were named compounds for each "notational
operation" (such as surrounding an expression with parentheses). In
that system, a notation like [0,1) would need its own compound, just
like in MINSE. Neil very eloquently converted me away from that
approach by patiently explaining some of its disadvantages, like the
difficulty of representing nonstandard notations or syntactically-
incorrect expressions, and the need to invent an endless stream of new
notations (names of compounds) for things which already have
universally recognized standard notations (albeit sometimes-ambiguous
ones). Also in his favor was the very elegant (and to me, very
surprising) way in which he had been able to get Mathematica to
parse integrals, where the integral sign and the differential-d
are operators with specific (carefully-chosen) precedences like
any other operator (which has been preserved in our proposal for