Lexical Syntax for HTML Math



During today's phone conference (13th May), Neil said something that
I wanted to check with the group:

After the recent SGML Math meeting in Champaign we ended up chatting
in Neil's office. I raised some alternative models for lexical
syntax and we discussed them with Steve Hunt and the guys from the
AMS. The approaches proposed, as far as I remember, were:

     a)   y=ax+b      \integral \from 0 \to n \of \sin ax \d x

Here "ax" is tokenized as two tokens, "a" and "x". A backslash is
needed before multicharacter name tokens. A non-name character is
needed after such tokens (e.g. space, backslash or other
non-alphanumeric character). This allows names to be identified
as complete strings without prior knowledge, and avoids potential
errors, e.g. in interpreting "\sinax" vs "\sin ax". The backslash
mechanism is used for all special symbols or functions.
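
As a rough illustration of the tokenizing rules in (a), here is a
minimal Python sketch. The pattern and the name tokenize_a are
hypothetical, and treating a run of digits as a single number token
is an assumption that the description above does not settle.

```python
import re

# Hypothetical token pattern for scheme (a); grouping digit runs into
# one number token is an assumption made for this sketch.
TOKEN_A = re.compile(r"""
    \\[A-Za-z][A-Za-z0-9]*   # backslash name such as \sin or \integral
  | [0-9]+                   # run of digits (assumed to form one number)
  | [A-Za-z]                 # a bare letter is a one-character identifier
  | \S                       # any other visible character, e.g. = or +
""", re.VERBOSE)


def tokenize_a(text):
    """Split a scheme (a) string into a flat list of tokens."""
    return TOKEN_A.findall(text)


print(tokenize_a("y=ax+b"))     # six single-character tokens
print(tokenize_a(r"\sin ax"))   # the name \sin, then a and x
print(tokenize_a(r"\sinax"))    # one long name, \sinax
```

The last two calls show why the terminating non-name character
matters: without the space, "\sinax" is read as a single name.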

     b)   y=ax+b      &integral; &from;0&to;n&of;&sin;ax&dd;x

Here special characters such as the differential "d", the symbol "e"
and non-ASCII characters like Greek letters and other symbols and
functions are expressed as SGML named or numeric character entities
(Unicode character numbers are assumed for the latter). This is
similar to existing SGML notations except that it uses entities as
operators rather than relying on tags.
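
A comparable sketch for (b), again in Python and again hypothetical:
entities are recognized by their "&" and ";" delimiters, and numeric
entities are resolved as Unicode character numbers. The names
tokenize_b and resolve_numeric, and the entity name "&dd;" standing
for the differential d, are assumptions made for illustration.

```python
import re

# Hypothetical token pattern for scheme (b); entity names are accepted
# as written, with no table of known names.
TOKEN_B = re.compile(r"""
    &\#[0-9]+;                # numeric character entity
  | &[A-Za-z][A-Za-z0-9]*;    # named entity
  | [0-9]+                    # run of digits (assumed to form one number)
  | [A-Za-z]                  # a bare letter is a one-character identifier
  | \S                        # any other visible character
""", re.VERBOSE)


def tokenize_b(text):
    """Split a scheme (b) string into a flat list of tokens."""
    return TOKEN_B.findall(text)


def resolve_numeric(token):
    """Turn a numeric entity into its Unicode character; pass others through."""
    if token.startswith("&#") and token.endswith(";"):
        return chr(int(token[2:-1]))
    return token


print(tokenize_b("&sin;ax&dd;x"))   # &sin;, a, x, &dd;, x
print(resolve_numeric("&#945;"))    # Greek small letter alpha
```

Because the semicolon closes each entity, "&sin;ax" needs no
separating space, which is the source of both the compactness and
the poor legibility of this scheme.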

     c)   y= a x + b     integral from 0 to n of sin a x \d x

This avoids the need for special escapes, although the backslash
mechanism can still be used (\d is distinct from "d" and is treated
as the differential d). In our discussions this was, I believe, the
preferred lexical scheme, and it corresponds to the one I implemented.
In most cases it is the shortest and the easiest to read. Scheme (b)
is the least legible, so scheme (a) is my second choice.
I think Neil suggested a blend of (a) and (b), but providing
two alternative escaping mechanisms seems open to trouble.
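
For comparison, a hypothetical Python sketch of tokenizing under (c).
It assumes that a run of letters is always read as one name, which is
my reading of why the example writes "a x" with a space; this is a
sketch, not the implementation mentioned above.

```python
import re

# Hypothetical token pattern for scheme (c): letter runs form names,
# so adjacent variables must be separated by white space.
TOKEN_C = re.compile(r"""
    \\[A-Za-z]+   # optional backslash escape such as \d
  | [A-Za-z]+     # a run of letters is one name (assumed)
  | [0-9]+        # run of digits (assumed to form one number)
  | \S            # any other visible character
""", re.VERBOSE)


def tokenize_c(text):
    """Split a scheme (c) string into a flat list of tokens."""
    return TOKEN_C.findall(text)


print(tokenize_c("y= a x + b"))   # y, =, a, x, +, b
print(tokenize_c("y=ax+b"))       # y, =, ax, +, b
print(tokenize_c(r"sin a x \d x"))
```

The second call shows the cost of dropping escapes: under this
assumption "ax" becomes a single name rather than a product.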

I have made significant progress with an augmented operator
precedence parser in dealing with expressions like those in (c). The
meta-language consists of operator declarations plus rules for
matching brackets and other groupings such as the integral sign
with the \d symbol. I look forward to Bruce's suggested grammar
as a means for testing these ideas further.
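
To make the idea of operator declarations plus grouping rules more
concrete, here is a small precedence-climbing sketch in Python. It is
not the parser described above: the tables INFIX, PREFIX and GROUPS,
the binding powers, and the treatment of adjacent operands as implied
multiplication are all assumptions, and the limits of the integral
(from, to, of) are left out to keep it short.

```python
import re

# All declarations below are hypothetical; binding powers are illustrative.
INFIX = {"=": (1, "right"), "+": (2, "left"), "-": (2, "left")}
PREFIX = {"sin": 3, "cos": 3}            # functions applied without brackets
GROUPS = {"(": ")", "integral": "\\d"}   # opener -> required closer
CLOSERS = set(GROUPS.values())
JUXTAPOSE = 3                            # assumed power of implied multiplication

TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z]+|[0-9]+|\S")


def parse(text):
    """Parse a scheme (c) style string into nested tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = advance()
        if tok in CLOSERS or tok in INFIX:
            raise SyntaxError(f"unexpected {tok!r}")
        if tok in PREFIX:
            return (tok, expression(PREFIX[tok]))
        if tok in GROUPS:
            inner = expression(0)
            if peek() != GROUPS[tok]:
                raise SyntaxError(f"{tok!r} needs a matching {GROUPS[tok]!r}")
            advance()
            if tok == "integral":
                # The token after \d names the variable of integration.
                return ("integral", inner, advance())
            return inner
        return tok

    def expression(min_power):
        left = primary()
        while True:
            tok = peek()
            if tok is None or tok in CLOSERS:
                break
            if tok in INFIX:
                power, assoc = INFIX[tok]
                if power < min_power:
                    break
                advance()
                right = expression(power + 1 if assoc == "left" else power)
                left = (tok, left, right)
            elif JUXTAPOSE >= min_power:
                # Two operands side by side: treat as multiplication.
                left = ("*", left, expression(JUXTAPOSE + 1))
            else:
                break
        return left

    tree = expression(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


print(parse("y = a x + b"))
print(parse(r"integral sin a x \d x"))
```

With these declarations "y = a x + b" groups as y = ((a*x) + b), and
the integral sign is paired with \d in the same way that "(" is
paired with ")", so an integral without its \d is reported as an
error rather than silently accepted.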