[Prev][Next][Index][Thread]

Re: Progress on Parsing



Neil writes:

> It would be useful for those of us who don't have prolog on our
> machines if you would include the sample output along with your
> sample input.

Okay I will include some in the comments.  You should be able
to download a free Dec10 style prolog for most platforms though.
Note that I expect to switch to C or Java for the final version
of the reference code. Prolog is a great prototyping tool and
makes it practical to write parsers in a few lines of code that
are relatively declarative.

> I am confused how you deal with parens and functions.
> I didn't see any definitions for them

parens are treated as operators although { and } override
other kinds of parens.  I haven't treated < and > as parens though
as these seem more appropriate as infix operators.

The tokenizer maps "(" to the term left:'(' and "]" to the term
right:']'  etc.

Functions are handled by mapping say sin(x) to sin.(x)
or "sin x"  to "sin.x"  Where the implied "." operator
is left associative like "+". I haven't got further than
this, e.g. on how to distinguish polynomial expressions
from functions. We discussed this recently by phone and I
will tryout the rule you suggested.

> 'integral' also seems to be missing

This is currently handled as a literal rather than as an operator.
The next step will be to combine top-down parsing based on templates
with bottom-up parsing using operator precedence analysis to avoid
inappropriate bindings: "+ integral of" current reduces the "+" first
which is wrong. The template would force this to be reduced to the
right overriding the normal precedence values.

When I thought about treating integral as a prefix operator, I got
stuck with how to handle from and to. If these were prefix operators
you then end up with:

      {{{integral {from 0} . {to n}} of {sin.x}} wrt x}

This works I guess, but feels wrong to me. I think the use of
templates to influence parsing makes more sense, as it breaks
out of the limitations of using operator precedence alone.

Treating + and - as n-ary rather than binary operators is an
interesting idea, but can perhaps be handled via the code that
exports the expressions to say Mathematica or Maple.

I have included operators for limits, subscripts and superscripts.
Its interesting to note that limits are left associative while
scripts are right associative based on the interpretation of
superscripts as powers, e.g.

   y = x^y^z    is  y = {x ^ {y ^ z}}

For limits on integrals and presumably sum and prod etc. I have
used an infix operator "of" to bind the integral sign to the
integrand, e.g.

   {{{integral from 0} to n} of  {sin.(x)}} wrt x

I couldn't work out how to make dx and dy etc. bind to the appropriate
integral for a double integral. The "wrt" operator sidesteps this
problem though.  The use of templates to guide parsing should fix
this problem.  

Dave