Extensibility: a solution

From: Ka-Ping Yee <s-ping@orange.cv.tottori-u.ac.jp>
Date: Tue, 09 Jul 1996 21:22:30 +0900
Message-ID: <31E24F06.53A342C3@sse.tottori-u.ac.jp>
To: W3C Math ERB <w3c-math-erb@w3.org>

In his message of Monday 8 July, Bruce Smith asserts:
>
> I think the properties I named above absolutely guarantee
> that a transformation rule based scheme can work well...

Certainly it is good to know that a proposal does not prevent
future extensibility methods.  But this might not be enough.

--------------------------------------------------------------- problem

Firstly, without a design that is sufficiently general from the
start, the attachment of extensions later will sharply divide
what is "internal" from what is "external".  "Native" constructs
and "foreign" constructs will no doubt behave differently and be
implemented differently, increasing the complexity of the system
unnecessarily.

    It is likely that some users will be stranded with a limited
system because their particular setup doesn't support a given
"extension", and so this will likewise separate "native"
documents from "foreign" documents which require special
treatment.  This arbitrary barrier leaves plentiful
opportunities for dispute and frustration ("why did they put
group theory in the standard but not topology?" etc.).

    Secondly, even if the design is limited to "basics" to begin
with, it's clear that the "basics" are actually quite numerous.
Attempting to package them all in at once risks bloating the
native implementation with large databases of names and their
associated rendering code.  We are forced to trade off between
groups of unsatisfied mathematicians and a cumbersome product.

    Thirdly, the very decision of what goes in and what stays out
could delay the production of the standard substantially -- by
more time than it would take to develop an extensibility method
from scratch.  (But, better yet, we *aren't* starting from
scratch, because i've already got one.)

-------------------------------------------------------------- solution

My vision for MINSE is to have both context information and
rendering instructions downloadable from the Web, which solves
the above three problems.  I aim for minimalism and orthogonality.

    There is no distinction between "native" and "foreign" things;
all constructions are flexibly specified in "style definitions"
which are grouped according to the target medium.

Bruce Smith challenges:
>
> Again, I suggest that the burden of proof be on those who say
> this is sufficiently easy to do -- propose, if you will, a
> fully specified design for a system of author-extensibility...

Here is the summary, as promised; the following is a simplified
description of how expressions are processed and rendered.  A
more detailed description of the design has been on the web site
for a while, so i apologize to those of you who have read the
material there and for whom the following is redundant.

All semantic constructions are represented as named "compounds"
containing "sub-elements" (which may themselves be compounds).
Fundamental elements are either "identifiers", "numbers", or
"names" [SYNTAX].

Processing happens in two stages [DESIGN]: from the expression
to the semantic tree, and then from the tree to the rendering.

1.  The expression as entered is considered to be a fairly
direct representation of the parse tree, except for two kinds of
abbreviations: operators (unary, infix, serial, grouping, and
application), and named macros.  Operators associate their
arguments and macros expand into trees during this first stage.
This stage uses auxiliary information in a "notation definition"
-- downloadable from the Web -- which declares the precedence
levels, names, and associativity of operators.  Macros may be
declared either in the notation definition or in a document.

2.  The second stage uses auxiliary information in "style
definitions" -- also downloadable from the Web -- consisting of
a number of methods defined in Python which are grouped into
classes according to the target media type.  The job of these
methods is to take a tree and "transform" it into the
media-specific data which constitute the rendering.  The
semantic tree is stored as a Python tuple object.  It is passed
to the method associated with its root compound, which calls
other methods to transform its sub-elements and combines the
results in an appropriate manner.  Finally a method to
post-process the final result is invoked, if one is defined.
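
As a rough illustration of the two stages (not the actual MINSE
code), the sketch below parses a small infix expression into a
tuple tree using a notation table, then renders it with a style
class.  The names NOTATION, parse, TextStyle, the "do_" method
prefix, and finish are assumptions made up for this example.  Only
infix operators and grouping are handled, and every operator is
treated as left-associative.

```python
import re

# Hypothetical notation definition: infix symbol -> (compound name, precedence).
NOTATION = {"+": ("sum", 1), "-": ("difference", 1), "*": ("product", 2)}

def tokenize(text):
    """Split an expression into numbers, identifiers, operators and parentheses."""
    return re.findall(r"\d+|[A-Za-z]+|[-+*()]", text)

def parse(tokens, min_prec=1):
    """Stage 1: build a tuple tree by precedence climbing (left-associative)."""
    tok = tokens.pop(0)
    if tok == "(":
        left = ("group", parse(tokens))
        tokens.pop(0)                      # discard the matching ")"
    else:
        left = tok                         # identifier or number
    while tokens and tokens[0] in NOTATION and NOTATION[tokens[0]][1] >= min_prec:
        name, prec = NOTATION[tokens.pop(0)]
        left = (name, left, parse(tokens, prec + 1))
    return left

class TextStyle:
    """Stage 2: hypothetical style definition for a plain-text medium."""

    def transform(self, tree):
        if isinstance(tree, str):
            return tree
        # Dispatch to the method associated with the root compound.
        return getattr(self, "do_" + tree[0])(*tree[1:])

    def do_sum(self, a, b):
        return self.transform(a) + " + " + self.transform(b)

    def do_difference(self, a, b):
        return self.transform(a) + " - " + self.transform(b)

    def do_product(self, a, b):
        return self.transform(a) + " * " + self.transform(b)

    def do_group(self, inner):
        return "(" + self.transform(inner) + ")"

    def finish(self, text):
        """Optional post-processing of the final result."""
        return text.strip()

def render(expression, style):
    """Run both stages, then the post-processing method if one is defined."""
    tree = parse(tokenize(expression))
    result = style.transform(tree)
    finish = getattr(style, "finish", None)
    return finish(result) if finish else result

tree = parse(tokenize("a+b*(c-2)"))
text = render("a+b*(c-2)", TextStyle())
```

Adding an operator in this sketch means adding one entry to
NOTATION and one method to the style class; the parser and the
dispatch code stay untouched.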

--------------------------------------------------------------- addenda

Some auxiliary points:

---------------------------------------------- transformation rules ---
- In response to my earlier message Bruce Smith asked a number
  of questions pertaining to rule-based expansion.  As i wrote
  in my earlier message entitled "on parsing", there is no
  rule-based expansion.  For the first stage, the author must
  explicitly mark each macro and compound as such, thus avoiding
  any ambiguity.  For the second stage, the transformation is
  well-defined by the style definition.

  Though at first it appeared that rule-based expansion had
  advantages because the syntax can read more like English,
  i think it presents a distinct obstacle to extensibility.
  As the number of rules grows without bound, the chances of
  confusion increase because there is no way to explicitly
  denote the parse tree.  Moreover, the ordering of rules is
  crucial and i think that having many of them could lead to
  some surprises.  So doing all the transformation using only
  matching rules seems to put an upper bound on extensibility.

  On the other hand, it isn't at all clear to me that such
  transformation rules are sufficient to provide all the
  presentational control that people may want.  To provide
  a concrete example, right now the MINSE style definitions
  display grouping by alternating parentheses and square
  brackets with increasing depth.  Doing so means tracking the
  nesting depth during rendering; you need Python for this.

  Transformation rules seem too powerful for the notation side
  and not powerful enough for the style side.
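
The bracket example can be sketched in a few lines of Python.  This
is only a guess at the shape of such a method, not the MINSE style
definition itself: the class name BracketStyle, the "group"
compound, and the choice to give the outermost group parentheses
are all assumptions.

```python
class BracketStyle:
    """Hypothetical style that alternates ( ) and [ ] with nesting depth."""

    PAIRS = [("(", ")"), ("[", "]")]
    SYMBOLS = {"sum": " + ", "product": " * "}

    def render(self, tree, depth=0):
        if isinstance(tree, str):              # identifier or number
            return tree
        name, args = tree[0], tree[1:]
        if name == "group":
            # Assumption: depth counts from the outside, starting with ( ).
            left, right = self.PAIRS[depth % 2]
            return left + self.render(args[0], depth + 1) + right
        return self.SYMBOLS[name].join(self.render(a, depth) for a in args)

nested = ("product",
          ("group", ("sum", "a", ("group", ("sum", "b", "c")))),
          "d")
shown = BracketStyle().render(nested)
```

The depth argument is the piece of state that the method carries
down the tree as it recurses.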

----------------------------------------------- context definitions ---
- After the first stage is complete we have an internal
  representation of the expression that carries all of its
  meaning.  The meaning is obtainable from the chosen "context
  definition" (for example, as descriptive strings in the
  prevalent language).

- The meanings given in the context definition should make it
  possible for a user reading the expression to select elements
  in the expression and find out what they mean.  Not only might
  this be very valuable for a learning environment, but it also
  helps ensure accurate communication under all circumstances.

- The context definition is also consulted before calling any
  of the style-class methods to ensure that the argument counts
  are correct and that the compound names are known.  Only the
  context definition may define exceptions; style definitions may
  not.  Each media class is given its own methods for handling
  exceptions and producing indications in the appropriate form.

- All definition files can be overlaid on others, using the
  mechanism of class inheritance.  Thus, much as with CSS, a
  user can import a standard definition and then add or change
  a few things to suit the circumstances.  (Styles, contexts,
  and notations don't all have to reside in separate files.)
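
A minimal sketch of the checking and overlay ideas above, assuming
a context definition is a class holding a table of compound names.
The class names, the table layout, and the exception names are
invented for illustration and are not taken from MINSE.

```python
class UnknownCompound(Exception):
    pass

class WrongArgumentCount(Exception):
    pass

class BaseContext:
    """Hypothetical context: compound name -> (argument count, meaning)."""

    compounds = {
        "sum": (2, "the sum of two quantities"),
        "root": (2, "a root of a quantity, with the given index"),
    }

    def check(self, tree):
        """Validate names and argument counts before any style method runs."""
        if isinstance(tree, str):
            return
        name, args = tree[0], tree[1:]
        if name not in self.compounds:
            raise UnknownCompound(name)
        expected = self.compounds[name][0]
        if len(args) != expected:
            raise WrongArgumentCount(
                "%s: expected %d, got %d" % (name, expected, len(args)))
        for arg in args:
            self.check(arg)

    def meaning(self, name):
        """Descriptive string a reader could see on selecting an element."""
        return self.compounds[name][1]

class GroupTheoryContext(BaseContext):
    """Overlay: inherit the standard table and add one compound."""

    compounds = dict(BaseContext.compounds,
                     commutator=(2, "the commutator of two group elements"))

def outcome(context, tree):
    """Return 'ok' or the name of the exception the context raised."""
    try:
        context.check(tree)
        return "ok"
    except (UnknownCompound, WrongArgumentCount) as err:
        return type(err).__name__
```

Here outcome(BaseContext(), ("commutator", "g", "h")) reports an
unknown compound, while the same tree passes under
GroupTheoryContext.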

-------------------------------------------------------- deployment ---
- The syntax is separate from HTML, and can be deployed through
  the new OBJECT element without introducing any HTML elements
  of its own.  However, for practical reasons, ONE tag is proposed
  for including expressions: <SE>.  I plan to take advantage of
  the existing LINK element to specify relationships to style,
  context, or notation definition files.

------------------------------------------------------ macro syntax ---
- A macro represents a tree, not a string; when a macro is
  defined, the name is bound to the semantic tree that results
  from parsing the given expression.  I'm considering a notation
  like the following, where "@" followed by a name invokes a
  macro and "@" followed by an integer substitutes an argument:

  To declare a macro you might say

      <se declare="cuberoot">'root(@1,3)</se>

  To invoke a macro you might say

      <se>@cuberoot(x+4)</se>

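  Macro expansion is never delayed: a macro body is parsed and
  expanded when it is declared, so it can only refer to macros
  that already exist.  This scheme avoids recursion and ensures
  that a macro, once successfully declared, is itself error-free.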

-------------------------------------------------- Python rationale ---
- Python was selected because it gives the necessary descriptive
  power and flexibility to produce renderings from the semantic
  tree, while providing sufficient security for downloaded code.
  Python also provides high-level data types like dictionaries,
  tuples, modules, and classes, as well as exception handling.

  Using a feature of Python, all execution of downloaded code
  can take place in a completely separate scope.  This both
  ensures that the processor (also in Python) is safe, and lets
  us slash away most of the built-in functions, leaving only
  basic things like string and tuple manipulation.
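
The message does not name the Python feature it relies on.  As a
stand-in, the sketch below runs downloaded source with exec in a
fresh namespace whose built-ins are trimmed to a short list.  The
names SAFE_BUILTINS and load_style are hypothetical, and this
technique should not be treated as a real security boundary in
current CPython; it only illustrates the separate-scope idea.

```python
# Assumption: the few built-ins a style definition is allowed to see.
SAFE_BUILTINS = {"len": len, "str": str, "tuple": tuple}

def load_style(source):
    """Execute downloaded style code in its own namespace; return that namespace."""
    scope = {"__builtins__": SAFE_BUILTINS}
    exec(source, scope)
    return scope

downloaded = '''
def do_sum(a, b):
    return str(a) + " + " + str(b)
'''

style = load_style(downloaded)
rendered = style["do_sum"]("x", 4)

# Anything outside the trimmed list is simply not defined in that scope.
try:
    load_style("open('somefile')")
    open_blocked = False
except NameError:
    open_blocked = True
```

The processor's own variables never appear in the scope handed to
the downloaded code, which is the separation the paragraph above
describes.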

------------------------------------------------------------ references

[DESIGN] http://www.lfw.org/math/design.html
[SYNTAX] http://www.lfw.org/math/syntax.html

Well, there you have it.  Thanks for reading this far, if you did.

Ping