
Semantics, Macros, and the Wolfram Proposal



Ron suggested it would be a good idea for me to summarize how I see
semantic annotations, macros, and related topics fitting into the
Wolfram Proposal for HTML-Math. This letter summarizes my view of
those issues (which also include nonstandard rendering attributes
and extensibility in general) -- in very general terms.

The main reference for the Wolfram Proposal is still my letter of
31 May 96, which is visible at

    http://www.w3.org/pub/WWW/MarkUp/Math/WG/Smith-960531.html

It contains a few errors or out-of-date details, but not (to my
knowledge) in areas related to the topics discussed in this letter.


Introduction

HTML-Math (in the Wolfram Proposal) is primarily a language for
representation of mathematical notation in a form that permits
rendering in various media, possibly according to an individual
viewer's style preferences.

(Translation of an HTML-Math subexpression into a particular external
data format, e.g. for input into a computer algebra system, is
considered to be rendering into a particular medium, which will not be
one of the media used as a direct interface to humans. Thus the
available rendering media will include (among others) visual and aural
notations for human perception, Mathematica and Maple for computer
algebra use, and perhaps other representations of math notation such as
TeX.)

The information represented in HTML-Math consists mainly of expressions
formed by "notational constructors" (layout schema) common in current
mathematical notational practice, such as rows, fractions, subscripts,
etc., used to form compound structures from operators, identifiers,
numbers, and a few other element-types. It allows authors to represent
this structure either directly (using SGML-compatible markup), or via a
predefined operator-precedence syntax, or in some combination of these
ways.

HTML-Math should also provide ways in which authors can represent
various other kinds of information along with these notational
expressions. Furthermore, it should provide some degree of
author-extensibility of the source language, for various reasons. These
issues are related, because use of author extensions is one way of
conveying extra information, and because the ways in which author
extensions are defined for potential use resemble the ways in which
other kinds of extra information are given, so I'll discuss them both
in this letter.


Annotations

"Annotation", here, means the inclusion by a document's author of extra
information, other than that needed for rendering to human perceptual
media, which applies to a particular expression-constructor,
subexpression, or larger portion of a document. (It should not be
confused with the concept, also called "annotation" in discussions
about the WWW, of the separate publishing of comments about portions of
documents, which can be viewed automatically along with the document,
without the document author's cooperation.)

This extra information might include semantic annotations (such as the
mathematical types of expressions) or nonstandard rendering attributes
(such as whether subscripts and superscripts should be displayed in
"chemistry" mode). The meaning or intended effect of annotations is not
in general defined by HTML-Math, though it may be in a few cases. In
any case, the goal of HTML-Math is only to pass this information on to
renderers, which are free to interpret it as they wish, but should
ignore it if they don't understand it (that is, if they don't recognize
the attribute names used to label it). It is left to authors, viewers,
and other conventions or standards to establish whatever agreement
exists between how the author and the viewers of a given document
interpret these annotations.
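Here is a minimal Python sketch of the "pass it on, and ignore what you don't understand" behavior. The `Ident` node class, the TeX-like output and every annotation name in it (`fontstyle`, `semantic-type`, `x-private`) are hypothetical. HTML-Math defines none of them.

```python
from dataclasses import dataclass, field


@dataclass
class Ident:
    """Hypothetical leaf node: an identifier plus free-form annotations."""
    name: str
    annotations: dict = field(default_factory=dict)


def render_tex(node: Ident) -> str:
    """Render to a TeX-like string, using only the annotations it recognizes.

    'fontstyle' is an illustrative annotation name, not one defined by
    HTML-Math; any other annotation is skipped without error.
    """
    style = node.annotations.get("fontstyle")
    if style == "upright":
        return r"\mathrm{" + node.name + "}"
    if style == "bold":
        return r"\mathbf{" + node.name + "}"
    return node.name  # default rendering


v = Ident("v", {"fontstyle": "bold", "semantic-type": "vector", "x-private": "42"})
print(render_tex(v))                                 # \mathbf{v}
print(render_tex(Ident("x", {"x-private": "42"})))  # x
```

The unrecognized annotations travel with the node unchanged. A different renderer that does understand `semantic-type` could use them.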

(By "attribute names" above, I don't necessarily mean just the names of
SGML element attributes, since a DTD must specify a fixed set of these,
but HTML-Math can't know ahead of time the full set of (for example)
nonstandard rendering attributes. Therefore, though some specific
annotation attributes may be defined by HTML-Math and then may be
represented by SGML attributes, there must be another syntax provided
for specifying nonstandard attributes in general. Exactly which syntax
should be used (of the many obvious possibilities) is still subject to
discussion.)

HTML-Math should probably provide several ways of specifying
annotations, including SGML attributes for particular annotating
attributes and/or general kinds of annotations, and/or one or more SGML
elements for adding certain kinds of annotations. I'll try to add a
reasonable set of ways to the next draft of the Wolfram Proposal. I
would appreciate suggestions of specific annotation syntaxes that
members of the ERB would like to see added to the proposal. (Whatever
syntaxes are used need to have the property that an annotated
subexpression is still a single subexpression according to the parsed
syntax.)

Note that there is no essential difference to HTML-Math between
semantic annotations and nonstandard rendering attributes -- in both
cases, it is up to a renderer to decide whether and how to use the
information. Since translation into a CAS (or into a representation of
mathematical meaning, such as OpenMath or Roy Pike's proposed extension
to ISO 12083) is considered to be rendering into a special medium,
semantic attributes which affect this translation are simply rendering
attributes applicable to these media.
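The following minimal Python sketch shows a semantic attribute acting as a rendering attribute. One annotation is ignored by a visual renderer but used when translating into Mathematica. The annotation name `meaning` and its values are hypothetical. The targets `E`, `Pi` and `I` are Mathematica's built-in symbols for those constants.

```python
from dataclasses import dataclass, field


@dataclass
class Ident:
    """Hypothetical leaf node: an identifier plus free-form annotations."""
    name: str
    annotations: dict = field(default_factory=dict)


# Hypothetical semantic annotation value -> Mathematica built-in symbol.
MATHEMATICA_CONSTANTS = {"exponential-e": "E", "pi": "Pi", "imaginary-i": "I"}


def render_visual(node: Ident) -> str:
    # A visual renderer has no use for "meaning" and simply ignores it.
    return node.name


def render_mathematica(node: Ident) -> str:
    # For the Mathematica "medium", the same annotation acts as a rendering attribute.
    meaning = node.annotations.get("meaning")
    return MATHEMATICA_CONSTANTS.get(meaning, node.name)


e_const = Ident("e", {"meaning": "exponential-e"})
e_var = Ident("e")
print(render_visual(e_const), render_visual(e_var))            # e e
print(render_mathematica(e_const), render_mathematica(e_var))  # E e
```

The tree is identical for both media. Only the renderer for each medium differs in which annotations it consults.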


Macros

Another way in which authors want to provide extra information is by
defining and using "macros", which allow them to express information in
a "higher-level" form than is normally necessary to render it. A macro
facility can be useful in several ways (which have been discussed
before in this group, most recently in Bob Sutor's letter about a
\Vector macro in TeX):

- it allows authors the convenience of using abbreviations for
commonly-used expressions, or patterns (forms) of expressions, which
are unpleasantly long to type otherwise (as they might be, for example,
if they contain a lot of semantic annotations);

- it allows authors to ensure the consistency of representation of
similar things, e.g. identifiers for vectors;

- it allows the way in which similar things are represented to be
changed all at once by making a single change to a document;

- it gives authors a useful way of expressing high-level information,
such as semantic connotations (e.g. "this variable represents a
vector");

- assuming the macro expansion rules can be overridden by viewers,
perhaps differently for different media, and assuming a document makes
use of some high-level macros from some collection known to both author
and viewer, it allows viewers to make use of that high-level (e.g.
semantic) information by providing customized renderings for specific
kinds of high-level objects, i.e. for specific macros; for example, a
viewer, not just an author, can decide to change the rendering of all
the vector variables in a document.
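The last point depends on viewers being able to override macro expansions, possibly per medium. The minimal Python sketch below shows one way that lookup might work. Viewer definitions are consulted before the author's, and a medium-specific definition is consulted before a generic one. The `Vector` macro echoes the TeX example mentioned above. The lookup order, the `"*"` wildcard and the TeX-like output strings are assumptions for illustration, not part of the proposal.

```python
from typing import Callable

# A macro definition maps one argument to a TeX-like string. Keys are
# (macro name, medium); "*" means "any medium". All names are hypothetical.
Defs = dict[tuple[str, str], Callable[[str], str]]


def expand(macro: str, arg: str, medium: str, author: Defs, viewer: Defs) -> str:
    """Look up an expansion, preferring viewer over author and specific over generic."""
    for defs in (viewer, author):
        for key in ((macro, medium), (macro, "*")):
            if key in defs:
                return defs[key](arg)
    raise KeyError(f"no expansion for {macro!r} in medium {medium!r}")


author_defs: Defs = {("Vector", "*"): lambda a: r"\vec{" + a + "}"}
viewer_defs: Defs = {("Vector", "visual"): lambda a: r"\mathbf{" + a + "}"}

print(expand("Vector", "v", "visual", author_defs, viewer_defs))  # \mathbf{v}
print(expand("Vector", "v", "tex", author_defs, viewer_defs))     # \vec{v}
```

Here the viewer's bold-face preference applies only to the visual medium. Every other medium still uses the author's definition.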


The Wolfram Proposal for HTML-Math, as described in the document at the
URL mentioned above, already includes a fixed set of "transformation rules" (some of
which are shown and explicated in that document) which are part of the
means by which it translates HTML-Math source text into a "presentation
tree" (thanks to Patrick Ion for this new, better term for what was
called a "display list" in that document). These rules operate on an
expression tree generated by parsing of operators and SGML markup --
they never operate directly on source text. Please refer to that
document for an overview of how these rules work, and for a way of
representing them which would be usable in macro definitions.

(The process of applying the transformation rules has also been
referred to as "pattern-matching" or "template-matching".)
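Below is a minimal Python sketch of the central idea: rules rewrite a parsed expression tree, not source text. The tuple encoding of trees, the node names (`infix`, `msub`, `row`) and the bottom-up strategy are assumptions made for illustration. The proposal's own notation for rules is the one described in the document cited above.

```python
from typing import Callable, Optional, Union

# Hypothetical tree encoding: a leaf is a token string; an internal node is
# a tuple (head, child1, child2, ...).
Tree = Union[str, tuple]
Rule = Callable[[Tree], Optional[Tree]]


def infix_subscript(t: Tree) -> Optional[Tree]:
    """Illustrative built-in rule: parsed ("infix", "_", a, b) -> ("msub", a, b)."""
    if isinstance(t, tuple) and t[:2] == ("infix", "_"):
        return ("msub", t[2], t[3])
    return None


def transform(t: Tree, rules: list[Rule]) -> Tree:
    """Rewrite bottom-up: transform the children, then try each rule in order."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(transform(c, rules) for c in t[1:])
    for rule in rules:
        new = rule(t)
        if new is not None:
            return new
    return t


parsed = ("row", ("infix", "_", "x", "i"), "+", "y")  # as if parsed from x_i + y
print(transform(parsed, [infix_subscript]))
# ('row', ('msub', 'x', 'i'), '+', 'y')
```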

It was always our intention to amend the proposal to allow this set of
rules to be author-extensible (and to call the ones added by authors
"macro expansion rules"), and thus to allow authors to define macros.
Authors would do this by choosing any form they wished for a new
macro's invocations to take, preferably a form which would not
otherwise be used in their documents, and defining a new rule which
would recognize this form and turn it into another form for rendering
(possibly after further macro-expansion of the new form).
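The Python sketch below is a rough guess at what author-defined rules could look like under this scheme, including "further macro-expansion of the new form". The invocation forms `("Vector", x)` and `("Velocity",)` are hypothetical, and so is the `mover` (arrow-over) node. The strategy of rewriting a node until no rule matches, with a loop guard, is also an assumption. The real order of application is still an open question.

```python
from typing import Callable, Optional, Union

Tree = Union[str, tuple]  # leaf token, or (head, *children)
Rule = Callable[[Tree], Optional[Tree]]


def vector_rule(t: Tree) -> Optional[Tree]:
    # Author's form ("Vector", x) -> an arrow over x.
    if isinstance(t, tuple) and len(t) == 2 and t[0] == "Vector":
        return ("mover", t[1], "\u2192")
    return None


def velocity_rule(t: Tree) -> Optional[Tree]:
    # ("Velocity",) expands to another macro invocation, which is expanded in turn.
    if t == ("Velocity",):
        return ("Vector", "v")
    return None


def expand(t: Tree, rules: list[Rule], limit: int = 100) -> Tree:
    """Rewrite a node until no rule matches, then recurse; guard against loops."""
    for _ in range(limit):
        for rule in rules:
            new = rule(t)
            if new is not None:
                t = new
                break
        else:
            break  # no rule matched: this node is fully expanded
    else:
        raise RecursionError("macro expansion did not terminate")
    if isinstance(t, tuple):
        return (t[0],) + tuple(expand(c, rules, limit) for c in t[1:])
    return t


doc = ("row", ("Velocity",), "=", ("Vector", "u"))
print(expand(doc, [vector_rule, velocity_rule]))
# ('row', ('mover', 'v', '→'), '=', ('mover', 'u', '→'))
```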

The advantages of using general transformation rules as macro
definitions, rather than requiring all macro invocations to have a
standard appearance (such as a macro name followed by a parenthesized
list of arguments) include:

- authors can use whatever form seems natural for what they want to
express, e.g. a special operator (assuming they can extend the set of
operators; see below); the extensions can be as syntactically general
as the native language;

- authors (or viewers) can define expansions for constructs which were
originally being rendered directly (or at least by means of built-in
rules) rather than via macros;

- certain built-in constructs (like the subscript and superscript
operators) can be defined using the same mechanism.

Authors who want all their macro invocations to have a standard appearance
are of course free to use only macros which are defined in a uniform way.


Other forms of extensibility

Other things in the Wolfram Proposal that should be extensible include
not only its transformation rules, but its dictionary of character and
operator properties, the set of layout schema allowed (though the
nonstandard ones can't have individual SGML tags), and perhaps even the
set of extended character entity names that can be used (this is
controversial, I assume, since author extensions to this would
invalidate any HTML-Math DTD, which has to list a fixed finite set of
entity names).

Whichever things turn out to be extensible, the methods of extension
ought to allow:

- making an extension for all of a document, or any part which consists
of a single SGML element (perhaps requiring the addition of a special
type of element to designate the part to be extended);

- giving extensions directly, and/or giving a URL which points to a
document in which some extensions are defined;

- making extensions incrementally, or by replacement of entire portions
of the context (e.g. the entire operator dictionary);

- some extensions which are not overridable by viewer preferences (e.g.
to operator precedence, and macro definitions used only as
abbreviations), and some which are (e.g., macro definitions used to
provide new high-level constructs).
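The minimal Python sketch below, built on `collections.ChainMap`, shows one possible shape for these scoping rules:

- An element-level extension shadows inherited definitions incrementally.
- A replacement starts from scratch.
- Each definition carries a flag saying whether viewer preferences may override it.

The definition names (`prec:+`, `macro:Vector`), their values and the flag are hypothetical. They are not proposed syntax.

```python
from collections import ChainMap

# Each definition: name -> (value, overridable_by_viewer). The names and
# values are illustrative only, not HTML-Math syntax.
doc_context = ChainMap({
    "prec:+": (10, False),                # operator precedence: author-fixed
    "macro:Vector": (r"\vec{#1}", True),  # high-level construct: viewer may override
})


def extend(context: ChainMap, entries: dict) -> ChainMap:
    """Incremental extension for one element: new entries shadow inherited ones."""
    return context.new_child(entries)


def replace(entries: dict) -> ChainMap:
    """Replacement: the element starts from these entries alone."""
    return ChainMap(dict(entries))


def resolve(context: ChainMap, viewer_prefs: dict) -> dict:
    """Flatten a context, letting viewer preferences win only where allowed."""
    resolved = {}
    for name, (value, overridable) in context.items():
        if overridable and name in viewer_prefs:
            value = viewer_prefs[name]
        resolved[name] = value
    return resolved


viewer = {"prec:+": 5, "macro:Vector": r"\mathbf{#1}"}
section = extend(doc_context, {"prec:+": (15, False)})
print(resolve(section, viewer))  # {'prec:+': 15, 'macro:Vector': '\\mathbf{#1}'}
print(resolve(replace({"prec:+": (12, False)}), viewer))  # {'prec:+': 12}
```

The viewer's attempt to change the precedence of `+` is ignored because that definition is marked non-overridable. The viewer's `Vector` rendering is accepted.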

All details of these facilities are subject to discussion. Unlike the
annotations facility discussed above, the design decisions here are not
just making choices among many obvious possibilities (all essentially
equivalent), but rather (IMHO) involve some subtle issues (such as the
precise order in which rules should be applied, and re-applied to
already-expanded parts of an expression) in which the various choices
are not equivalent and the best choice is not obvious. (There is also a
large set of decisions which are not so subtle but still affect the
scope of the kinds of macros that can be defined, such as how general
the facilities for pattern recognition are.)


Extended characters with semantic connotations

One aspect of our proposal not discussed above is that it allows the
choice by an author of one of several extended characters, which
typically render almost the same, but which carry different semantic
connotations (e.g. if a different extended character is provided to
mean the mathematical constant, or a variable, named by the Greek
letter Pi). This has some similarity to semantic annotation, and some
similarity to the choice among several macros which render the same,
but is not quite the same as either of these. We propose to build in
several sets of similar extended characters of this nature. (The
complete list will be provided with the complete character dictionary.)
Whether authors can define new such characters is an important issue,
which was mentioned above.
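The minimal Python sketch below illustrates the idea: two distinct extended characters share a glyph but translate differently for a CAS medium. The entity names `PiConstant` and `PiVariable` are hypothetical, and so is the choice of `pi` as the variable's name in Mathematica. `Pi` is Mathematica's built-in constant.

```python
from typing import NamedTuple


class ExtChar(NamedTuple):
    glyph: str     # what a visual renderer draws
    meaning: str   # the semantic connotation carried by the choice of character
    cas_form: str  # hypothetical translation for a Mathematica-like medium


# Hypothetical entity names; the real set would come from the character dictionary.
CHARS = {
    "PiConstant": ExtChar("\u03c0", "the mathematical constant pi", "Pi"),
    "PiVariable": ExtChar("\u03c0", "a variable named pi", "pi"),
}


def render(entity: str, medium: str) -> str:
    ch = CHARS[entity]
    if medium == "visual":
        return ch.glyph     # both characters look the same
    if medium == "mathematica":
        return ch.cas_form  # but they translate differently
    raise ValueError(f"unsupported medium: {medium}")
```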


-- Bruce


