Ron suggested it would be a good idea for me to summarize how I see semantic annotations, macros, and related topics fitting into the Wolfram Proposal for HTML-Math. This letter summarizes my view of those issues (which also include nonstandard rendering attributes and extensibility in general) -- in very general terms.

The main reference for the Wolfram Proposal is still my letter of 31 May 96, which is visible at

    http://www.w3.org/pub/WWW/MarkUp/Math/WG/Smith-960531.html

It contains a few errors or out-of-date details, but not (to my knowledge) in areas related to the topics discussed in this letter.

Introduction

HTML-Math (in the Wolfram Proposal) is primarily a language for representation of mathematical notation in a form that permits rendering in various media, possibly according to an individual viewer's style preferences. (Translation of an HTML-Math subexpression into a particular external data format, e.g. for input into a computer algebra system, is considered to be rendering into a particular medium, which will not be one of the media used as a direct interface to humans. Thus the available rendering media will include (among others) visual and aural notations for human perception, Mathematica and Maple for computer algebra use, and perhaps other representations of math notation such as TeX.)

The information represented in HTML-Math consists mainly of expressions formed by "notational constructors" (layout schemata) common in current mathematical notational practice, such as rows, fractions, subscripts, etc., used to form compound structures from operators, identifiers, numbers, and a few other element-types. It allows authors to represent this structure either directly (using SGML-compatible markup), or via a predefined operator-precedence syntax, or in some combination of these ways.

HTML-Math should also provide ways in which authors can represent various other kinds of information along with these notational expressions.
Furthermore, it should provide some degree of author-extensibility of the source language, for various reasons. These issues are related, because use of author extensions is one way of conveying extra information, and because the ways in which author extensions are defined for potential use resemble the ways in which other kinds of extra information are given, so I'll discuss them both in this letter.

Annotations

"Annotation", here, means the inclusion by a document's author of extra information, other than that needed for rendering to human perceptual media, which applies to a particular expression-constructor, subexpression, or larger portion of a document. (It should not be confused with the concept, also called "annotation" in discussions about the WWW, of the separate publishing of comments about portions of documents, which can be viewed automatically along with the document, without the document author's cooperation.)

This extra information might include semantic annotations (such as the mathematical types of expressions) or nonstandard rendering attributes (such as whether subscripts and superscripts should be displayed in "chemistry" mode).

The meaning or intended effect of annotations is not in general defined by HTML-Math, though it may be in a few cases. In any case, the goal of HTML-Math is only to pass this information on to renderers, which are free to interpret it as they wish, but should ignore it if they don't understand it (that is, if they don't recognize the attribute names used to label it). It is up to authors, viewers, and other conventions or standards to cause any correlation that might exist between how these annotations are interpreted by the author and by the viewers of a given document.

(By "attribute names" above, I don't necessarily mean just the names of SGML element attributes, since a DTD must specify a fixed set of these, but HTML-Math can't know ahead of time the full set of (for example) nonstandard rendering attributes.
Therefore, though some specific annotation attributes may be defined by HTML-Math and then may be represented by SGML attributes, there must be another syntax provided for specifying nonstandard attributes in general. Exactly which syntax should be used (of the many obvious possibilities) is still subject to discussion.)

HTML-Math should probably provide several ways of specifying annotations, including SGML attributes for particular annotating attributes and/or general kinds of annotations, and/or one or more SGML elements for adding certain kinds of annotations. I'll try to add a reasonable set of ways to the next draft of the Wolfram Proposal. I would appreciate suggestions of specific annotation syntaxes that members of the ERB would like to see added to the proposal. (Whatever syntaxes are used need to have the property that an annotated subexpression is still a single subexpression according to the parsed syntax.)

Note that there is no essential difference to HTML-Math between semantic annotations and nonstandard rendering attributes -- in both cases, it is up to a renderer to decide whether and how to use the information. Since translation into a CAS (or into a representation of mathematical meaning, such as OpenMath or Roy Pike's proposed extension to ISO12083) is considered to be rendering into a special medium, semantic attributes which affect this translation are simply rendering attributes applicable to these media.

Macros

Another way in which authors want to provide extra information is by defining and using "macros", which allow them to express information in a "higher-level" form than is normally necessary to render it.
A macro facility can be useful in several ways (which have been discussed before in this group, most recently in Bob Sutor's letter about a \Vector macro in TeX):

- it allows authors the convenience of using abbreviations for commonly-used expressions, or patterns (forms) of expressions, which are unpleasantly long to type otherwise (as they might be, for example, if they contain a lot of semantic annotations);
- it allows authors to ensure the consistency of representation of similar things, e.g. identifiers for vectors;
- it allows the way in which similar things are represented to be changed all at once by making a single change to a document;
- it gives authors a useful way of expressing high-level information, such as semantic connotations (e.g. "this variable represents a vector");
- assuming the macro expansion rules can be overridden by viewers, perhaps differently for different media, and assuming a document makes use of some high-level macros from some collection known to both author and viewer, it allows viewers to make use of that high-level (e.g. semantic) information by providing customized renderings for specific kinds of high-level objects (i.e. for specific macros); e.g. a viewer, not just an author, can decide to change the rendering of all the vector variables in a document.

The Wolfram Proposal for HTML-Math, as described at the URL mentioned above, already includes a fixed set of "transformation rules" (some of which are shown and explicated in that document) which are part of the means by which it translates HTML-Math source text into a "presentation tree" (thanks to Patrick Ion for this new, better term for what was called a "display list" in that document). These rules operate on an expression tree generated by parsing of operators and SGML markup -- they never operate directly on source text. Please refer to that document for an overview of how these rules work, and for a way of representing them which would be usable in macro definitions.
(The process of applying the transformation rules has also been referred to as "pattern-matching" or "template-matching".)

It was always our intention to amend the proposal to allow this set of rules to be author-extensible (and to call the ones added by authors "macro expansion rules"), and thus to allow authors to define macros. Authors would do this by choosing any form they wished for a new macro's invocations to take, preferably a form which would not otherwise be used in their documents, and defining a new rule which would recognize this form and turn it into another form for rendering (possibly after further macro-expansion of the new form).

The advantages of using general transformation rules as macro definitions, rather than requiring all macro invocations to have a standard appearance (such as a macro name followed by a parenthesized list of arguments) include:

- authors can use whatever form seems natural for what they want to express, e.g. a special operator (assuming they can extend the set of operators; see below); the extensions can be as syntactically general as the native language;
- authors (or viewers) can define expansions for constructs which were originally being rendered directly (or at least by means of built-in rules) rather than via macros;
- certain built-in constructs (like the subscript and superscript operators) can be defined using the same mechanism.

Authors who want all their macro invocations to have a standard appearance are of course free to use only macros which are defined in a uniform way.
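
To make the idea of macros as transformation rules on an expression tree more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the rule notation of the Wolfram Proposal: expression trees are modelled as nested tuples, the node names ("row", "vector", "styled", "over") and all function names are hypothetical, and the order of application (children first, then the node, with results re-expanded) is just one of several possible choices. Viewer rules are tried before author rules, which is one simple way to model a viewer overriding an author's macro.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Pattern variable: matches any subtree and binds it to `name`."""
    name: str


def match(pattern, tree, bindings):
    """Return the extended bindings if `tree` matches `pattern`, else None."""
    if isinstance(pattern, Var):
        if pattern.name in bindings:
            return bindings if bindings[pattern.name] == tree else None
        return {**bindings, pattern.name: tree}
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple) or len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            bindings = match(sub_pattern, sub_tree, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == tree else None


def substitute(template, bindings):
    """Fill the pattern variables of `template` from `bindings`."""
    if isinstance(template, Var):
        return bindings[template.name]
    if isinstance(template, tuple):
        return tuple(substitute(part, bindings) for part in template)
    return template


def expand(tree, rules, limit=100):
    """Expand children first, then the node; re-expand any rewritten result."""
    if isinstance(tree, tuple):
        tree = tuple(expand(child, rules, limit) for child in tree)
    for pattern, template in rules:
        bindings = match(pattern, tree, {})
        if bindings is not None:
            if limit == 0:
                raise RecursionError("macro expansion did not terminate")
            return expand(substitute(template, bindings), rules, limit - 1)
    return tree


# Hypothetical author macro: a "vector" identifier is rendered in bold.
author_rules = [(("vector", Var("x")), ("styled", "bold", Var("x")))]
# Hypothetical viewer preference: render vectors with an arrow accent instead.
viewer_rules = [(("vector", Var("x")), ("over", Var("x"), "arrow"))]

source = ("row", ("vector", "v"), "+", ("vector", "w"))
as_authored = expand(source, author_rules)
as_viewed = expand(source, viewer_rules + author_rules)
```

Here `as_authored` uses the author's bold rendering for both vectors, while `as_viewed` uses the viewer's arrow rendering, because the viewer's rule is listed (and therefore matched) first.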
Other forms of extensibility

Other things in the Wolfram Proposal that should be extensible include not only its transformation rules, but its dictionary of character and operator properties, the set of layout schemata allowed (though the nonstandard ones can't have individual SGML tags), and perhaps even the set of extended character entity names that can be used (this is controversial, I assume, since author extensions to this would invalidate any HTML-Math DTD, which has to list a fixed finite set of entity names).

Whatever the things that can be extended, the methods of extension ought to allow:

- making an extension for all of a document, or any part which consists of a single SGML element (perhaps requiring the addition of a special type of element to designate the part to be extended);
- giving extensions directly, and/or giving a URL which points to a document in which some extensions are defined;
- making extensions incrementally, or by replacement of entire portions of the context (e.g. the entire operator dictionary);
- some extensions which are not overridable by viewer preferences (e.g. to operator precedence, and macro definitions used only as abbreviations), and some which are (e.g., macro definitions used to provide new high-level constructs).

All details of these facilities are subject to discussion. Unlike the annotations facility discussed above, the design decisions here are not just making choices among many obvious possibilities (all essentially equivalent), but rather (IMHO) involve some subtle issues (such as the precise order in which rules should be applied, and re-applied to already-expanded parts of an expression) in which the various choices are not equivalent and the best choice is not obvious. (There is also a large set of decisions which are not so subtle but still affect the scope of the kinds of macros that can be defined, such as how general are the facilities for pattern recognition.)
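
As a rough illustration of the extension requirements listed above (scoping, incremental extension versus replacement, and overridable versus non-overridable entries), here is a small Python sketch using layered dictionaries. Everything in it is an assumption made for illustration: the operator names, the precedence numbers, the key names, and the lookup order are hypothetical and are not taken from the Wolfram Proposal.

```python
from collections import ChainMap

# Hypothetical built-in operator dictionary (names and numbers are invented).
builtin = {"+": {"precedence": 10}, "*": {"precedence": 20}}

# Incremental extension for the whole document: adds one new operator.
document = ChainMap({"(+)": {"precedence": 10}}, builtin)

# Incremental extension scoped to a single element: shadows "*" inside it only.
element = document.new_child({"*": {"precedence": 15}})

# Extension by replacement: the enclosing dictionary is discarded entirely.
replaced = ChainMap({"->": {"precedence": 5}})


def effective(author_fixed, viewer, author_overridable, base):
    """Combine layers so that non-overridable author entries win over viewer
    preferences, which in turn win over overridable author entries."""
    return ChainMap(author_fixed, viewer, author_overridable, base)


settings = effective(
    author_fixed={"prec:*": 20},                     # viewer cannot change this
    viewer={"prec:*": 99, "macro:vector": "arrow"},  # viewer preferences
    author_overridable={"macro:vector": "bold"},     # author default
    base={"prec:+": 10},
)
```

In this sketch the viewer's attempt to change the precedence of `*` is ignored, while the viewer's rendering choice for the high-level `vector` macro replaces the author's default.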
Extended characters with semantic connotations

One aspect of our proposal not discussed above is that it allows the choice by an author of one of several extended characters, which typically render almost the same, but which carry different semantic connotations (e.g. different extended characters might be provided for the mathematical constant and for a variable, both named by the Greek letter Pi). This has some similarity to semantic annotation, and some similarity to the choice among several macros which render the same, but is not quite the same as either of these.

We propose to build in several sets of similar extended characters of this nature. (The complete list will be provided with the complete character dictionary.) Whether authors can define new such characters is an important issue, which was mentioned above.

-- Bruce

Received on Friday, 30 August 1996 14:01:39 UTC