(long) Content Markup suggestions for next version of MathML
Hi all,
I have no idea where the discussion on the next version of MathML is
going right now, or if Content MathML has been worked over yet in that
process. In case these points have not yet been discussed in the
working group, I would like to point out a few places that I believe
could use some clarification or adjustment in order to come closer to
the stated requirement for MathML content markup:
"Since the intent of MathML content markup is to encode mathematical
expressions in such a way that the mathematical structure of the
expression is clear, the syntax and usage of content markup must be
consistent enough to facilitate automated semantic interpretation."
Here goes:
- Compare these two quotes from the specs:
-- "The condition element is always used together with one or more
bvar elements."
-- "Note that the bound variable may be implicit:
<apply><max/>
<condition>
<apply><and/>
<reln><in/><ci>x</ci><ci type="set">B</ci></reln>
<reln><notin/><ci>x</ci><ci type="set">C</ci></reln>
</apply>
</condition>
</apply>"
These two obviously contradict each other.
I strongly recommend striking the second quote from the
spec. Making the bound variables implicit like that is always a
very bad idea in a semantically oriented language, as the specs
note in another place:
"(The condition may involve more than one symbol.)"
This same kind of mistake was made in the specs for KIF 3.0 for the
"setofall" operator, leading to incorrect semantics whenever the
condition contained a parameter (arbitrary constant).
Consequently(?), the current ANSI draft spec for KIF no longer
contains a "setofall" operator.
The main point is that correct automated semantic interpretation
cannot be guaranteed unless the requirement for listing the bound
variables is always enforced.
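The requirement argued for here is easy to check mechanically. The following is only a minimal sketch in Python, assuming the fragment is well-formed XML without namespaces; the function name conditions_without_bvar and the "explicit" variant of the example (including the <ci>x</ci> body) are hypothetical illustrations, not taken from the spec.

```python
import xml.etree.ElementTree as ET

def conditions_without_bvar(root):
    """Return every element that has a <condition> child but no <bvar> child."""
    offenders = []
    for parent in root.iter():
        child_tags = [child.tag for child in parent]
        if "condition" in child_tags and "bvar" not in child_tags:
            offenders.append(parent)
    return offenders

# The example quoted from the spec: the bound variable is implicit.
IMPLICIT = """
<apply><max/>
  <condition>
    <apply><and/>
      <reln><in/><ci>x</ci><ci type="set">B</ci></reln>
      <reln><notin/><ci>x</ci><ci type="set">C</ci></reln>
    </apply>
  </condition>
</apply>
"""

# Hypothetical repair: name the bound variable and the expression to maximize.
EXPLICIT = """
<apply><max/>
  <bvar><ci>x</ci></bvar>
  <condition>
    <apply><and/>
      <reln><in/><ci>x</ci><ci type="set">B</ci></reln>
      <reln><notin/><ci>x</ci><ci type="set">C</ci></reln>
    </apply>
  </condition>
  <ci>x</ci>
</apply>
"""

print(len(conditions_without_bvar(ET.fromstring(IMPLICIT))))
print(len(conditions_without_bvar(ET.fromstring(EXPLICIT))))
```

With the implicit form, a checker like this has nothing to go on when deciding which of x, B, and C is meant to be bound; with the explicit form the question does not arise.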
- "It is an error to enclose a relation in an element other than
reln."
Actually, a relation may also be enclosed in a <fn>, since <fn> turns
anything into a function (e.g., a relation into its characteristic
function mapping to {0,1}).
Also, the definition of <apply> says that anything appearing as
its first argument is automagically interpreted as a function as
if it were wrapped in a <fn>. Consequently, relations may also
appear in <apply>s.
- "When used with int [or sum or product], each qualifier schema is
expected to contain a single child schema; otherwise an error is
generated."
This clashes with the definition of <interval>, doesn't it?
But then, I also noticed this:
- <interval> is used in a dual fashion: as a qualifier, and as a
constructor. This can in some rare cases lead to ambiguities:
"Considering interval-valued functions F bounded by functions
f and g (i.e., F=[f,g], to abuse notation), it is easy to see
that the integral of F (i.e., the integral of [f,g]) is
[integral of f, integral of g]."
If you consider representing this in content-MathML, here is
how you would want to do it interpreting <interval> as a
constructor:
<reln><eq/>
<apply><int/>
<interval> <fn><ci>f</ci></fn>
<fn><ci>g</ci></fn>
</interval>
</apply>
<interval>
<apply><int/> <fn><ci>f</ci></fn> </apply>
<apply><int/> <fn><ci>g</ci></fn> </apply>
</interval>
</reln>
However, the MathML spec would wrongly lead to an interpretation
of the first of these <interval>s as a qualifier, because that's
the syntactic disambiguation specified in MathML.
It is possible, of course, to circumvent this problem by wrapping
that <interval> expression with a <fn>, but that kind of a
disambiguation technique may not always be desirable in similar
cases.
The easiest solution here would be to strike <interval> from the
list of qualifiers entirely and instead note that <interval> is a
kind of set, and we can therefore simply specify
<condition> <interval> ... </interval> </condition>
because <condition> is allowed to contain a set rather than a
boolean expression, giving it precisely the meaning we want.
However, note that in the case of a set-valued condition, the
sibling <bvar> variable's scope excludes that set (and thus the
<condition> qualifier), because there is an implicit "var \in set"
wrapped around it. In the case of a predicative <condition>, on
the other hand, the <condition> is inside the scope of the sibling
<bvar> variable(s) because the integration variable appears in the
condition.
This observation would argue for replacing the
<interval> qualifier by a more general qualifier for sets that the
dependent variable ranges over instead:
<rangesover> <interval> ... </interval> </rangesover>
(I should reiterate my opinion here that it's not a good idea to
allow sibling nodes to be in different scopes. OpenMath has
rightly opted against doing that, albeit after long and hard
discussions, because solutions would be much cleaner, and
"automated semantic interpretation" much "facilitated"
if scope boundaries were always container element boundaries, both
for operator *and* for variable scopes. See [1].)
- Note that we could also write
<set> <interval/> ... </set>
instead of <interval> ... </interval> if we generalize the <set>
constructor to take set operators as a first argument and act a
little like <fn> and <reln> in this respect. <interval/> would be
one such operator, set union and similar operators would be others.
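To make the qualifier/constructor ambiguity of <interval> concrete, here is a rough Python sketch of the purely syntactic reading described above. It assumes (as a paraphrase of the point made above, not as the spec's wording) that an <interval> appearing as a direct child of an <apply> whose operator is <int/> is read as a qualifier, and that any other <interval> is read as a constructor. The function name interval_roles is hypothetical.

```python
import xml.etree.ElementTree as ET

def interval_roles(root):
    """List, in document order, how each <interval> would be read."""
    roles = []

    def visit(elem):
        # The operator of an <apply> is its first child.
        operator = elem[0].tag if elem.tag == "apply" and len(elem) > 0 else None
        for child in elem:
            if child.tag == "interval":
                # Assumed rule: directly under <apply><int/> means qualifier.
                roles.append("qualifier" if operator == "int" else "constructor")
            visit(child)

    visit(root)
    return roles

# The interval-valued integral from the example above.
AMBIGUOUS = """
<reln><eq/>
  <apply><int/>
    <interval> <fn><ci>f</ci></fn> <fn><ci>g</ci></fn> </interval>
  </apply>
  <interval>
    <apply><int/> <fn><ci>f</ci></fn> </apply>
    <apply><int/> <fn><ci>g</ci></fn> </apply>
  </interval>
</reln>
"""

# The workaround mentioned above: wrap the integrand in <fn>.
WRAPPED = """
<apply><int/>
  <fn>
    <interval> <fn><ci>f</ci></fn> <fn><ci>g</ci></fn> </interval>
  </fn>
</apply>
"""

print(interval_roles(ET.fromstring(AMBIGUOUS)))
print(interval_roles(ET.fromstring(WRAPPED)))
```

Under this reading the integrand [f,g] in the first fragment comes out as a qualifier, which is the misinterpretation at issue, while the <fn>-wrapped version comes out as a constructor.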
Another major suggestion that I would like to make is to include
discussion of variable scoping in the discussion of semantics of
MathML elements "to facilitate automated semantic interpretation".
Here are some rules that may cover that topic in the current version
of MathML:
- The scope of a variable appearing in a bvar qualifier element is the
container element containing the bvar qualifier, and all its
children except <interval>, <lowlimit>, or <uplimit>
qualifiers appearing as siblings of the <bvar> qualifier.
(I discussed the reason why interval, lowlimit, and uplimit
(as well as a potential "rangesover") are outside that scope at
some length in a message to this forum a year and a half ago.
See also [1] for a more detailed discussion, and how the
compositionality principle comes in.)
In particular, a condition qualifier is within the scope of a
sibling bvar qualifier's variable. (But see above comment on
the <interval> qualifier/constructor: if the <condition> is
a set rather than a predicative expression, the sibling <bvar>'s
variable's scope should *not* include the <condition>!)
- Variables in a bvar element are bound within their scope;
identifiers with identical names appearing outside their scope are
semantically distinct entities that may take on different values in
a valid interpretation, even if they denote the same concept.
To illustrate the point, consider the example
<apply><plus/>
<ci>x</ci>
<apply><int/>
<bvar> <ci>x</ci> </bvar>
<ci>x</ci>
</apply>
</apply>
Here, the third x is within the scope of the second x, but the
first x is outside its scope. Conceptually, the third x would
range over some interval while calculating the value of this
expression for one particular value of the first x. Nevertheless,
all three occurrences denote the concept of "the x-axis" -- in
particular, the integral is implicitly assumed to produce a
function in x (a variable that is semantically identical to the
first x!).
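The two rules above are simple enough to sketch as a small recursive walk. The following Python is only an illustration under stated assumptions: the input is namespace-free XML, only <ci> elements that are direct children of a <bvar> count as declarations, and the set-valued <condition> case discussed earlier is not handled. The names classify and OUT_OF_SCOPE are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sibling qualifiers that stay outside the bound variable's scope (first rule above).
OUT_OF_SCOPE = {"interval", "lowlimit", "uplimit"}

def classify(elem, bound=frozenset()):
    """Return (name, "free" | "bound") for each <ci> use, in document order."""
    if elem.tag == "ci":
        name = (elem.text or "").strip()
        return [(name, "bound" if name in bound else "free")]
    # Variables declared by <bvar> children of this container element.
    declared = {
        (ci.text or "").strip()
        for bvar in elem.findall("bvar")
        for ci in bvar.findall("ci")
    }
    inner = bound | declared
    uses = []
    for child in elem:
        if child.tag == "bvar":
            continue  # a declaration, not a use
        scope = bound if child.tag in OUT_OF_SCOPE else inner
        uses += classify(child, scope)
    return uses

# The x + integral-of-x example from above.
EXAMPLE = """
<apply><plus/>
  <ci>x</ci>
  <apply><int/>
    <bvar> <ci>x</ci> </bvar>
    <ci>x</ci>
  </apply>
</apply>
"""

# Hypothetical variant: the x in <uplimit> is outside the scope of the bvar.
LIMITS = """
<apply><int/>
  <bvar> <ci>x</ci> </bvar>
  <lowlimit> <cn>0</cn> </lowlimit>
  <uplimit> <ci>x</ci> </uplimit>
  <ci>x</ci>
</apply>
"""

print(classify(ET.fromstring(EXAMPLE)))
print(classify(ET.fromstring(LIMITS)))
```

In both fragments the first use of x is reported as free and the last as bound. Because the walk only looks at <bvar> children and never at the operator itself, it would treat a user-defined operator that follows the same conventions in exactly the same way.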
As far as I can tell, these simple rules would allow one to correctly
interpret the semantics of bound variables in the current MathML. A
set of rules like this would also make it possible to add additional
operators of the product and sum variety (which automatically come
with any n-ary operator) or new quantifiers ("there exists exactly
one" and "for almost all" are two such quantifiers that I have met
with in college), and to correctly interpret them as long as they
adhere to the style used by current MathML practices. Moreover, you
could write a general-purpose MathML interpreter that would obey
variable scoping semantics both for the current and for user-extended
MathML:
<apply> <fn definitionURL="...">exists_uniquely</fn>
<bvar> <ci type="real">x</ci> </bvar>
<apply> <and/>
<reln> <gt/> <ci>x</ci> <cn>0</cn> </reln>
<reln> <eq/>
<apply> <times/> <ci>x</ci> <ci>x</ci> </apply>
<cn> 2 </cn>
</reln>
</apply>
</apply>
(Incidentally, I think all those applys should be relns, and there may
be need for an equivalent of fn for relations.)
Some other minor points:
- "implies" is listed as "relation", while "and", "or", "xor", and
all the rest are "operators". This is inconsistent.
(Personally, I'd list them all, along with the quantifiers, as
boolean operators/relations.)
- `4.3.2.7 order
list
indicates ordering on the list. Predefined values:
lexicographic, numeric
Default = "numeric" '
Shouldn't the default be "unordered" or some such thing, for the
case where the list is given by naming its elements, which may be
totally unordered?
Finally, I would like to verify my understanding of this point, because
it may be a source of incompatibility with OpenMath:
"real: A real number is presented in decimal notation. Decimal
notation consists of an optional sign ("+" or "-") followed by a
string of digits possibly separated into an integer and a fractional
part by a "decimal point". Some examples are .3, 1, and -31.56. If a
different BASE is specified, then the digits are interpreted as being
digits computed to that base.
"A real number may also be presented in scientific notation. Such
numbers have two parts (a mantissa and an exponent) separated by
e. The first part is a real number while the second part is an integer
exponent indicating a power of the base. For example, 12.3e5
represents 12.3 times 10 ^5."
Does this mean that MathML represents "big floats" -- floats to an
arbitrary precision (read: number of digits)?
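To illustrate what the "big float" reading would mean in practice: the quoted text puts no limit on the number of digits, so a reader could keep the value exact instead of rounding it to a machine float. The Python sketch below does that with rationals. It is only one possible reading, not what the spec mandates; the function name parse_real is hypothetical, bases above 10 are left out (the letter e would be ambiguous as a digit), and reading the exponent itself in decimal is an assumption.

```python
from fractions import Fraction

def parse_real(text, base=10):
    """Parse sign, digits, optional point, and optional 'e' exponent exactly."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 to 10")
    mantissa, _, exponent = text.strip().partition("e")
    sign = -1 if mantissa.startswith("-") else 1
    mantissa = mantissa.lstrip("+-")
    integer_part, _, fraction_part = mantissa.partition(".")
    # Read all digits as one integer, then shift by the fractional length.
    digits = integer_part + fraction_part
    value = Fraction(int(digits, base), base ** len(fraction_part))
    # Assumption: the exponent is written in decimal and is a power of the base.
    power = int(exponent) if exponent else 0
    return sign * value * Fraction(base) ** power

print(parse_real(".3"))
print(parse_real("-31.56"))
print(parse_real("12.3e5"))
print(parse_real("0." + "1" * 50))  # 50 digits survive without rounding
```

An implementation that maps the same text onto IEEE doubles would silently lose digits in the last case, which is where an incompatibility with a system that keeps them could show up.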
Sorry for the long message...
Regards,
Andreas Strotmann
[1] L. J. Kohout, A. Strotmann: "Understanding and Improving Content
Markup for the Web: from the Perspectives of Formal Linguistics,
Algebraic Logics, and Cognitive Science." In: ISIC/CIRA/ISAS '98 Joint
Conference on the Science and Technology of Intelligent Systems.
PS: My apologies for not sending this sooner, but my move to and
first time at FSU was a bit time-consuming.