- From: Andreas Strotmann <strotman@nu.cs.fsu.edu>
- Date: Thu, 13 May 1999 16:52:07 -0400 (EDT)
- To: www-math@w3.org
- cc: Andreas Strotmann <strotman@nu.cs.fsu.edu>

Hi all,

I have no idea where the discussion on the next version of MathML is going right now, or if Content MathML has been worked over yet in that process. In case these points have not yet been discussed in the working group, I would like to point out a few places that I believe could use some clarification or adjustment in order to come closer to the stated requirement for MathML content markup:

  "Since the intent of MathML content markup is to encode mathematical expressions in such a way that the mathematical structure of the expression is clear, the syntax and usage of content markup must be consistent enough to facilitate automated semantic interpretation."

Here goes:

- Compare these two quotes from the specs:

  -- "The condition element is always used together with one or more bvar elements."

  -- "Note that the bound variable may be implicit:

      <apply><max/>
        <condition>
          <apply><and/>
            <reln><in/><ci>x</ci><ci type="set">B</ci></reln>
            <reln><notin/><ci>x</ci><ci type="set">C</ci></reln>
          </apply>
        </condition>
      </apply>"

  These two obviously contradict each other. I strongly recommend striking the second quote from the spec. Making the bound variables implicit like that is always a very bad idea in a semantically oriented language, as the specs note in another place: "(The condition may involve more than one symbol.)"

  This same kind of mistake was made in the specs for KIF 3.0 for the "setofall" operator, leading to incorrect semantics whenever the condition contained a parameter (arbitrary constant). Consequently(?), the current ANSI draft spec for KIF no longer contains a "setofall" operator.

  The main point is that correct automated semantic interpretation cannot be guaranteed unless the requirement for listing the bound variables is always enforced.

- "It is an error to enclose a relation in an element other than reln."

  Actually, relations may also be enclosed in a <fn>, since <fn> turns anything into a function (e.g., a relation into its characteristic function mapping to {0,1}). Also, the definition of <apply> says that anything appearing as its first argument is automagically interpreted as a function, as if it were wrapped in a <fn>. Consequently, relations may also appear in <apply>s.

- "When used with int [or sum or product], each qualifier schema is expected to contain a single child schema; otherwise an error is generated."

  This clashes with the definition of <interval>, doesn't it? But then, I also noticed this:

- <interval> is used in a dual fashion: as a qualifier, and as a constructor. This can in some rare cases lead to ambiguities:

  "Considering interval-valued functions F bounded by functions f and g (i.e., F=[f,g], to abuse notation), it is easy to see that the integral of F (i.e., the integral of [f,g]) is [integral of f, integral of g]."

  If you consider representing this in content MathML, here is how you would want to do it, interpreting <interval> as a constructor:

    <reln><eq/>
      <apply><int/>
        <interval>
          <fn><ci>f</ci></fn>
          <fn><ci>g</ci></fn>
        </interval>
      </apply>
      <interval>
        <apply><int/> <fn><ci>f</ci></fn> </apply>
        <apply><int/> <fn><ci>g</ci></fn> </apply>
      </interval>
    </reln>

  However, the MathML spec would wrongly lead to an interpretation of the first of these <interval>s as a qualifier, because that is the syntactic disambiguation specified in MathML. It is possible, of course, to circumvent this problem by wrapping that <interval> expression in a <fn>, but that kind of disambiguation technique may not always be desirable in similar cases.

  The easiest solution here would be to strike <interval> from the list of qualifiers entirely and instead note that <interval> is a kind of set, so that we can simply specify

    <condition> <interval> ... </interval> </condition>

  because <condition> is allowed to contain a set rather than a boolean expression, giving it precisely the meaning we want.

  However, note that in the case of a set-valued condition, the sibling <bvar> variable's scope excludes that set (and thus the <condition> qualifier), because there is an implicit "var \in set" wrapped around it. In the case of a predicative <condition>, on the other hand, the <condition> is inside the scope of the sibling <bvar> variable(s), because the integration variable appears in the condition. This observation would argue for replacing the <interval> qualifier by a more general qualifier for sets that the dependent variable ranges over instead:

    <rangesover> <interval> ... </interval> </rangesover>

  (I should reiterate my opinion here that it's not a good idea to allow sibling nodes to be in different scopes. OpenMath has rightly opted against doing that, albeit after long and hard discussions, because solutions would be much cleaner, and "automated semantic interpretation" much "facilitated", if scope boundaries were always container element boundaries, both for operator *and* for variable scopes. See [1].)

- Note that we could also write

    <set> <interval/> ... </set>

  instead of

    <interval> ... </interval>

  if we generalize the <set> constructor to take set operators as a first argument and act a little like <fn> and <reln> in this respect. <interval/> would be one such operator; set union and similar operators would be others.

Another major suggestion that I would like to make is to include a discussion of variable scoping in the discussion of the semantics of MathML elements "to facilitate automated semantic interpretation".

Here are some rules that may cover that topic in the current version of MathML:

- The scope of a variable appearing in a bvar qualifier element is the container element containing the bvar qualifier, and all its children except <interval>, <lowlimit>, or <uplimit> qualifiers appearing as siblings of the <bvar> qualifier.

  (I discussed the reason why interval, lowlimit, and uplimit (as well as a potential "rangesover") are outside that scope at some length in a message to this forum a year and a half ago. See also [1] for a more detailed discussion, and how the compositionality principle comes in.)

  In particular, a condition qualifier is within the scope of a sibling bvar qualifier's variable. (But see the above comment on the <interval> qualifier/constructor: if the <condition> is a set rather than a predicative expression, the sibling <bvar>'s variable's scope should *not* include the <condition>!)

- Variables in a bvar element are bound within their scope; identifiers with identical names appearing outside their scope are semantically distinct entities that may take on different values in a valid interpretation, even if they denote the same concept.

  To illustrate the point, consider the example

    <apply><plus/>
      <ci>x</ci>
      <apply><int/>
        <bvar> <ci>x</ci> </bvar>
        <ci>x</ci>
      </apply>
    </apply>

  Here, the third x is within the scope of the second x, but the first x is outside its scope. Conceptually, the third x would range over some interval while calculating the value of this expression for one particular value of the first x. Nevertheless, all three occurrences denote the concept of "the x-axis" -- in particular, the integral is implicitly assumed to produce a function in x (a variable that is semantically identical to the first x!).

As far as I can tell, these simple rules would allow one to correctly interpret the semantics of bound variables in the current MathML.

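The two scoping rules above can be sketched as a short program. This is only an illustration under stated assumptions, not anything taken from the MathML spec: the function name free_identifiers and the constant OUT_OF_SCOPE are made up for this sketch, only <ci> children of a <bvar> are treated as binders, and the special case of a set-valued <condition> discussed earlier is not handled.

```python
import xml.etree.ElementTree as ET

# Qualifiers that, under the rule proposed above, lie outside the scope
# of a sibling <bvar> (hypothetical constant, named for this sketch only).
OUT_OF_SCOPE = {"interval", "lowlimit", "uplimit"}


def free_identifiers(node):
    """Return the set of <ci> names that occur free in `node`."""
    if node.tag == "ci":
        return {(node.text or "").strip()}

    # Names bound by <bvar> qualifiers that are direct children of this container.
    bound = {
        (ci.text or "").strip()
        for bvar in node.findall("bvar")
        for ci in bvar.findall("ci")
    }

    free = set()
    for child in node:
        if child.tag == "bvar":
            continue  # the binder itself is not an occurrence
        names = free_identifiers(child)
        if child.tag not in OUT_OF_SCOPE:
            # This child is inside the scope, so the bound names are captured.
            names = names - bound
        free |= names
    return free


# The x + integral example from the rules above.
example = ET.fromstring(
    "<apply><plus/><ci>x</ci>"
    "<apply><int/><bvar><ci>x</ci></bvar><ci>x</ci></apply>"
    "</apply>"
)
print(free_identifiers(example))     # {'x'}: only the first x is free
print(free_identifiers(example[2]))  # set(): the integral binds its own x
```

Because the function looks only at <bvar> children and the three excluded qualifier names, it treats a user-defined operator that follows the same conventions exactly like a built-in one.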
A set of rules like this would also make it possible to add additional operators of the product and sum variety (which automatically come with any n-ary operator) or new quantifiers ("there exists exactly one" and "for almost all" are two such quantifiers that I have met with in college), and to correctly interpret them as long as they adhere to the style used by current MathML practices. Moreover, you could write a general-purpose MathML interpreter that would obey variable scoping semantics both for the current and for user-extended MathML:

    <apply>
      <fn definitionURL="...">exists_uniquely</fn>
      <bvar> <ci type="real">α</ci> </bvar>
      <apply> <and/>
        <reln> <gt/> <ci>x</ci> <cn>0</cn> </reln>
        <reln> <eq/>
          <apply> <times/> <ci>x</ci> <ci>x</ci> </apply>
          <cn>2</cn>
        </reln>
      </apply>
    </apply>

(Incidentally, I think all those applys should be relns, and there may be need for an equivalent of fn for relations.)

Some other minor points:

- "implies" is listed as "relation", while "and", "or", "xor", and all the rest are "operators". This is inconsistent. (Personally, I'd list them all, along with the quantifiers, as boolean operators/relations.)

- `4.3.2.7 order
   list indicates ordering on the list.
   Predefined values: lexicographic, numeric
   Default = "numeric" '

  Shouldn't the default be "unordered" or some such thing, for the case where the list is given by naming its elements, which may be totally unordered?

Finally, I would like to verify my understanding of this point, because it may be a source of incompatibility with OpenMath:

  "real: A real number is presented in decimal notation. Decimal notation consists of an optional sign ("+" or "-") followed by a string of digits possibly separated into an integer and a fractional part by a "decimal point". Some examples are .3, 1, and -31.56. If a different BASE is specified, then the digits are interpreted as being digits computed to that base.

  "A real number may also be presented in scientific notation. Such numbers have two parts (a mantissa and an exponent) separated by e. The first part is a real number while the second part is an integer exponent indicating a power of the base. For example, 12.3e5 represents 12.3 times 10^5."

Does this mean that MathML represents "big floats" -- floats to an arbitrary precision (read: number of digits)?

Sorry for the long message...

Regards,
Andreas Strotmann

[1] L.J. Kohout, A. Strotmann: "Understanding and Improving Content Markup for the Web: from the Perspectives of Formal Linguistics, Algebraic Logics, and Cognitive Science." In: ISIC/CIRA/ISAS '98 Joint Conference on the Science and Technology of Intelligent Systems.

PS: My apologies for not sending this sooner, but my move to FSU and my first time here were a bit time-consuming.

Received on Thursday, 13 May 1999 16:52:17 UTC