Re: Thoughts on mrow default intent from Deyan Ginev on 2020-12-16 (public-mathml4@w3.org from December 2020)

From: Deyan Ginev <deyan.ginev@gmail.com>
Date: Wed, 16 Dec 2020 17:23:14 -0500
To: Neil Soiffer <soiffer@alum.mit.edu>
Cc: public-mathml4@w3.org
Message-ID: <CANjPgh_2jBMndPusJD6Atp5dNfmq6vSe7TichQZErrX=TUMaJw@mail.gmail.com>
Hi Neil, all,

Apologies in advance to the group if we end up generating a large volume of
email on the public list, since I think there is a lot of maze-like
territory we can get lost in...

1. Based on some joint experience with Bruce in building math grammars for
various STEM domains, I would caution against entering "serious" grammar
rule territory for the defaults, or they'll become impossible to predict
over real-world expressions.

2. I liked the *simplest* variant of Sam's mrow proposal where
 - an mrow with a single "mo" child is interpreted as applying that op to
argument-like children ("argument-like" defined separately)
 - any other mrow is unwrapped and its children are traversed
left-to-right, recursively applying the spec (explicit markup + defaults).

So for "-3!" you'd need two mrows for this default rule - an inner one to
wrap the factorial, and an outer one to wrap the minus.
But if you want to automatically recognize a mini grammar of, say, K12
mathematics, anything beyond calculator syntax will get tricky very quickly.
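
To make the simple rule above concrete, here is a minimal sketch in Python. It is not taken from Sam's proposal or from any spec text: the function names `interpret` and `is_argument_like` are hypothetical, "argument-like" is assumed to mean "any child that is not an mo", and only mrow and token elements (mi, mn, mo) are handled.

```python
import xml.etree.ElementTree as ET

def is_argument_like(node):
    # Assumption for this sketch: every non-<mo> child is argument-like.
    return node.tag != "mo"

def interpret(node):
    """Read a presentation tree using only the two mrow default rules."""
    if node.tag != "mrow":
        # Token element (mi, mn, mo): just report its text.
        return (node.text or "").strip()
    children = list(node)
    ops = [c for c in children if c.tag == "mo"]
    if len(ops) == 1:
        # Rule 1: a single <mo> child is applied to the argument-like children.
        args = [interpret(c) for c in children if is_argument_like(c)]
        return ((ops[0].text or "").strip(), *args)
    # Rule 2: otherwise unwrap and traverse the children left to right.
    flat = []
    for c in children:
        result = interpret(c)
        if c.tag == "mrow" and isinstance(result, list):
            flat.extend(result)  # a nested unwrapped mrow is spliced in
        else:
            flat.append(result)
    return flat

# "-3!" with two mrows: the inner one wraps the factorial, the outer one the minus.
nested = ET.fromstring(
    "<mrow><mo>-</mo><mrow><mn>3</mn><mo>!</mo></mrow></mrow>"
)
print(interpret(nested))  # ('-', ('!', '3'))

# The same expression as one flat mrow has two <mo>s, so it is only unwrapped.
flat = ET.fromstring("<mrow><mo>-</mo><mn>3</mn><mo>!</mo></mrow>")
print(interpret(flat))  # ['-', '3', '!']
```

The flat form yields no operator tree at all, which is the point: structure comes from the author's mrows, not from a grammar.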

- Even for a flat "2x + y" with an invisible-times "mo", one needs a
grammar directive that invisible ops have higher precedence compared to
additive ops. And sometimes whatever you decide is wrong, and it ends up
covered by the New York Times:
https://www.nytimes.com/2019/08/05/science/math-equation-pemdas-bodmas.html

I would take that NYT article as strong caution against embarking on an
ambitious grammar journey for our accessibility-oriented spec.
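
As an illustration of the precedence directive mentioned for "2x + y", the sketch below parses the same flat token list under two different precedence tables. The parser (`parse`), the table names, and the numeric precedence values are all assumptions made up for this example; nothing here is proposed spec behavior.

```python
INVISIBLE_TIMES = "\u2062"  # U+2062 INVISIBLE TIMES

def parse(tokens, prec):
    """Precedence-climbing parse of a flat list: operand, op, operand, ..."""
    pos = 0

    def parse_expr(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and prec[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            right = parse_expr(prec[op] + 1)  # left-associative
            left = (op, left, right)
        return left

    return parse_expr(0)

tokens = ["2", INVISIBLE_TIMES, "x", "+", "y"]

# Directive A: invisible times binds tighter than plus.
tight = {INVISIBLE_TIMES: 2, "+": 1}
# Directive B: the opposite choice.
loose = {INVISIBLE_TIMES: 1, "+": 2}

print(parse(tokens, tight))  # ('+', ('\u2062', '2', 'x'), 'y')  i.e. (2x) + y
print(parse(tokens, loose))  # ('\u2062', '2', ('+', 'x', 'y'))  i.e. 2(x + y)
```

The same flat children give two different trees, so a default that builds trees has to commit to one table, and that commitment is exactly the kind of grammar decision being cautioned against.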

3. I would even suggest stepping away from the desire to have "content
trees" in the default rules for anything that isn't trivial or extremely
constrained. Neil's examples 1, 2, 3, 6, 7 below can be narrated without any
additional annotations, and without defaults, just reading through the
presentation tree as usual. So why do anything at all for them?

4. I like the moniker @unwrap; I think it is common lingo when working with
DOMs, jQuery, etc., for treating an element as a "transparent wrapper".

5. Question: did you think of |a| as defaulting to "absolute value"? If so,
why that operation? It could be any of "absolute-value", "cardinality",
"norm", "seminorm", "determinant", "hyperdeterminant", "order-of-group", to
borrow from our Level lists. I can imagine that absolute value is the one
taught earliest, but I do remember that Bulgarians and Romanians learn the
determinant syntax in the last grades of high school. A quick search on Khan
Academy suggests this is taught in "precalculus", which may also be encountered
in the last years of K12 in the US. So we get two meanings for the |a|
notation already in K12 math.

- Would you be aiming to default *unique* syntax in K12, or are we making
some informed choice of shortlisted notations - and informed on what basis?
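
The question about a *unique* default can be phrased as a small lookup. The sketch below is only an illustration: the function `unique_default` is hypothetical, the first list copies the readings named above, and the K12 subset is an assumption based on the argument in this message rather than on any curriculum survey.

```python
# Candidate readings of |a| named in the message above.
BAR_READINGS = [
    "absolute-value", "cardinality", "norm", "seminorm",
    "determinant", "hyperdeterminant", "order-of-group",
]

# Assumption for illustration: the readings argued to occur in K12 math.
K12_READINGS = ["absolute-value", "determinant"]

def unique_default(candidates):
    """Return the only candidate, or None when the notation is ambiguous."""
    return candidates[0] if len(candidates) == 1 else None

print(unique_default(BAR_READINGS))        # None
print(unique_default(K12_READINGS))        # None
print(unique_default(["absolute-value"]))  # absolute-value
```

Under a "unique syntax only" policy, |a| would get no default even within K12; any default for it would have to come from a deliberate shortlist.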

Greetings,
Deyan

On Wed, Dec 16, 2020, 3:47 PM Neil Soiffer <soiffer@alum.mit.edu> wrote:

> I've given a little thought to what I think is wrong about the defaults
> for mrow and have some suggestions and open questions.
>
> The defaults for mrow are @op or @append. @op is used when "a unique
> visible descendant mo element" is present; otherwise @append is used.
>
> Here are some mrow cases:
>
>    1. a+b -- single <mo>
>    2. a+b+c+d -- multiple <mo>s, all the same (do we need to distinguish
>    between ones that only allow two args, i.e., binary vs nary?)
>    3. a | b -- (potentially different than '1' because of '4')
>    4. |a| -- same <mo> as '3' (potentially different Unicode, but not
>    necessarily). Multiple <mo>s like '2' but not infix, so different behavior.
>    5. (a) -- multiple <mo>s, but "matching" ops
>    6. a+b=c -- multiple *different* <mo>s
>    7. 2a -- no <mo> (no invisible one given)
>
> I think Sam means to include '1' - '4' as using @op, but maybe only cases
> 1 and 3.
>
> Cases '3' and '4' (might) mean the "form" (prefix, infix, postfix) should
> be part of the operator value look up.
>
> Case '5' (and '4' if not included above) seem relatively common and should
> pick up a default meaning. The match would be the pattern starts with mo
> and ends with mo and the lookup would need to pass both in. Potentially
> this is restricted to starts with prefix mo and ends with postfix mo, but
> that restriction misses out on the alternate interval notation "]a,b["
> (etc) along with (maybe) the braket notation "<a|". Removing that
> restriction would pick up oddball cases like "-3!", but the point of the
> default is to be right *most* of the time so that the need for explicit
> markup is minimized.
>
> Rather than calling 6 and 7 (and any other cases that don't become "@op")
> "@append", I think something like @uninterpreted or the shorter @unparsed
> would be a better name.
>
> @extend probably wants to be redefined to be "process the children; if the
> previous intent is @unparsed [or whatever name we use], append the contents
> to the previous contents; otherwise create a new wrapper with
> intent @unparsed with the content consisting of the previous content and
> the @extend's content children".
>
> The alternative to @append/@uninterpreted/@unparsed is to actually parse
> the mrow to form an intent. For example, '6' would essentially become
> '{a+b} = c' where there is one operator and so we get the obvious
> intent='equals( plus($1, $3), $5)'. Requiring parsing is likely a bridge
> too far for interpreting the value of an attribute.
>
> Some food for thought/discussion...
>
>    Neil
>
>
Received on Wednesday, 16 December 2020 22:23:55 UTC