YASP 2 -- including the operators from Neil Soiffer on 2020-07-01 (public-mathml4@w3.org from July 2020)

From: Neil Soiffer <soiffer@alum.mit.edu>
Date: Wed, 1 Jul 2020 00:18:47 -0700
To: public-mathml4@w3.org
Message-ID: <CAESRWkAPmgZBQ5BghrAMsesoxR6DJGuXrQj_AYdQP+qKn8bJGg@mail.gmail.com>
*Background/Setup*

As I've mentioned before, for speech, the operators may want to be
highlighted as they are spoken, so semantics need to include them somehow.

Here's a somewhat different case -- multiple notations for the same
semantics. Consider "division". I believe the current proposals all have
something like  notation="divide(@1,@2)" for the following cases:

   - mfrac
   - mrow with "/"
   - mrow with "÷"

where Deyan/Bruce's proposal references the 'divide' indirectly.

For speech, it is likely the first will use "over" and the last "divided
by". For at least some braille, the braille for them is different. Because
of this, I think we need more than one name for the division semantics.

Another example is "times" -- there are multiple symbols for times and they
will be brailled differently, and potentially spoken differently
(especially &InvisibleTimes;).

And then there are nary functions. Consider these cases:

   - a+b+c
   - a-b-c
   - a+b-c

Do these get marked up as "plus", "minus", and "plus-and-minus"
respectively? Or should they all use "plus-and-minus"?

On the calls, I brought up an nary relational operator case:
x = ... < ... <= .. = y
Should there just be relational-op(args...) and not have equal(...), etc?

*New Proposal*
Here's a new proposal that maybe harks back a little to David C's idea and
Bruce's original idea: for prefix/infix/postfix/nary/fenced notations,
always include the operators in order.

This solves the nary problem with plus-and-minus, times, etc:
For "a+b-c+d":
    notation = plus-or-minus(@1,@2,@3,@4,@5,@6,@7)

That notation is cumbersome for nary operations, so I propose the following
notations:
@label -- references arg="label"
@n -- nth child (can be extended to provide a path)
@* -- references all children ('*' is a nod to regular exprs)

With "@*", the above becomes
    notation = plus-or-minus(@*)

Most linear notations can use "@*". E.g.,
    notation = "factorial(@*)"
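
As a rough illustration of what "@*" stands for, here is a minimal Python
sketch that expands it into explicit "@n" references for an element's
children. The helper name expand_star is made up for this example and is not
part of any proposal.

```python
import xml.etree.ElementTree as ET

def expand_star(notation: str, element: ET.Element) -> str:
    """Replace "@*" with explicit references to every child, in order."""
    # len(element) is the number of direct children of the element.
    refs = ",".join(f"@{i}" for i in range(1, len(element) + 1))
    return notation.replace("@*", refs)

# The "a+b-c+d" example: seven children, operators included.
mrow = ET.fromstring(
    "<mrow><mi>a</mi><mo>+</mo><mi>b</mi><mo>-</mo>"
    "<mi>c</mi><mo>+</mo><mi>d</mi></mrow>"
)
expanded = expand_star("plus-or-minus(@*)", mrow)
print(expanded)  # plus-or-minus(@1,@2,@3,@4,@5,@6,@7)
```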

For intervals, we don't need to name all four options, although maybe we
should, given the varying notations: "(a,b)" and " ]a,b[" both mean the
same thing although they are not ambiguous. So we could have just
    notation="interval(@*)"
or we could have "interval-open-open", "interval-open-close", etc.

The idea of using all the arguments works well for speech, but what about
applications that don't care about the operators? This new proposal doesn't
mark them up because they are implicit. If something is a prefix operator,
then the operator is the first argument and the operand is the second
argument. A similar thing is true for all the other linear forms, with nary
operators starting with an operand and alternating operator/operand after
that.

Here's where this proposal departs from Bruce's earlier proposal and from
David's: you don't describe the structural form -- you never explicitly say
something is prefix, etc. The interpreter of the function will know this.
For example, for "factorial", the interpreter needs to know what
"factorial" translates to in their target. Since it knows this, it also
will know that factorial is a postfix operator and that the operand is the
first argument.
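
To make the "interpreter already knows the form" idea concrete, here is a
small Python sketch. The FORM_OF table and the split_args helper are
assumptions made up for illustration, and "negate" is an invented name used
only to exercise the prefix case. An application that doesn't care about the
operators would simply keep the first element of the result.

```python
# Hypothetical table of what an interpreter "just knows" about each name.
# "negate" is a made-up name used only to exercise the prefix case.
FORM_OF = {
    "negate": "prefix",
    "factorial": "postfix",
    "divide": "infix",
    "times": "nary",
    "plus-or-minus": "nary",
}

def split_args(name, args):
    """Return (operands, operators) for a linear notation's argument list."""
    form = FORM_OF[name]
    if form == "prefix":      # operator first, operand second
        return args[1:], args[:1]
    if form == "postfix":     # operand first, operator second
        return args[:1], args[1:]
    # infix and nary: start with an operand, then alternate operator/operand
    return args[0::2], args[1::2]

print(split_args("factorial", ["n", "!"]))
# (['n'], ['!'])
print(split_args("plus-or-minus", ["a", "+", "b", "-", "c", "+", "d"]))
# (['a', 'b', 'c', 'd'], ['+', '-', '+'])
```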

Returning to the three forms of fractions... Only two different functions
are needed: one for the 2D case and one for all the forms of the linear
case. Only one is needed for the linear cases because the operator used
(and hence the value that needs to be brailled or spoken) is included in the
operation:

   - mfrac: notation="fraction(@1,@2)"
   - mrow with "/": notation="divide(@*)"
   - mrow with "÷": notation="divide(@*)"
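
A sketch of how a speech generator might use this: because the operator is
one of the arguments, a single "divide" name can still yield different words.
The speak function and the word table are hypothetical. "over" and "divided
by" come from the discussion above, while the wording for "/" is only a
placeholder assumption.

```python
# Spoken words keyed by the operator character that was passed as an argument.
# The entry for "/" is a placeholder assumption, not something settled here.
DIVIDE_WORDS = {"\u00f7": "divided by", "/": "slash"}

def speak(name, args):
    if name == "fraction":      # 2D mfrac: args are numerator, denominator
        return f"{args[0]} over {args[1]}"
    if name == "divide":        # linear form: operand, operator, operand
        left, op, right = args
        return f"{left} {DIVIDE_WORDS[op]} {right}"
    raise KeyError(name)

print(speak("fraction", ["a", "b"]))          # a over b
print(speak("divide", ["a", "\u00f7", "b"]))  # a divided by b
```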

For the various forms of +/-, I think one "plus-or-minus" name works, as it
does for "times". The same is true for relational operators: just have one
"relational-op".

One could go even further: use just one name for each "form" of operators:
"prefix(...)", etc. The converter can check the operator and do what it
needs to do. This cuts down the number of names we need to define by a *huge*
amount while increasing the burden on translators only a little. However,
it fails to deal with ambiguous symbols like "×", which is the reason we
want to add semantic markup in the first place. Hence, I think that is a
step too far.

If there is no explicit operator, then we are back to the previous
proposals:

   - power(@1,@2)
   - binomial(@arg1, @arg2) or binomial(@2@1, @2@2)

There is no operator to highlight for speech and no need to specify a
structural form.

We haven't discussed functions, but I believe they work ok:
<mrow notation="function(@1, @3)">
   <mi>f</mi>
   <mo>(</mo>
   <mrow notation="times(@*)">
     <mn>3</mn>
     <mi>y</mi>
   </mrow>
   <mo>)</mo>
</mrow>

An interesting question is what to do if there is/is not an
InvisibleFunctionApply or InvisibleTimes present. My guess is that
InvisibleFunctionApply is not useful in the 'notation' attr and should be
omitted (but is useful to infer semantics!). On the other hand, since times
is an infix operator, it really should be part of the semantics. In fact, I
think the above is better written as:
<mrow notation="times(@1, &InvisibleTimes;, @2)">

There is a braille problem with using the same "binomial" name for the
'over' form and the subscripted form -- the braille Nemeth code (at least,
and probably other braille codes) uses different symbols for these cases.
Sam knows better than I do, but it may be the case that for braille, it is
better to use the MathML tags than any semantics for braille generation
because most (all?) braille tends to be based on the presentational
notations used, not the semantics.

*Summary*

This modification handles the troublesome nary cases:

   - add the convenience notation "@*" to indicate all args
   - linear/mrow forms include both the operands and operators as their
   arguments


*Addendum*

Dealing with the case discussed in the other email thread:
<mrow>
  <mi>m</mi>
  <mo>!</mo>
  <mi>n</mi>
  <mo>!</mo>
</mrow>

The "@*" notation is sadly not useful. So as others suggested, it would be
<mrow notation="times(factorial(@1,@2), &InvisibleTimes;,
factorial(@3,@4))"> ... </mrow>

or you could label the leaves and refer to their labels.
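
For what it's worth, a nested notation like the one above is easy to read
mechanically. The following Python sketch is only an illustration: the
tokenizer and parser are assumptions about one possible syntax, they work on
the attribute text as written in this email (so "&InvisibleTimes;" is kept as
a literal token), and path references such as "@2@1" are not handled.

```python
import re

# One token: an @reference, an &entity;, a name (hyphens allowed), or ( ) ,
TOKEN = re.compile(r"\s*(@[\w*]+|&\w+;|[\w-]+|[(),])")

def tokenize(text):
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Parse one term; return (node, remaining_tokens).

    A call becomes (name, [args...]); anything else stays a plain string.
    Malformed input (e.g. a missing ")") is not handled gracefully.
    """
    head, rest = tokens[0], tokens[1:]
    if rest and rest[0] == "(":
        rest = rest[1:]
        args = []
        while rest[0] != ")":
            arg, rest = parse(rest)
            args.append(arg)
            if rest[0] == ",":
                rest = rest[1:]
        return (head, args), rest[1:]
    return head, rest

text = "times(factorial(@1,@2), &InvisibleTimes;,\nfactorial(@3,@4))"
tree, leftover = parse(tokenize(text))
print(tree)
# ('times', [('factorial', ['@1', '@2']), '&InvisibleTimes;', ('factorial', ['@3', '@4'])])
```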

    Neil




Received on Wednesday, 1 July 2020 07:19:09 UTC