Re: intents and defaults and structure

Hi Neil, all,

To sprinkle in some data, here is a very brief sampling of use of the
% sign in arXiv:
https://hackmd.io/@dginev/HJYj-6Lod

I have a lot more comments on the "operator dictionary" details, but
I'm afraid every next email I send reduces the chance of others
replying :-) We can start a new thread for the technical details
there, or just discuss tomorrow.

Greetings,
Deyan

On Wed, Jun 16, 2021 at 11:12 AM Neil Soiffer <soiffer@alum.mit.edu> wrote:
>
> I'll start with a general "you are correct, as always" comment and then get into the weeds.
>
> I, with help from David C, have been working on the operator dictionary for the last 1.5 years, although I haven't done anything on it for a year. However, it is substantially updated from MathML 3. David has moved the latest and greatest version to the W3C GitHub site. There you will see that '%' comes in both an infix and a postfix version. I'd like to say that I spotted that change on my own, but truthfully, I added the postfix version based on your observation. The missing postfix entry is important not just for parsing/my proposed canonicalization; it is also important so that '%' is displayed with the proper amount of spacing -- definitely a good catch.
>
> I will reiterate what I said a long while back: it would be really helpful to have other people look at the operator dictionary for omissions, errors, and needed deletions (many things have been removed from the version in MathML 3). Looking at the GitHub history, I see some issues (e.g., ellipsis removal) were brought to the group. I spent a lot of time looking at usage of Unicode characters that act as operators and did my best to understand how they group relative to other operators (for some, I couldn't find any paper that used them), but I'm sure there are issues. I'm particularly dubious that I have many of the postfix entries correct relative to how they should group with prefix and infix operators. E.g., should √ bind more tightly than "!" or "%"? I'm a bit more confident about the relative priorities of the infix operators. My focus was on the more obscure Unicode operators; perhaps I need to take a pass at more common characters since '%' was clearly overlooked.
>
> With the update, "0.15 = 15%" will parse as one expects. The same can be said for "15% x = 10", but only if there is an invisible times between the "%" and the "x". If it is there, then the postfix interpretation of "%" should be used. If it is not there, then the infix interpretation should be used (unless there is an attr specifying the "form"). Given what MathML generators do today, this is definitely problematic. The only solution I know of would be to drop the infix "%", which as you noted is used in computer languages. However, it is also used for calculator-like notations. I'm dubious we can get any stats on MathML usage of '%', but maybe you can get some about arXiv usage (although that would likely be skewed away from lower grade math where '%' for "mod" is more likely).
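
To make the rule above concrete, here is a minimal Python sketch of
choosing the form of "%" from the token that follows it. The function
name, the INFIX_OPERATORS set, and the choice to treat any other
following token as an operand are assumptions made for illustration;
none of them come from the operator dictionary or from an existing
MathML processor.

```python
INVISIBLE_TIMES = "\u2062"

# Hypothetical, deliberately tiny set of infix operators that may follow "%".
INFIX_OPERATORS = {INVISIBLE_TIMES, "=", "+", "-"}


def percent_form(next_token, form_attr=None):
    """Pick the form of a '%' operator that directly follows an operand.

    next_token: the token after '%', or None if '%' is last in its mrow.
    form_attr:  value of an explicit form attribute on the <mo>, if any.
    """
    if form_attr is not None:
        # An explicit form attribute overrides any inference.
        return form_attr
    if next_token is None or next_token in INFIX_OPERATORS:
        # Nothing follows, or an infix operator follows: '%' ends an operand.
        return "postfix"
    # Otherwise assume an operand follows, so '%' sits between two operands.
    return "infix"


if __name__ == "__main__":
    print(percent_form(None))             # "0.15 = 15%"
    print(percent_form(INVISIBLE_TIMES))  # "15% x" with invisible times
    print(percent_form("x"))              # "15% x" without invisible times
```

Under these assumptions, "15% x = 10" written without an invisible
times lands in the infix branch, which is the problematic case
described above.
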
>
> As for default intents, I agree with you that they should only be done for level one items. I'm a little less sure about doing that only when there is a very clear preference one way or the other. Perhaps with the addition of a subject attr value, its default becomes clear, but in the absence of that context, it is not clear. In any case, getting real world data is invaluable to validate any idea.
>
> Thanks for your comments and I strongly request others to get involved also.
>
>     Neil
>
>
> On Mon, Jun 14, 2021 at 5:45 AM Deyan Ginev <deyan.ginev@gmail.com> wrote:
>>
>> Hi Neil, all,
>>
>> I assume by "operator dictionary" you're referring to:
>> https://www.w3.org/TR/MathML3/appendixc.html
>>
>> These indeed induce one grammar for math syntax. It is hard to
>> understand *which* grammar that is. Is there some testbed of supported
>> expressions/notations that it is expected to cover? Math grammars are
>> quite hard to develop and maintain, so I would be a bit anxious about
>> adopting one "officially" without a good plan.
>>
>> To draw a random example from the operator dictionary, I saw % is
>> marked as infix with priority 640. But in the expression "15% x = 10",
>> it is meant to be read "fifteen percent of x equals ten". And not
>> "fifteen modulo x equals ten", as in programming languages, which may
>> have been the reason behind choosing the infix form. Consider also the
>> standalone "0.15 = 15%" and "15% = 150‰", to illustrate it's really
>> postfix when intending "percent" - which is the usual meaning in K-14
>> materials.
>>
>> So rather than it being a silver bullet, I think the operator
>> dictionary may need some vetting and reconsideration. The way I see
>> it, we would also have a much easier time if we do not standardize an
>> entire grammar, just because that's the highest difficulty task we can
>> set for ourselves. It is, in my experience, much harder to get right
>> than assembling long lists of concept names and notation forms.
>>
>> If I read you correctly, you agree that whichever way we enrich/infer
>> the mrows, we need to decide on a list of intents that will be
>> expected as default remediation. I think we have some clear common
>> agreement here, also with Sam's document that started fleshing these
>> out as examples
>> (https://mathml-refresh.github.io/discussion-papers/intent). For
>> example, if the exclamation mark (!) is intended to be remediated as
>> the "factorial" by default, that has to be made explicit, as well as
>> the notation where that intent is activated.
>>
>> On my end, I would offer we (try to) create something new that is very
>> narrowly described and scoped. Say by listing all mathematical
>> notations we have encountered, and then cherry-picking (and creating
>> tests for) notations we want recognized by default, from a pragmatic
>> mathematical standpoint. One working definition may be "standard
>> notations in K-14 education that have no mutual overlap". And my
>> instinct is that if we end up with very small and usable defaults,
>> they will be easier to both test against and apply in practice. But we
>> also need to develop them to a degree where the defaults meet "natural
>> expectation", which is a tension in the opposite direction of
>> simplicity. To achieve all of that, I resonated quite strongly with
>> Brian's suggestion during our last meeting - we ought to do a couple
>> of iterations of concept validation coding and demos and develop a
>> test suite. There's a real risk of making a readout worse if the
>> defaults are more often wrong than correct, and our best bet to avoid
>> that is to actually check how they work on the materials we intend them
>> for, before we release them to the world.
>>
>> I also welcome more discussion, thanks for starting one!
>>
>> Greetings,
>> Deyan
>>
>>
>>
>> On Sun, Jun 13, 2021 at 6:09 PM Neil Soiffer <soiffer@alum.mit.edu> wrote:
>> >
>> > I'm writing this email to get some discussion going outside of the meetings. Deyan is also working on this and is developing his own reasoning on the topic. The topic we have started to discuss in the meetings is MathML structure and how that might or might not work well with intents, especially for defaults.
>> >
>> > First off, if an intent is given, there is no ambiguity for the part of the structure given by the intent, and the current proposal(s) make no requirements on MathML structure other than that it be valid MathML. That doesn't mean that other parts of the structure (e.g., the arg="..." parts) are unambiguous. But it does mean that any software trying to come up with a "meaning" for speech (or otherwise) should not break a stated intent.
>> >
>> > The question we have begun to explore is what can we state about a default in the absence of explicit intents.
>> >
>> > Why do we care about defaults?
>> > To me, a big advantage of using intent over some alternatives like parallel markup is that using "intent" can support progressive enhancement. If we come up with some defaults, then authors only need to use 'intent' when the defaults aren't correct. So if we can figure out reasonable defaults, we can minimize the content that needs work/remediation so that it can be spoken unambiguously. Additionally, rather than requiring MathML generators to change to use parallel markup or some other output, current output remains valid although it may not unambiguously represent author intent. The extent to which it can be disambiguated is tied to how much context is used to make the defaults. It will never be perfect without authors helping out, though.
>> >
>> > My Two Cents
>> > Since this message is to provoke discussion and is not meant to be a fleshed out proposal (it's very long as it is), I'll just state my current thoughts and add a little rationale to them, but not get into the details....
>> >
>> > My feeling is that defaults should be based on a canonical parse of the MathML expression that is defined by the operator dictionary and as overridden by any attrs on the <mo> elements. This does not mean that MathML that uses "flat" mrows doesn't have a default, it just means that when parsed, it will have whatever defaults we end up giving. As an example, if an mrow has "a", "+", "5", "!" as direct children, a default for factorial would be used because the canonical form given by the operator dictionary would group the "5" and "!" in a single mrow and that would match a factorial default.
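
The grouping described above can be sketched with a small
precedence-climbing parser in Python. This is only an illustration:
the three-entry OPERATORS table, its priority values, and the nested
list representation of mrows are assumptions, and the sketch nests
repeated infix operators rather than taking a position on flat n-ary
mrows.

```python
# Hypothetical miniature operator dictionary: (symbol, form) -> priority.
# The entries and priority values here are illustrative only.
OPERATORS = {
    ("=", "infix"): 260,
    ("+", "infix"): 275,
    ("!", "postfix"): 810,
}


def parse(tokens, pos=0, min_priority=0):
    """Group a flat token list into nested lists, one list per inferred mrow.

    Returns (tree, next_pos). Operands are any tokens not in OPERATORS.
    """
    left = tokens[pos]
    pos += 1
    while pos < len(tokens):
        tok = tokens[pos]
        if (tok, "postfix") in OPERATORS:
            if OPERATORS[(tok, "postfix")] < min_priority:
                break
            # A postfix operator wraps the operand built so far.
            left = [left, tok]
            pos += 1
        elif (tok, "infix") in OPERATORS:
            priority = OPERATORS[(tok, "infix")]
            if priority <= min_priority:
                break
            # Parse the right operand, letting tighter operators bind first.
            right, pos = parse(tokens, pos + 1, priority)
            left = [left, tok, right]
        else:
            break
    return left, pos


if __name__ == "__main__":
    tree, _ = parse(["a", "+", "5", "!"])
    print(tree)
```

With this table the flat children "a", "+", "5", "!" come out with
"5" and "!" in their own group, which is the shape a factorial default
could then match.
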
>> >
>> > By mapping expressions to a canonical parse, we can write defaults in a relatively simple manner and not require MathML writers to generate a specific representation (something Deyan's post shows doesn't happen now). It doesn't even require software that wants to apply a default to do the parse, although I think supporting all the various ways of representing a default would be hard without parsing. Currently the more sophisticated speech renderers do parse the input to infer intent, so using defaults fits naturally into those speech renderers.
>> >
>> > One thing the operator dictionary does not solve is operand/operand conflict (horizontal juxtaposition of operands). Typically these are meant to be either a function call or multiplication (or more precisely, a multiplication-like operation), but it could be an implied plus for mixed fractions or an implied comma in something like M_{11}. How this is specified will determine a canonical parse for those cases.
>> >
>> > Another area of concern is mixing n-ary operators. We've discussed how +/- might parse, along with how a series of relational operators should be given/thought about. I think they should be flat, but reasonable people differ. Clearly something that needs further discussion before defaults can be written for them.
>> >
>> > A final area of concern is matching open/close fences. The operator dictionary has all of the fences with the same priority and I think it makes sense that a "[" can match against a ")" to form an mrow, but there might be some subtle issues that make this a poor default. Note that the French interval notation "]0, (" will not form the intended mrow structure without the author overriding the "form" for the brackets; if overridden it will parse intuitively and a default could be written for that notation.
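
A minimal Python sketch of the fence matching described above, where
any opening fence may pair with any closing fence. The OPEN and CLOSE
sets, the function name, and the handling of unmatched fences are all
assumptions for illustration, and the reversed-bracket input in the
second example is my own.

```python
# Hypothetical fence sets; all fences are treated as having equal priority.
OPEN = {"(", "[", "{"}
CLOSE = {")", "]", "}"}


def group_fences(tokens):
    """Wrap each open...close run in its own list (one list per mrow).

    Any opener may match any closer. Unmatched fences are left in place
    as ordinary tokens.
    """
    stack = [[]]
    for tok in tokens:
        if tok in OPEN:
            # Start a new group that begins with the opening fence.
            stack.append([tok])
        elif tok in CLOSE and len(stack) > 1:
            # Close the innermost open group, whatever its opener was.
            group = stack.pop()
            group.append(tok)
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    # Openers that never closed: splice their tokens back into the parent.
    while len(stack) > 1:
        group = stack.pop()
        stack[-1].extend(group)
    return stack[0]


if __name__ == "__main__":
    print(group_fences(["[", "0", ",", "1", ")"]))
    print(group_fences(["]", "0", ",", "1", "["]))
```

In this sketch "[0, 1)" becomes a single group, while a
reversed-bracket interval stays flat because the leading "]" has
nothing to close and the trailing "[" is never closed. That is the
case where an author would need to override the form.
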
>> >
>> > Hopefully some food for thought and that this will provoke some discussion outside of the meetings.
>> >
>> >     Neil
>> >
>> >
>> >

Received on Wednesday, 16 June 2021 16:21:36 UTC