sources and outputs

The current braille thread has mentioned the issue of how
one interprets ambiguous markup.  I am starting this thread to
ask to what extent ambiguous markup is within our scope.

How will MathML come to us?  I see three ways:

1) A suitable authoring markup language, or a suitable editing
environment, will produce content which encodes the meaning of
the source mathematics.  That source will be transformed
(somehow, and in a way we don't have to worry about)
into MathML with intent information, following the rules which we
will eventually agree on.

Assistive technology, once it is made aware of the new rules,
will be able to perfectly pronounce the resulting MathML.

I think this is the most important case to support.  Our job is
to describe rules for the end product:  MathML with intent information.

-----

The other extreme is:

2) Legacy material existing in the wild, and new material created
with no thought about intent.  For example, Deyan's millions of equations
in arXiv.

This is a problem which has been around for a long time.  People like
Neil S. have heuristics which do well in many common cases.

I don't see that this is our problem.  As individuals many of us want
to do something about this case, but why is it our job to say what to do
with every bit of MathML that ignores the rules we are going to devise?

-----

And there is a broad middle case:

3) Legacy material, or new material, which is ambiguous but which could
be improved by a small amount of editing.  This could involve adding
"topic" information, such as "multivariable calculus".  (I don't think
it matters whether the editing is by a human, or by a machine
implementing the heuristics from 2) above.)

Such efforts would decrease the number of ambiguous cases, but not
eliminate them.

I could go either way on whether this is our problem.  It would be
helpful to provide some general principles.  But I see difficulty in
avoiding the slippery slope of codifying all the heuristics of 2).
And even if we did that, we know that misinterpretations would
still be common.

-----

So my question is:  should we just focus on specifying the right
way to encode the intent, for those cases where complete information
is available to the system doing the encoding?

And if we decide that we should also offer some advice on how to deal
with legacy/ambiguous MathML, how far should we go?

Regards,

David Farmer

Received on Wednesday, 14 July 2021 13:49:45 UTC