Re: Minutes: MathML General Meeting, 6 August 2020

Hi all,

Maybe to play devil's advocate, and to keep us from overlooking
important goals for the charter, here is an itemization of possible
uses one *could* have for a math representation standard, framed from
an application-development standpoint. Feel free to claim the ones you
like and, more importantly, to recall others you have liked but
forgotten about. Maybe we can discuss any potential extensions to the
three cases we marked in our last meeting. I'll detail some examples
for the things we have *not* mentioned, to better illustrate what we
are *not* emphasizing so far.

Possible application-enabling goals for a math representation standard:
- Current consensus:
 1. presentation / rendering
 2. accessibility / text-to-speech, Braille
 3. search / information retrieval
   3.1. optimized for indexing and querying by search engines
   3.2. plain-text normalization for SEO (e.g. MathML formulas
ultimately appearing in <title> elements and <meta name="description">
content)
---
- Others:
 4. computation / numeric, symbolic
   4.1. interoperability with existing CAS, LP solvers,
GeoGebra, programming languages...
 5. formal verification / input
   5.1. interoperability with existing proof assistants, theorem provers
 6. Plotting and graphing (e.g. as in Wolfram|Alpha)
 7. Metadata-rich services
   7.1. hyperlinking to definitions, backmatter (notations,
glossaries, book indexes), e.g. as in DLMF
   7.2. assistive annotations for eLearning (e.g. typing subformulas,
"n" is an integer, "(0,n)" is an interval, "(a_1,a_2)" is a tuple)
 8. Remixing and interactivity (UX)
   8.1. interoperability with the Clipboard API
   8.2. interactive highlighting of related math syntax content /
cross-modal highlighting with related text
   8.3. visual input apps (web palette editors such as
http://mathquill.com/ , fill-in-the-blank math quizzes)
   8.4. eLearning difficulty calibration - collapse/expand
unknown/known notations, equation rewriting
 9. Extended coverage beyond traditional STEM fields
   9.1. finance/economics
   9.2. programming language syntax
   9.3. music notation
 10. Extended 2D applications related to tables and tabular-like constructs
   10.1. numeric data sheets
   10.2. arrow diagrams (~SVG territory)
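
To make item 3.2 a bit more concrete, here is a minimal Python sketch
of what "plain-text normalization" could look like for presentation
MathML. The function name mathml_to_plain_text and the choice of token
elements are assumptions made for illustration only; nothing here is
taken from a spec or from an existing tool.

```python
import xml.etree.ElementTree as ET

# Namespace prefix as ElementTree reports it on MathML element tags.
MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Presentation MathML token elements whose text we keep (illustrative choice).
TOKEN_TAGS = {"mi", "mn", "mo", "mtext"}


def mathml_to_plain_text(mathml: str) -> str:
    """Flatten presentation MathML into one line of plain text.

    Hypothetical helper: it only concatenates token text in document
    order, so 2D structure (fractions, scripts, tables) is lost.
    """
    root = ET.fromstring(mathml)
    tokens = []
    for element in root.iter():
        # Strip the namespace so namespaced and bare tags are treated alike.
        tag = element.tag.replace(MATHML_NS, "")
        if tag in TOKEN_TAGS and element.text:
            tokens.append(element.text.strip())
    return " ".join(t for t in tokens if t)


example = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mi>a</mi><mo>+</mo><mn>1</mn><mo>=</mo><mi>b</mi>"
    "</math>"
)
print(mathml_to_plain_text(example))  # prints: a + 1 = b
```

A flattening this naive drops all layout information, which is
arguably the point of listing 3.2 as its own goal: a usable plain-text
form needs agreed rules for fractions, scripts and the like.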

Ok, I admit I had to force myself to go to 10, so some of these are
vague, artificial, etc. But I think they may anchor the conversation
better, so that we don't wake up in 2021 and realize we completely
dropped the ball on something important.

P.S. Also, I should probably add an 11. where we have an explicit goal
to be "easy to generate from existing scientific authoring toolchains,
notably LaTeX syntax and Office-style WYSIWYG input". It is so implicit
that I keep forgetting it.

Greetings,
Deyan

On Thu, Aug 6, 2020 at 10:50 PM Neil Soiffer <soiffer@alum.mit.edu> wrote:
>
> There was an interesting and lively discussion today. Unfortunately, we couldn't capture all of it in the minutes. There is no recording for today's meeting.
>
>
>
> Attendees:
>
> Neil Soiffer
>
> Deyan Ginev
>
> Sam Dooley
>
> Louis Maher
>
> Murray Sargent
>
> Patrick Ion
>
> Bruce Miller
>
> Moritz Schubotz
>
>
> Regrets: David Farmer, David Carlisle
>
>
>
> MathML WG Charter: comments, suggestions, etc
>
> MS: I’m happy to see the progress with Core, but I’m concerned the semantics work is not mature enough to put into a standard.
>
> MS: Not clear how the new standard conflicts with content MathML. We shouldn’t have two ways to do things.
>
> SD: Semantics is more like an upgrade path from content MathML. So it becomes a functional replacement for content MathML 4.
>
> NS: Could say we should eliminate Chapter 4 since it is a duplicate?
>
> SD: There is a lot there. I’m not sure about that. The syntax is not the important part, it is content that matters.
>
> [discussion of how open math and content math relate]
>
> BM: Possibly in the long run, content MathML could be deprecated, but only if the developing attribute format becomes sufficiently expressive and successful. I am not sure if semantics is expressive enough to get rid of content MathML, particularly since we’re still grappling with speech vs computation distinctions.
>
> NS: There is strict and pragmatic content, and semantics is aimed at pragmatic MathML.
>
> BM: There was a lot of work in MathML 3 to show how pragmatic maps to strict. I think people should be aiming at strict, not pragmatic.
>
> DG: There is a research project (MathWebSearch) that uses “strict Content MathML” to index 500 million formulas from arXiv, also some experiments with ZBMath. It’s a huge index, but generally not online / not a production deployment. Has a “more academic” status, well-known in the (modest) Math Information Retrieval community.
>
> [more discussion of the differences between strict and pragmatic]
>
> BM: I could see deprecating pragmatic a few steps down the road. We shouldn’t put up barriers between the various forms (i.e. pragmatic, strict, and semantic-attributes should be compatible).
>
> MS: The ability to map to something else is what makes it so useful because math is not fixed. It makes it easier for accessibility because there is so much less.
>
>
> [not captured]
>
>
> DG: Scope depends on the application. For presentation MathML, the criterion is simple: if browsers render it, we are done. For speech and other applications, what determines when we are done?
>
>
> [not captured]
>
>
> SD: Some people just care about appearance. Presentation works for them. Others want to specify semantics. There is a sweet spot where they want both for some common things. I think we should support all three communities.
>
> MS: I agree that content MathML has remained very academic. It was very far from being used.
>
>
> BM: I think it is a false dichotomy (between being able to express the simple cases simply and still handle more complex cases) that it is an all or nothing enterprise. There is a slice of OpenMath that is very generic. The challenge is coming up with a list that is short enough to search easily and still complete enough. With OpenMath, you spend days looking for something and never find the symbol…
>
> MS: Who will be creating MathML output? Most people use TeX to create math. That is then converted to MathML. They won’t write presentation MathML directly. If it is generated by a program, why is it easier to generate?
>
> SD: From experience, it is much easier to generate semantics than both content and presentation.
>
> NS: It is locally generated for each construct, rather than globally generating two trees.
>
> PI: we are generating a new markup that might be easier to maintain, but it is not clear that the details are as complex as before. Sam did this and I’ll have to take his word that it is easier. I agree with Deyan that the target audience is important.
>
> DG: I claim that there are different representational decisions for different applications, and we have several for Content MathML. It is different for a11y on the web than for a notebook that wants to do computation. Similarly there are differences between inputting TeX syntax and using a palette-based input such as MathQuill. We should list the applications we are aiming to facilitate, and not make claims about applications we have not thought of yet.
>
> NS: I agree. This should go into the charter in success criteria.
>
> BM: We should focus on what we want to and not focus on things we aren’t interested in.
>
> SD: For example, I want to deal with online testing and capture enough semantics so I can score the result.
>
> BM: If it can do K12 and still be expandable, that’s a win-win.
>
> MS: We need three things: presentation, accessibility, and search.
>
> PI: That’s my list.
>
> DG: We need to discuss intended input methods (e.g. not by hand, yes by TeX, Office, palette widgets in javascript). We should specify that in the spec.
>
>
> Summary:
>
> Charter should mention goals of presentation, accessibility, and search
>
> Charter should mention that input needs to be supported (from at least TeX and WYSIWYG). Not decided is how many implementations mean that we succeeded.
>
>
> Mid-meeting note from DG: The big problem with adoption of Content MathML is connected to the academic approach to the spec, which wasn’t close to a wide community of practitioners. It was indeed closer to CAS. We know it wasn’t successful because no one is publishing new Content Dictionaries. Finding a way to avoid the block that is incurred by requiring CDs is crucial to future adoption of a content standard, in my opinion.
>
>

Received on Tuesday, 11 August 2020 02:05:10 UTC