Large language models are zero-shot MathML translators

Hi everyone,

I'm writing with what I hope is a newsworthy update from the rapidly
developing world of machine learning and NLP research.
OpenAI's GPT-3 model was recently made available to the public,
and you can try out their playground at:
https://beta.openai.com/playground

Access requires disclosing an email address and phone number, and is
otherwise free for experimentation.

There is a rather wide variety of interesting demos one can enjoy
experimenting with, but the ones I'll focus on here touch on MathML.
It appears that GPT-3 is capable of (somewhat unreliable, but modestly
successful) bidirectional translation between standard TeX math
syntax, MathML, OpenMath, English and other natural languages, as well
as computer algebra systems such as SymPy or Mathematica.
This is a "zero-shot" capability, which is jargon meaning that the
model was not trained on this specific task at all. Instead, it
obtained the capability as a side-effect of a generic pretraining
procedure: modeling the distribution of tokens in a huge corpus of
documents.

Here are some examples. Model completions are highlighted in green.
I am linking to GitHub-hosted PNG screenshots of my GPT-3 playground
session, to avoid large attachments in this email.
Please feel free to try the GPT-3 playground yourselves, as my
screenshots are cherry-picked.

Example 1: GPT-3 generating MathML and OpenMath for "the set of all
sets that do not contain themselves"
https://user-images.githubusercontent.com/348975/167893833-a3ef6a7d-1f0a-49df-b1fe-cd96fe85b263.png
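For readers who prefer not to click through: below is a hand-written
sketch of the kind of content MathML one would expect for this
expression (roughly "the set of all x such that x is not a member of
x"). This is my own illustration, not the model's output, which you
can see in the screenshot.

  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <set>
      <bvar><ci>x</ci></bvar>
      <condition>
        <apply><notin/><ci>x</ci><ci>x</ci></apply>
      </condition>
    </set>
  </math>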

Example 2: GPT-3 generating accessible text in English and Bulgarian
from source presentation MathML 3
https://user-images.githubusercontent.com/348975/167895795-1fd8d329-fc7d-4a9f-94a3-863f0ece4147.png

Example 3: GPT-3 generating accessible text in English and Bulgarian
from source presentation MathML 3 + Intent
https://user-images.githubusercontent.com/348975/167895075-b060949b-6111-4d09-8fe2-bb917c3e0f20.png

Note 1: For examples 2 and 3, I took the simple expression defining
the commutator in ring theory, from Wikipedia:
https://en.wikipedia.org/wiki/Commutator#Ring_theory
Note 2: My MathML markup is ill-formed in example 3, as I made a typo
writing in the intent values by hand. And yet the translation was
successful, as language models learn to be robust to noisy inputs.
Note 3: I can testify that the generated Bulgarian text in examples 2
and 3 both reads naturally and matches the English.
Note 4: Notice that the "intent" value of "commutator" was picked up
and was used to produce the correct narration in example 3; a sketch
of that kind of annotated markup follows below.
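To illustrate, here is a hand-written sketch of presentation MathML
for the commutator [a, b] = ab - ba, annotated with an "intent" value
in the style of the current draft proposal. It is not the exact markup
from my session (which, per Note 2, contained a typo); it only shows
the general shape of the input I gave the model.

  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mrow>
      <mrow intent="commutator($x,$y)">
        <mo>[</mo>
        <mi arg="x">a</mi>
        <mo>,</mo>
        <mi arg="y">b</mi>
        <mo>]</mo>
      </mrow>
      <mo>=</mo>
      <mrow>
        <mi>a</mi><mo>&#x2062;</mo><mi>b</mi>
        <mo>-</mo>
        <mi>b</mi><mo>&#x2062;</mo><mi>a</mi>
      </mrow>
    </mrow>
  </math>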

Example 4: GPT-3 generating four markup-language translations for the
quadratic formula written in TeX. This really only works on very short
inputs, as coherence is lost rather quickly.
https://user-images.githubusercontent.com/348975/167900211-39368819-7c20-46d5-b80f-0ec991d2b107.png

For now, the impressive part of this example is that the model can
generate four independent outputs which are, in a sense, "almost
correct", given that GPT-3 was never trained (i.e. fine-tuned) on this
particular task.
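For concreteness, here is a hand-written presentation MathML rendering
of the quadratic formula, roughly the kind of output one of the four
translations should converge to, assuming the usual TeX input
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. The model's actual outputs are
in the screenshot above.

  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo>-</mo><mi>b</mi>
        <mo>&#xB1;</mo>
        <msqrt>
          <msup><mi>b</mi><mn>2</mn></msup>
          <mo>-</mo>
          <mn>4</mn><mo>&#x2062;</mo><mi>a</mi><mo>&#x2062;</mo><mi>c</mi>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn><mo>&#x2062;</mo><mi>a</mi>
      </mrow>
    </mfrac>
  </math>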

---

To end with some forward-facing personal commentary:
It is encouraging to see that today's large neural models are capable
of picking up on the new "intent" attribute without having been
trained on inputs that included it.
Due to the stochastic nature of these models, the "to markup"
direction is rather fragile, and likely too unreliable today.
However, the "to language" direction, which can be a lot more
forgiving in certain applications, seems to produce consistently
healthy mathematical text from MathML trees, and encouragingly does
so for a range of different natural languages.

Rule-based systems that consume or emit MathML will of course continue
to be a lot more precise and reliable for a while longer. But I think
some viable competition may be brewing for assistive technology (AT)
systems in particular, especially for generating language.
And in cases where AT developers are overwhelmed, some viable *help*
may be on the horizon too.

Greetings,
Deyan

Received on Wednesday, 11 May 2022 16:50:03 UTC