Minutes: MathML Full WG, 2 Feb 2023 from Neil Soiffer on 2023-02-04 (www-math@w3.org from February 2023)

From: Neil Soiffer <soiffer@alum.mit.edu>
Date: Fri, 3 Feb 2023 21:05:57 -0800
To: "www-math@w3.org" <www-math@w3.org>
Message-ID: <CAESRWkBE_ercbfYnEPnBLA6zuo4N6BcJaiB2+Eu4PbG_UufxKA@mail.gmail.com>

Attendees:

- Neil Soiffer
- Louis Maher
- Patrick Ion
- David Farmer
- Steve Noble
- Bert Bos
- Dennis Müller
- Deyan Ginev
- Cary Supalo
- David Carlisle
- Paul Libbrecht
- Sam Dooley
- Murray Sargent
- Bruce Miller

Regrets

Agenda

1. Announcements/Updates/Progress reports

NS: The Opera browser has picked up the Chrome implementation.

SN: Pearson needs line breaking to enhance accessibility. It is not
supported in core.

NS: For things to move forward, either somebody needs to do the
implementation, or somebody needs to pay to do the implementation.

NS: Line breaking is on the table for core level 2.

SN: will let his management know these facts.

NS: There is a polyfill that can provide line breaking.

PL: On the email list that discusses media types, there is a professor who
is trying to register generic media types for elementary things, such as
numbers in operations.

PL: says there is no need for this.

2. Charter discussion: a walk through with some "live" changes (10 minutes max)

NS: started reviewing "Other Deliverables"

SN, NS, and SD will work on MathML accessibility.

NS: removed search from the deliverables.

NS: discussed the item: A living catalog for annotations beyond those
defined in a MathML 4 recommendation. After a discussion, some of this
wording was changed. We want an open Catalog for adding new intents.

NS: next considered the item: Sample code for conversion of annotated
Presentation MathML to an external form such as speech and/or Content
MathML. People did not want to over promise on this issue.

DC: There are some cases where it's better not to put intent on and just
let the default do the right thing. We do not want to commit ourselves to
putting intent on everything.

NS: It seems like providing sample code is setting ourselves up for
something that we can't really do. We should just state our expectations
for defaults.

DG: This is the most difficult thing we have left to do. He wants to push
this off the charter list because this is just setting ourselves up for
something that we can't really do. This effort may require thousands of
rules.

DG: Let's push it off of the official charter list, and if we can do it, we
can always include it later as a bonus.

PL: We are saying that some examples will be delivered. We are not
promising completeness.

NS: The goal of writing down defaults is to say that this is the minimum
amount of interpretation that AT should be able to process.

DG: Drop it because if we gave some examples, then people would argue that
we did not choose the right examples.

*ACTION* PL: will look to see if we already have an issue on this. If we do
not have an issue on this, then PL will open one.

3. Continue intent discussions.

a) 409 internationalization <https://github.com/w3c/mathml/issues/409>

-- Can anyone come up with a semi-complete list of known intents?

NS: MUS shared his list with the group. Most of the things on his list were
Unicode values. They were not really a list of intents because the list
precedes intent by many years. The part that was potentially useful for
intents was not a complete list; it had around 40 intents.

DF: is opposed to the TeX converter producing an international string.

We started a discussion about intent translations.

DF: If a person is reading a math document in Spanish, he wants to hear
the intent in Spanish also. How is this done? There are two ways. 1. The
initial creator of the document prepares the document, including intents,
in Spanish. 2. The author creates the document in his own language. When
the document is translated, the intents are not translated. When the
reader accesses the document, the AT reads the standard Spanish text, and
the AT translates the intent into Spanish.

NS: did some Google Translate tests. The translator took about 0.1 seconds
per word to translate the document. NS said that the reader of the
document expects his translations to be returned to his screen in around
0.1 seconds. For this reason, an on-the-fly translation of intents is not
practical, whereas the document author has all the time he needs to
translate the intent values.

DF: said that he recommends looking up the intent from a list of
pre-translated intents. We could develop a list of intents, and a local
language list of those intent translations could be provided. The AT could
then provide an intent translation in real time using word lookup.
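
The lookup DF describes might be sketched roughly as follows. This is only
an illustration under assumptions: the table contents, the intent names,
and the function name speak_intent are hypothetical and are not taken from
any agreed list or any AT implementation.

```python
# Hypothetical pre-translated intent table: intent name -> {language: spoken text}.
# The entries are illustrative only; no such list was agreed on at the meeting.
INTENT_TRANSLATIONS = {
    "factorial": {"en": "factorial", "es": "factorial"},
    "absolute-value": {"en": "absolute value", "es": "valor absoluto"},
    "derivative": {"en": "derivative", "es": "derivada"},
}


def speak_intent(intent: str, lang: str, fallback_lang: str = "en") -> str:
    """Return the spoken text for an intent in the requested language.

    Falls back to fallback_lang, and finally to the raw intent name with
    hyphens replaced by spaces, when no translation is known.
    """
    entry = INTENT_TRANSLATIONS.get(intent, {})
    if lang in entry:
        return entry[lang]
    if fallback_lang in entry:
        return entry[fallback_lang]
    # Unknown intent (for example, one from an open list): speak the name itself.
    return intent.replace("-", " ")
```

A dictionary lookup like this takes effectively constant time per intent,
which is what makes it attractive for real-time use. The open question
raised in the discussion, what goes into the table, is untouched by the
sketch.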

NS: Tell me all the words that need translating. NS is dubious that such a
general list could be made.

NS: The document author knows what needs to be translated. The author's
list of translated intent words is small.

MUS: Take the common words, put them in English, and have lookup tables to
put them into the local language.

MUS: said that this would give you 99% of the words you need.

MUS: developed such a list eight years ago.

NS: We need a list of intent values for translation. The list might be
thousands of words long.

MUS: put up a list and people can add to it as necessary.

DG: did start on such a list based on Khan Academy math classes.

PL: What we are talking about is a list of intent names and their
pronunciations.

NS: As an AT developer, I need to know a priori what the possible intent
values are in order to be able to build this table of translations.

NS: We are just talking about core and not the open list.

DG: Both pieces are important.

MUS: We need an extensible set. The core set should be translated ahead of
time. Translating a single word may not use the math context. We should
translate the important terms using the math context. This local language
list could be given to the AT. This list should be open.

DG: Suppose I write in Bulgarian with Bulgarian intents, for a Bulgarian
audience. Then I would use macros that tell the AT to use my Bulgarian
intents and not to translate them.

NS: This reminds me that knowing the language of the document and
overriding that via the lang attribute is something we need to consider, so
that if the language of the intent differs from the language of the
document, it is noted via a lang attribute.
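
One possible reading of this point, sketched under assumptions: the AT
takes the nearest lang attribute on an element or its ancestors as the
language of the intent and compares it with the document language. The
helper names and the sample markup are hypothetical, and how lang should
interact with intent was not decided at the meeting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a Spanish document with one formula whose intent is
# written in Bulgarian (marked with lang="bg") and one with no override.
SAMPLE = """
<html lang="es">
  <body>
    <math lang="bg">
      <mrow intent="факториел($n)"><mi arg="n">n</mi><mo>!</mo></mrow>
    </math>
    <math>
      <mrow intent="factorial($n)"><mi arg="n">n</mi><mo>!</mo></mrow>
    </math>
  </body>
</html>
"""


def effective_lang(element, parents, default="en"):
    """Walk up from element to the root and return the nearest lang value."""
    node = element
    while node is not None:
        lang = node.get("lang")
        if lang:
            return lang
        node = parents.get(node)
    return default


def intents_with_lang(xml_text):
    """Return (intent, intent_lang, document_lang) for each element with an intent."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    doc_lang = root.get("lang", "en")
    return [
        (el.get("intent"), effective_lang(el, parents), doc_lang)
        for el in root.iter()
        if el.get("intent") is not None
    ]
```

Under this assumption, an AT would treat the first intent as already
Bulgarian and leave it alone, while the second inherits the document
language.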

DF: It is our job to provide the long list and the translations.

NS: We cannot translate into many languages, but we do need to develop a
list.

DG: So, Facebook has this model with 200 languages. It does a nice job. I
tried 5 languages, and each of them was translated well. The translator
did not use tables.

DG: So both the table look-up procedure and the AI approach are important,
and you shouldn't predetermine which methods are going to get used, because
I think both symbolic and neural methods have interesting practical
applications.

From Deyan Ginev to Everyone: https://huggingface.co/facebook/nllb-200-3.3B

PL: Thinks that our group could translate intents into six languages.

NS: would like to get the lists of intents before working out the details
of translating.

BM: Internationalization means we can deal with documents in multiple
languages. It does not mean we can automatically translate between
languages. What level are we aiming for?

BM: We have been considering a minimal dictionary for things that need
special treatment.

From Patrick D F Ion to Everyone: It seems to me that the WG can certainly
already do 6 or more languages from native speakers, and knows enough close
friends

MUS: This would argue for things not in the core list. Elementary things
must be on the list.

DC: We are overthinking this. Let us get a list of fifty intent values and
set up the translation infrastructure to work with this list. Then we can
grow the list as needed.

DC: We have not decided what we want to do with the list of words. We are
not making progress.

PL: We have agreement that a list is desirable.

NS: I am afraid that if we come up with such a list, it will be
woefully incomplete and therefore not usable.

DF: We need a list so that we can start deciding what we will do with it.

*ACTION* DF: I'll start making a short list so that we can maybe get to the
next step, and I'll put all my top 10 on it.

NS: Please gather up all your macros that you're using for semantics and
include those.

DF: Yes.

NS: So, I hope we've made some progress in that. At least some people are
going to come up with lists. Paul, you have the action item of checking on
defaults for intents, whether we have an item about that, and if not to
create an issue.

PL: I will send you an email on this.

Received on Saturday, 4 February 2023 05:06:20 UTC