Re: multi-lingual/cross-cultural maths using MathML? from Andreas Strotmann on 2005-04-29 (www-math@w3.org from April 2005)

From: Andreas Strotmann <Strotmann@rrz.uni-koeln.de>
Date: Fri, 29 Apr 2005 13:05:24 +0200
To: Paul Libbrecht <paul@activemath.org>
CC: www-math@w3.org
Message-ID: <427214F4.5050706@rrz.uni-koeln.de>
Paul Libbrecht wrote:
> 
> On 26 Apr 2005, at 14:53, Andreas Strotmann wrote:
> 
>> Paul Libbrecht wrote:
>>
>>> Can you hint to what would be the internationalization of MathML 
>>> content ???
>>> I would have hoped it to be international, being expected to be 
>>> semantic...
>>
>>
>> Good point, Paul.  Actually, in the case of MathML Content, 
>> *localization* (that is, rendering depending on a language or culture 
>> or other context) is the main issue.
>> Ex.:  gcd vs. ggT vs. mcd vs. MCD  (in locales en/de/es/it)
> 
> It's only an issue of the presentation system, not, not at all!, of 
> MathML-content.

Well - I respectfully disagree.  Localization is a problem that MathML 
Content itself needs to deal with.  Currently, the only option available 
is parallel markup via the semantics element, but that will frequently 
be overkill, and at worst it will simply not work.

> In some of the documents quoted there, I find an amount of places where 
> MathML-content would need to support language, for example, that a 
> content-symbol should support an xml:lang attribute. I really don't 
> understand this!

Let me try to explain then.  Consider the case where you publish a 
mathematical paper. It is written in a language (say, Persian) that uses 
an Arabic script, and it has formulas embedded.

Those formulas would ideally be Content MathML - and as you say, the 
purer the better.  However, the renderer must somehow render the 
formulas for use in a Persian-language, Arabic-script document, possibly 
following conventions specific to the author's country.  Currently, that 
means parallel markup, but I don't see why only American English should 
profit from pure Content Markup: what is the use of Internationalization 
if you don't have Localization?  (Note that it is reasonable for the 
author, not the reader, to determine the choice of rendering for Content 
MathML in this case, although the reader may be given a choice to 
override the author.)

Thus, even a pure Content Markup formula in a web page needs to be 
annotatable with a language tag - or at least inherit that tag from the 
surrounding markup (which is the same thing, really, since xml:lang is 
defined to be an inheritable attribute: in the case of an inherited 
xml:lang, the value of the attribute is merely implicit, but it is 
definitely available).

Anyway, since xml:lang is universal XML, it's the logical choice to use 
in this context - and for the renderer (such as the universal 
stylesheet) to respect.
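
To make the inheritance point concrete, here is a minimal Python sketch 
(my illustration only, not how the universal stylesheet or any existing 
renderer works).  It assumes a hypothetical host document with an 
article/p structure, resolves the effective xml:lang for each Content 
MathML gcd element, and picks the operator name from the en/de/es/it 
examples quoted above.  The helper names and the fallback to English are 
assumptions.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
MATHML = "{http://www.w3.org/1998/Math/MathML}"

# Operator names taken from the gcd/ggT/mcd/MCD example in this thread.
GCD_NAMES = {"en": "gcd", "de": "ggT", "es": "mcd", "it": "MCD"}


def with_effective_lang(element, inherited=None):
    """Yield (element, effective xml:lang) pairs in document order.

    An element without its own xml:lang inherits the nearest ancestor's.
    """
    lang = element.get(XML_LANG, inherited)
    yield element, lang
    for child in element:
        yield from with_effective_lang(child, lang)


def gcd_labels(root):
    """Return the localized label for every <gcd/> element under root."""
    labels = []
    for element, lang in with_effective_lang(root):
        if element.tag == MATHML + "gcd":
            # Assumption: match on the primary subtag, fall back to English.
            primary = (lang or "en").split("-")[0].lower()
            labels.append(GCD_NAMES.get(primary, GCD_NAMES["en"]))
    return labels


# Hypothetical host document: the first formula inherits "de" from the
# root, the second inherits "it-IT" from its enclosing paragraph.
DOC = """<article xml:lang="de">
  <p><math xmlns="http://www.w3.org/1998/Math/MathML">
    <apply><gcd/><ci>a</ci><ci>b</ci></apply>
  </math></p>
  <p xml:lang="it-IT"><math xmlns="http://www.w3.org/1998/Math/MathML">
    <apply><gcd/><ci>a</ci><ci>b</ci></apply>
  </math></p>
</article>"""

labels = gcd_labels(ET.fromstring(DOC))
print(labels)
```

Neither formula carries an xml:lang of its own, yet the renderer still 
has a language available for each of them.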

(Come to think of it, this may be your main point: this particular 
attribute should always be inherited rather than specified in 
MathML-Content.  As I said, in a technical XML sense, both are 
equivalent, and I did not distinguish them.)
> 
> That a presentation template (i.e. a recipe to produce presentation from 
> content) bears something such is clear but not the MathML-content itself !

The problem is that presentation templates are very limited in what they 
can use.  As soon as you render to natural language (e.g. "for all 
<something> we know <something else>" to render the universal 
quantifier - a very common occurrence in the textbooks MathML Content is 
targeted at, i.e. K-12), defining a template that produces correct 
natural language is already next to impossible in English, and 
completely out of the question for languages with complex morphologies 
or other "interesting" linguistic phenomena.
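
A tiny Python sketch of the English case, purely for illustration: the 
template string and function name are made up, and the only point is 
that plain slot-filling cannot choose between "a" and "an", let alone 
handle case or gender agreement in other languages.

```python
# Hypothetical fixed template for an existential statement.
NAIVE_TEMPLATE = "there is a {kind} {var} such that {body}"


def render_exists(kind, var, body):
    """Fill the template slots verbatim, with no linguistic processing."""
    return NAIVE_TEMPLATE.format(kind=kind, var=var, body=body)


fine = render_exists("real number", "x", "x > 0")
wrong = render_exists("integer", "n", "n > 0")

print(fine)   # reads correctly
print(wrong)  # "a integer": the article does not agree with the noun
```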

Besides, I don't see how you can specify multi-lingual presentation 
templates in MathML that are applied depending on xml:lang values, 
although that may well be an extension to request for MathML version 
3.0.
> 
>> However, in our paper we note that there are more parameters than just 
>> language that determine the correct choice of a specific rendering 
>> during localization of Content MathML.
>> Among these, the most important parameter in our application is 
>> probably the choice between rendering an expression as a formula (e.g. 
>> "∃x.P(x)") or rendering it as natural-language text (e.g. "there is an 
>> x such that P(x)").  The latter is not only important as a source for 
>> aural rendering; many of the more "advanced" features of mathematics 
>> are simply expressed verbally in lower grades, long before they are 
>> formalized for advanced students - quantifiers being a case in point.
> 
> 
> Your points are valid and it goes as fine-grained as the classroom or 
> course...
> Actually, if the rendering engine is used by an autonomous learner (e.g. 
> a researcher), there's no reason customization of the notation is not 
> offered for him as well.

Very true.  I think we make the point somewhere that different classes 
of choice of rendering are best left to different players (author, 
teacher, student) in our concrete application.
> 
>> It is in this context that MathML 3.0 will hopefully have better 
>> support for marking choices between variants.
> 
> 
> But where, oh where, in MathML-content or in OpenMath would you like to 
> put such choices ??

That, of course, is a question that needs discussing in the entire group 
responsible for developing MathML 3, for example.  At WebALT, we will 
certainly come up with specific recommendations both for MathML and for 
OpenMath eventually.

However, it is probably safe to say already that our recommendation is 
going to be for a small set of extra attributes with a small set of 
predefined values for use as rendering hints when localizing MathML 
Content elements. One such attribute/value combination that WebALT will 
need would say "express this part in natural language, not as a 
formula", while another would say the opposite (i.e. use a formula, not 
natural language). The specific language to use, of course, would be 
found in the xml:lang attribute, while the extra attributes might 
literally be XML attributes specific to MathML Content, or non-semantic 
attributions in OpenMath.
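
As a rough sketch of what such a hint might look like to a renderer 
(Python, illustration only): the attribute name "render-as" and its 
values "text" and "formula" are placeholders I made up here, not a WebALT 
or MathML proposal, the phrase templates are hand-written, and the 
MathML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
HINT = "render-as"  # hypothetical attribute name, not in any MathML spec

# Hand-written illustrative phrasings for the universal quantifier.
TEXT_TEMPLATES = {
    "en": "for all {var}, {body}",
    "de": "für alle {var} gilt {body}",
}


def render_forall(apply_el, inherited_lang="en"):
    """Render <apply><forall/>...</apply> as a formula or as text.

    Only handles a body of the form <apply><ci>P</ci><ci>x</ci></apply>.
    """
    lang = apply_el.get(XML_LANG, inherited_lang)
    mode = apply_el.get(HINT, "formula")  # assumed default: formula
    var = apply_el.findtext("bvar/ci").strip()
    fn, arg = (ci.text.strip() for ci in apply_el.find("apply"))
    body = f"{fn}({arg})"
    if mode == "text" and lang in TEXT_TEMPLATES:
        return TEXT_TEMPLATES[lang].format(var=var, body=body)
    return f"∀{var}.{body}"


BODY = "<forall/><bvar><ci>x</ci></bvar><apply><ci>P</ci><ci>x</ci></apply>"

as_text = render_forall(
    ET.fromstring(f'<apply render-as="text" xml:lang="de">{BODY}</apply>')
)
as_formula = render_forall(ET.fromstring(f"<apply>{BODY}</apply>"))

print(as_text)
print(as_formula)
```

The hint only selects the mode; the language still comes from xml:lang, 
whether specified on the element or inherited.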

Needless to say, we would be keenly interested in any accounts of 
experience gained with this kind of approach in either OpenMath or 
MathML - or Maple or Mathematica, for that matter.

  -- Andreas

PS: I apologize again to those whose publications we did not find in the 
course of preparing for our paper.  I wish we had had more time for our 
research on that one.
Received on Friday, 29 April 2005 11:05:33 UTC