Re: MathML in Arabic from Andreas Strotmann on 2004-11-11 (www-math@w3.org from November 2004)

From: Andreas Strotmann <Strotmann@rrz.uni-koeln.de>
Date: Thu, 11 Nov 2004 11:40:18 +0100
To: Youcef Rabah Rahal <y.rahal@gmail.com>
CC: www-math@w3.org
Message-ID: <41934192.1060800@rrz.uni-koeln.de>
Everybody,

I may not know much about the mathematical notations used in cultures
that write in Arabic script in particular, but I have commented on
general issues of MathML internationalization on this list before. I
would therefore like to offer some general remarks on this issue.

First of all, as has been implicit in the discussion so far, it is
necessary to separate the discussion of Arabic-script rendering of
MathML into issues concerning Presentation MathML and those concerning
Content MathML.  As an example, transliterations (variable "a" -> some
Arabic letter) are a really bad idea for Presentation MathML, but they
are a legitimate concern for rendering Content MathML, since the
latter is supposed to be transnational in character.

In Presentation MathML, I would like to suggest that, as far as
possible, MathML follow the basic idea that Unicode stipulates for its
encodings of writing systems: a Presentation MathML stream should
describe the formula it represents roughly in the order in which it
would normally be (hand-)written, and any Presentation MathML fragment
should use the actual Unicode characters that it would be written with
(modulo the Unicode concept of mirroring, of course).  Current
implementations of MathML do not support this, which makes it hard to
use Presentation MathML as-is today to write maths that includes
Arabic script, but as several people have implied, this should be
considered a bug in the implementations, and not a good model for
MathML to follow.

Note that this implies, in particular, that when using Presentation
MathML it would be a good idea to write numbers, function and variable
names, and other plain-text ingredients with the actual Unicode
characters that are going to be rendered.  (I suspect that this may
require a Unicode adjustment to accommodate the requirement for
mirrored 'Arabic' digit characters, and that it may open the question
of whether a number is written little-endian or big-endian, that is,
least or most significant digit first, or LTR vs RTL, in a given
script or locale.)
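
To make the digit question a little more concrete, here is a minimal
Python sketch that reads the relevant properties from the Unicode
Character Database via the standard unicodedata module. The helper
name describe and the sample characters are my own choices for
illustration; nothing here is specific to MathML.

```python
import unicodedata

def describe(ch):
    """Collect the Unicode properties relevant to digit rendering."""
    return {
        "name": unicodedata.name(ch),
        # Digit value, or None if the character is not a digit.
        "value": unicodedata.digit(ch, None),
        # Bidirectional class: "EN" European number, "AN" Arabic number, ...
        "bidi": unicodedata.bidirectional(ch),
        # True if the glyph is mirrored in right-to-left text.
        "mirrored": bool(unicodedata.mirrored(ch)),
    }

arabic_indic_three = describe("\u0663")   # ARABIC-INDIC DIGIT THREE
european_three = describe("3")
left_paren = describe("(")

# Digits are stored most significant digit first in logical order,
# so Python's int() accepts an Arabic-Indic digit string directly.
value = int("\u0661\u0662\u0663")

print(arabic_indic_three, european_three, left_paren, value)
```

In the current character database the digits carry a numeric
bidirectional class and are not flagged as mirrored, whereas the
parenthesis is; a digit shape that has to mirror would therefore need
handling beyond the plain character properties.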

Since MathML is an XML application, it clearly takes the xml:lang
attribute, which can probably help distinguish basic differences in
writing style for elementary through high-school maths, including
whether a formula is written LTR or RTL as a whole, which script to
use for digits, whether the digits need to mirror or not, what exactly
the name for cos or gcd is (in the case of Content MathML), etc.

However, I suspect - and this is a question to those who have
experienced it for themselves - that a proper way to do MathML in
university-level texts in many cultures is simply to annotate it as
xml:lang="en-US" within a text that is otherwise xml:lang="ar-EG", for
example.  This way, there would be no ambiguity as to what
xml:lang="ar-EG" means for MathML: it is unambiguously the rendering
style commonly used in primary and secondary education in the culture
designated by xml:lang (and in many or most countries, it will be used
in tertiary education, too). Evidence that this is the correct
solution could include whether one writes the Latin letters "gcd" and
"lcm" for greatest common divisor and least common multiple in such
university-level textbooks, since these are clearly English terms.
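
As a rough sketch of how a renderer might pick up such annotations,
the following Python fragment computes the effective xml:lang of every
math element in a document, letting an element inherit the value of
its nearest annotated ancestor. The sample document, the language tags
in it, and the function name math_languages are assumptions made for
the example only.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def math_languages(elem, inherited=None):
    """Return the effective xml:lang of each <math> element, in document order."""
    # An element's own xml:lang overrides whatever it inherits.
    lang = elem.get(XML_LANG, inherited)
    found = []
    if elem.tag == f"{{{MATHML_NS}}}math":
        found.append(lang)
    for child in elem:
        found.extend(math_languages(child, lang))
    return found

# Hypothetical document: Arabic text with one formula left to inherit
# the surrounding language and one explicitly annotated as English.
doc = ET.fromstring(
    '<body xml:lang="ar">'
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
    '<math xmlns="http://www.w3.org/1998/Math/MathML" xml:lang="en-US">'
    '<mi>x</mi></math>'
    '</body>'
)

languages = math_languages(doc)
print(languages)
```

The first formula inherits the language of the surrounding text, the
second carries its own annotation; what a renderer then does with each
value is exactly the open question.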

Still, especially for rendering Content MathML, a much more fine-tuned
choice of locale would need to be made available to content writers
and renderers, even if one only considers Latin-script based maths (do
you render 'exists' as an inverted E, as a big V, or, especially in
primary and secondary education, as plain text "there exists" or "es
gibt"?).  It is possible that some of the requirements for
Arabic-script rendering of Presentation MathML can only be modelled in
this way, because there is no sufficient consensus on ways of writing
maths within a culture that can be designated via xml:lang, whose
values are restricted to language tags built from ISO language and
country codes.
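
To illustrate what a finer-grained choice might look like, here is a
small Python sketch of a renderer lookup for the Content MathML
'exists' symbol, keyed on a language tag plus an audience level. The
table entries, the notion of a "level" parameter, and the names
EXISTS_STYLES and render_exists are all hypothetical; they only restate
the examples above and do not describe any existing implementation.

```python
# Hypothetical style table: (primary language, audience level) -> rendering.
EXISTS_STYLES = {
    ("en", "school"): "there exists",
    ("de", "school"): "es gibt",
}

# Fallback when no locale-specific style is known: the inverted E.
DEFAULT_EXISTS = "\u2203"

def render_exists(lang, level):
    """Pick a rendering for 'exists' from a language tag and audience level."""
    full = lang.lower()
    primary = full.split("-")[0]
    # Try the full tag first, then fall back to the primary language.
    for key in ((full, level), (primary, level)):
        if key in EXISTS_STYLES:
            return EXISTS_STYLES[key]
    return DEFAULT_EXISTS

print(render_exists("de-AT", "school"))
print(render_exists("en-US", "university"))
```

The point of the second key is that xml:lang alone cannot tell a
school text from a university text in the same language, so some
additional piece of information would have to come from elsewhere.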

Hope this helps,

 -- Andreas Strotmann
Received on Thursday, 11 November 2004 10:40:32 UTC