- From: Andreas Strotmann <Strotmann@rrz.uni-koeln.de>
- Date: Thu, 11 Nov 2004 11:40:18 +0100
- To: Youcef Rabah Rahal <y.rahal@gmail.com>
- CC: www-math@w3.org

Everybody,

I may not know much about the mathematical notations used in cultures that write in Arabic script in particular, but I have commented on general issues of MathML internationalization on this list before. I would therefore like to make some general comments on this issue.

First of all, as has been implicit in the discussion so far, it is necessary to separate the discussion of Arabic-script rendering of MathML into issues concerning Presentation MathML and those concerning Content MathML. As an example, transliterations (variable "a" -> some Arabic letter) are a really bad idea for Presentation MathML, but they are a legitimate concern for the rendering of Content MathML, since the latter is supposed to be transnational in character.

In Presentation MathML, I would like to suggest that - as far as possible - MathML follow the basic idea that Unicode stipulates for its encodings of writing systems, namely, that a Presentation MathML stream describe the formula it represents roughly in the order in which it would normally be (hand-)written, and that any Presentation MathML fragment use the actual Unicode characters that the formula would be written with (modulo the Unicode concept of mirroring, of course). Current implementations of MathML do not support this, which makes it hard to use Presentation MathML as-is today to write maths that includes Arabic script, but as several people have implied, this should be considered a bug in implementations, and not a good model for MathML to follow.

Note that this implies, in particular, that it would be a good idea when using Presentation MathML to write numbers, function and variable names, and other plain-text ingredients with the actual Unicode characters that are going to be rendered.
(I suspect that this may require a Unicode adjustment to accommodate mirrored 'Arabic' digit characters, and that it may open the question of whether a number is written little-endian or big-endian, or LTR vs RTL, in a given script or locale).

Since MathML is an XML application, it clearly takes the xml:lang attribute, which can probably help distinguish basic differences in writing style for elementary through high-school maths, including whether a formula is written LTR or RTL as a whole, which script to use for digits, whether the digits need to mirror or not, what exactly the name for cos or gcd is (in the case of Content MathML), etc.

However, I suspect - and this is a question to those who have experienced it for themselves - that a proper way to do MathML in university-level texts in many cultures is simply to annotate it as xml:lang="en-us" within a text that is otherwise xml:lang="eg", for example. This way, there would be no ambiguity as to what xml:lang="eg" means for MathML: it is unambiguously the rendering style commonly used in primary and secondary education in the culture designated by xml:lang (and in many or most countries, it will be used in tertiary education, too). Evidence that this is the correct solution could include whether one writes the Latin letters "gcd" and "lcm" for greatest common divisor and least common multiple in such university-level textbooks, since these are clearly English terms.

Still, especially for rendering Content MathML, a much more fine-tuned choice of locale would need to be made available to content writers and renderers, even if one only considers Latin-script based maths (do you render 'exists' as an inverted E, as a big V, or (especially in primary and secondary education) as plain text "there exists" or "es gibt"...?).

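To make the xml:lang idea above a little more concrete, here is a rough Python sketch of how a renderer might find the language that applies to a formula and then pick a rendering for 'exists'. Everything in it is an assumption for illustration only: the sample document, the "ar-EG" tag standing in for the surrounding text, the EXISTS_RENDERING table and the two helper functions are all hypothetical, and no existing MathML renderer is claimed to work this way.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: Arabic text with a formula annotated as en-us.
DOC = """<doc xml:lang="ar-EG">
  <p>surrounding text</p>
  <math xmlns="http://www.w3.org/1998/Math/MathML" xml:lang="en-us">
    <apply><exists/></apply>
  </math>
</doc>"""

# Hypothetical table: how to render the 'exists' quantifier per language.
EXISTS_RENDERING = {
    "en": "\u2203",   # inverted E
    "de": "es gibt",  # plain text, as in school-level material
}


def effective_langs(root):
    """Map each element to the xml:lang value it declares or inherits."""
    langs = {}

    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        langs[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return langs


def render_exists(lang, table=EXISTS_RENDERING, default="\u2203"):
    """Look up a rendering, falling back from e.g. 'en-us' to 'en'."""
    tag = (lang or "").lower()
    while tag:
        if tag in table:
            return table[tag]
        # Drop the last subtag; yields "" when no hyphen is left.
        tag = tag.rpartition("-")[0]
    return default
```

With the sample above, the formula inherits nothing from the surrounding text because it carries its own xml:lang, so a renderer would look up "en-us", fall back to "en", and use the symbol. A real table would of course need far more entries, and probably a finer key than a language tag alone.
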
It is possible that some of the requirements for Arabic-script rendering of Presentation MathML can only be modelled in this way, because there is not sufficient consensus on ways of writing maths within a culture that can be designated via xml:lang, which is restricted to ISO language codes as values.

Hope this helps,

-- Andreas Strotmann

Received on Thursday, 11 November 2004 10:40:32 UTC