Re: [MathML3-last-call] mathvariant from Sam Dooley on 2009-09-28 (www-math@w3.org from September 2009)

From: Sam Dooley <sam@integretechpub.com>
Date: Mon, 28 Sep 2009 11:04:41 -0600
To: Karl Tomlinson <w3@karlt.net>, www-math@w3.org
Cc: Jacques Distler <distler@golem.ph.utexas.edu>
Message-Id: <200909281717.n8SHH6dQ020324@parzival.integretechpub.com>

The intent of the mathvariant attribute is to provide a markup solution
to represent mathematical characters in a way that protects them from
accidental style changes.

There are three cases:

(1) Mathematical characters that have assigned code points in the SMP.

The mathvariant attribute provides a way to encode these characters
using only BMP characters, which fit in 16-bit data values.

So <mi mathvariant="bold">A</mi> means the same as <mi>&#x1D400;</mi>.
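A rough sketch of case (1), assuming Python. In U+1D400-U+1D7FF each Latin alphabet is a run of 52 code points (A-Z, then a-z), so a mathvariant value plus a Basic Latin letter can be turned into a code point by offset arithmetic. The function name smp_code_point is hypothetical, and only the alphabets that contain no BMP "holes" are listed.

```python
# Hypothetical sketch for case (1): (mathvariant, Basic Latin letter) -> SMP
# code point. Only alphabets with no reserved "holes" are listed here.
LATIN_START = {
    "bold": 0x1D400,
    "bold-italic": 0x1D468,
    "bold-script": 0x1D4D0,
    "bold-fraktur": 0x1D56C,
    "sans-serif": 0x1D5A0,
    "bold-sans-serif": 0x1D5D4,
    "sans-serif-italic": 0x1D608,
    "sans-serif-bold-italic": 0x1D63C,
    "monospace": 0x1D670,
}


def smp_code_point(variant: str, letter: str) -> int:
    """Return the SMP code point for a styled Basic Latin letter."""
    if len(letter) != 1 or not ("A" <= letter <= "Z" or "a" <= letter <= "z"):
        raise ValueError(f"not a Basic Latin letter: {letter!r}")
    start = LATIN_START[variant]
    if letter <= "Z":
        return start + ord(letter) - ord("A")
    # Small letters follow the 26 capitals in each alphabet.
    return start + 26 + ord(letter) - ord("a")


print(hex(smp_code_point("bold", "A")))         # 0x1d400
print(hex(smp_code_point("bold-italic", "h")))  # 0x1d489
```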

(2) Mathematical characters that have assigned code points in the BMP.

These characters fill the "holes" in the SMP alphabets: their SMP slots
were left unassigned because they were deemed equivalent to characters
that were already in the BMP.

The mathvariant attribute provides an alternate way to encode these
characters, even though they really don't need it.  The exact list
of these characters is given in chapter 7.

So <mi mathvariant="script">R</mi> means the same as <mi>&#x211B;</mi>.
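For case (2), one option is to consult a table of holes before doing the offset arithmetic. A minimal Python sketch follows; math_code_point and HOLES are hypothetical names. The table lists the Latin holes in the italic, script, fraktur and double-struck alphabets with the BMP Letterlike Symbols code points that Unicode uses in their place.

```python
# Hypothetical sketch for case (2): some SMP slots are reserved "holes"
# because the letter was already encoded in the BMP.
LATIN_START = {
    "italic": 0x1D434,
    "script": 0x1D49C,
    "fraktur": 0x1D504,
    "double-struck": 0x1D538,
}

# (mathvariant, letter) -> BMP code point used instead of the reserved slot.
HOLES = {
    ("italic", "h"): 0x210E,
    ("script", "B"): 0x212C, ("script", "E"): 0x2130,
    ("script", "F"): 0x2131, ("script", "H"): 0x210B,
    ("script", "I"): 0x2110, ("script", "L"): 0x2112,
    ("script", "M"): 0x2133, ("script", "R"): 0x211B,
    ("script", "e"): 0x212F, ("script", "g"): 0x210A,
    ("script", "o"): 0x2134,
    ("fraktur", "C"): 0x212D, ("fraktur", "H"): 0x210C,
    ("fraktur", "I"): 0x2111, ("fraktur", "R"): 0x211C,
    ("fraktur", "Z"): 0x2128,
    ("double-struck", "C"): 0x2102, ("double-struck", "H"): 0x210D,
    ("double-struck", "N"): 0x2115, ("double-struck", "P"): 0x2119,
    ("double-struck", "Q"): 0x211A, ("double-struck", "R"): 0x211D,
    ("double-struck", "Z"): 0x2124,
}


def math_code_point(variant: str, letter: str) -> int:
    """Return the code point for a styled Basic Latin letter."""
    if len(letter) != 1 or not ("A" <= letter <= "Z" or "a" <= letter <= "z"):
        raise ValueError(f"not a Basic Latin letter: {letter!r}")
    # Holes take precedence over the offset arithmetic.
    if (variant, letter) in HOLES:
        return HOLES[(variant, letter)]
    start = LATIN_START[variant]
    if letter <= "Z":
        return start + ord(letter) - ord("A")
    # Small letters follow the 26 capitals in each alphabet.
    return start + 26 + ord(letter) - ord("a")


print(hex(math_code_point("script", "R")))  # 0x211b
print(hex(math_code_point("script", "h")))  # 0x1d4bd
```

With this table, script capital R resolves to the BMP character U+211B, while script small h is not a hole and resolves to U+1D4BD.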

(3) Mathematical characters that have no assigned code point.

The mathvariant attribute provides a way to encode characters that
could not be encoded otherwise.

So "bold italic dotless i" does not have an assigned code point, but
makes perfect sense (to me at least) as a mathematical character.
A renderer that sees <mi mathvariant="bold-italic">&#x131;</mi> should
feel free to display a bold italic dotless i if it has a font that
contains that glyph.

A "sans-serif alpha" as a mathematical character is another example.
Each implementor should choose which combinations to support, and my
choices could differ from someone else's.  We should be able
to agree on characters related to the alphabets in U+1D400-U+1D7FF,
as described above, but beyond that I would expect support to vary,
perhaps widely.
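One way to model case (3) is a predicate that says whether a mathvariant/character pair has an assigned code point at all; when it does not, a renderer would keep the pair and look for a suitable font. A minimal Python sketch, assuming the coverage of U+1D400-U+1D7FF (plus the BMP hole fillers); has_code_point, char_class and the coverage sets are hypothetical names.

```python
# Hypothetical sketch for case (3): does a (mathvariant, character) pair have
# an assigned code point, in U+1D400-U+1D7FF or as a BMP hole filler?
LATIN = {
    "bold", "italic", "bold-italic", "script", "bold-script", "fraktur",
    "bold-fraktur", "double-struck", "sans-serif", "bold-sans-serif",
    "sans-serif-italic", "sans-serif-bold-italic", "monospace",
}
DIGITS = {"bold", "double-struck", "sans-serif", "bold-sans-serif", "monospace"}
GREEK = {"bold", "italic", "bold-italic", "bold-sans-serif", "sans-serif-bold-italic"}

# Greek symbol forms, nabla and partial differential share the Greek runs.
GREEK_EXTRAS = {0x03F4, 0x2207, 0x2202, 0x03F5, 0x03D1,
                0x03F0, 0x03D5, 0x03F1, 0x03D6}

VARIANTS_BY_CLASS = {
    "latin": LATIN,
    "digit": DIGITS,
    "greek": GREEK,
    "dotless": {"italic"},  # U+0131 dotless i, U+0237 dotless j
    "digamma": {"bold"},    # U+03DC, U+03DD
}


def char_class(ch: str):
    """Classify a single base character; None means no styled forms exist."""
    cp = ord(ch)
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return "latin"
    if "0" <= ch <= "9":
        return "digit"
    if ((0x0391 <= cp <= 0x03A9 and cp != 0x03A2)
            or 0x03B1 <= cp <= 0x03C9 or cp in GREEK_EXTRAS):
        return "greek"
    if cp in (0x0131, 0x0237):
        return "dotless"
    if cp in (0x03DC, 0x03DD):
        return "digamma"
    return None


def has_code_point(variant: str, ch: str) -> bool:
    if variant == "normal":
        return True  # the base character itself
    return variant in VARIANTS_BY_CLASS.get(char_class(ch), set())


print(has_code_point("bold-italic", "\u0131"))  # False: bold italic dotless i
print(has_code_point("sans-serif", "\u03B1"))   # False: sans-serif alpha
```

A False result is not an error in this model: the renderer keeps the mathvariant and base character and uses a font glyph if one is available.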


So I agree that the wording of the spec should be clarified
in a few places:

(1) The spec should be clearer about when the mathvariant value
identifies a distinct mathematical character, whether or not it has
its own Unicode code point.  Perhaps it could just say that mathvariant
values can be used with Basic Latin (U+20-U+7E), Greek capital/
small/symbol, dotless i/j and digamma BMP characters to encode a
mathematical character.

(2) The spec should not say to ignore mathematical characters that
do not have an assigned code point, such as bold italic dotless i
or sans-serif alpha, but should warn that such characters may not
be widely supported by existing fonts.

(3) The spec should provide a definition of mathvariant that is not
dependent on a specific version of Unicode, other than to say that
if new Unicode characters are introduced in the future, they may be
considered to be equivalent to an existing mathvariant encoding for
the character.

(4) The spec should clarify that the intent is not to transform
character code points from the BMP into the SMP, but to provide
a markup solution to represent mathematical characters that may
or may not have a simpler representation as a Unicode code point.

(5) The spec should clarify that when the mathvariant attribute is
applied to a Unicode code point that already identifies a mathematical
character, the mathvariant implied by the code point overrides
any external mathvariant value.

So <mi mathvariant="bold">&#x210E;</mi> should be equivalent to
<mi mathvariant="italic">h</mi>, as implied by U+210E, and if you
want the bold italic h, use <mi mathvariant="bold-italic">h</mi>
or (equivalently) <mi>&#x1D489;</mi>.
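Point (5) can be read as: decode the character first, and if its code point already implies a mathvariant, use that and ignore the attribute. A minimal Python sketch with the hypothetical names decode and effective_form; it decodes only the Latin SMP alphabets and a handful of BMP letterlike symbols, so Greek and digit code points would fall through to the attribute value.

```python
import string

# Hypothetical sketch for point (5): start of each 52-letter Latin alphabet.
LATIN_START = {
    "bold": 0x1D400, "italic": 0x1D434, "bold-italic": 0x1D468,
    "script": 0x1D49C, "bold-script": 0x1D4D0, "fraktur": 0x1D504,
    "double-struck": 0x1D538, "bold-fraktur": 0x1D56C,
    "sans-serif": 0x1D5A0, "bold-sans-serif": 0x1D5D4,
    "sans-serif-italic": 0x1D608, "sans-serif-bold-italic": 0x1D63C,
    "monospace": 0x1D670,
}
LETTERS = string.ascii_uppercase + string.ascii_lowercase  # A-Z then a-z

# A few BMP letterlike symbols that fill holes in the SMP alphabets.
BMP_LETTERLIKE = {
    "\u210E": ("italic", "h"),         # PLANCK CONSTANT
    "\u211B": ("script", "R"),         # SCRIPT CAPITAL R
    "\u210C": ("fraktur", "H"),        # BLACK-LETTER CAPITAL H
    "\u211D": ("double-struck", "R"),  # DOUBLE-STRUCK CAPITAL R
}


def decode(ch: str):
    """Return (variant, base letter) if ch itself carries a variant, else None.
    Reserved hole slots are not special-cased; they should not occur in text."""
    if ch in BMP_LETTERLIKE:
        return BMP_LETTERLIKE[ch]
    cp = ord(ch)
    for variant, start in LATIN_START.items():
        if start <= cp < start + 52:
            return variant, LETTERS[cp - start]
    return None


def effective_form(attr_variant: str, ch: str):
    """A variant implied by the code point overrides the attribute value."""
    implied = decode(ch)
    return implied if implied is not None else (attr_variant, ch)


print(effective_form("bold", "\u210E"))  # ('italic', 'h')
print(effective_form("bold-italic", "h"))  # ('bold-italic', 'h')
```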

In a sense, the intended mapping could be described as transforming
from (BMP + SMP) into (mathvariant x BMP), where the reverse mapping
is not always defined, even for the alphabets of interest.  But I
don't mean to suggest that any specific implementation should turn
out to be better or worse than any other.

These opinions represent my current understanding of the spec, and
are almost certainly not shared by the working group.

Sam
Received on Monday, 28 September 2009 17:18:02 UTC