Re: italic dotless letters from Richard Kaye on 2007-08-21 (www-math@w3.org from August 2007)

From: Richard Kaye <R.W.Kaye@bham.ac.uk>
Date: Tue, 21 Aug 2007 18:32:23 +0100
To: Luca Padovani <lpadovan@cs.unibo.it>
Cc: www-math@w3.org
Message-Id: <1187717543.24983.52.camel@mat140.bham.ac.uk>

Dear All

I had independently noticed this particular inconsistency
in the MathML entities.  The MathML DTDs provide a long
list of character entities, and it is the errors here (few,
but some) that make me concerned about their status.

Suppose I write a MathML2.0 document using these entities.  In
a few years' time new Unicode characters may be introduced,
or the wrinkles in the MathML DTDs may be ironed out, and these
entities changed.  Then my document suddenly doesn't display
correctly with the new version of MathML, and the required
changes are highly non-obvious.  This wouldn't be a good state
of affairs, especially as the use of XML was supposed to
protect the lifespan of documents.  

Of course, a MathML2.0 document must always be validated against
the MathML2.0 DTD (so presumably the errors there can never be
corrected).  But someone or something must be able to convert
MathML2.0 to MathML9.0 or whatever it will be, and at the very
least these entity names are a serious source of potential error.

Secondly, I would like to use these names in a MathML-authoring
application.  Rather than re-typing everything, what source of
data should I use?  There are two versions of "unicode.xml" on
the W3C web site, at
 http://www.w3.org/Math/characters/unicode.xml
 http://www.w3.org/2003/entities/xml/unicode.xml
(and maybe others; these two are not particularly well
advertised).  Or should I derive my work from the DTD itself
(more difficult to parse, but possibly more authoritative)?
(BTW, I would also like this table to say whether a character
is "normally" used as an identifier, a number, an operator or
something else.  For some characters this is tricky, but for
the vast majority it is quite easy to define; sadly it is not
present in the data.  Am I asking too much?)  It would be nice
to provide this data in some useful manner with MathML3.0.
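
To make the "what source of data" question concrete, here is a
minimal sketch of pulling a name-to-character table out of a
unicode.xml-style file.  The SAMPLE fragment is hypothetical: the
element and attribute names (character, dec, entity, id, set) and
the set labels are assumptions about the layout and should be
checked against whichever copy of the file is actually used.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of unicode.xml. Element names,
# attribute names and "set" values are assumptions for illustration.
SAMPLE = """<unicode>
  <charlist>
    <character id="U00131" dec="305">
      <entity id="imath" set="set-a"/>
      <entity id="inodot" set="set-b"/>
      <description>LATIN SMALL LETTER DOTLESS I</description>
    </character>
    <character id="U00237" dec="567">
      <entity id="jmath" set="set-a"/>
      <description>LATIN SMALL LETTER DOTLESS J</description>
    </character>
  </charlist>
</unicode>"""


def entity_table(xml_text):
    """Map each entity name to the single character it stands for."""
    root = ET.fromstring(xml_text)
    table = {}
    for ch in root.iter("character"):
        dec = ch.get("dec")
        # Skip entries without a plain decimal code point
        # (for example, multi-character sequences).
        if dec is None or not dec.isdigit():
            continue
        for ent in ch.findall("entity"):
            table[ent.get("id")] = chr(int(dec))
    return table


table = entity_table(SAMPLE)
```

A table built this way could also carry a per-character class
(identifier, number, operator), if the data provided one.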

But actually I'm not even convinced I want MathML to provide a
long list of character entity names.  These slow down
processing considerably and may not be such an advantage.
(My documents all go through a preprocessor that removes them anyway :)
But if names are there, users may be more familiar with LaTeX
names, and there are alternatives, such as the (incompatible)
list of names in Unicode TR25  http://www.unicode.org/reports/tr25/
or the names in the STIX tables.  If names are a good thing,
perhaps they could be selected as an option (with different sets
of names to choose from depending on the user's taste)?
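
As an illustration of the kind of preprocessor mentioned above
(not my actual one), here is a minimal sketch that rewrites named
entity references as numeric character references.  The ENTITIES
table and the function name expand_entities are hypothetical;
passing a different table is one way to let the user choose a
set of names.  It is a plain text substitution, so it does not
treat CDATA sections or comments specially.

```python
import re

# Hypothetical name table; a real one would be loaded from a data file.
ENTITIES = {"imath": 0x131, "jmath": 0x237, "alpha": 0x3B1}

# The five entities predefined by XML itself are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references only; numeric ones (&#...;) do not match.
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9.]*);")


def expand_entities(text, table=ENTITIES):
    """Replace known named references with numeric references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in table:
            return match.group(0)  # leave unknown names as they are
        return "&#x{:X};".format(table[name])
    return _ENTITY_RE.sub(repl, text)


result = expand_entities("<mi>&imath;</mi> &amp; &unknown;")
```

The output then needs no DTD to resolve its character references.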

Richard

On Tue, 2007-07-24 at 17:19 +0200, Luca Padovani wrote:
> Hi all,
> 
> It appears that the italic variants of dotless i and dotless j
> (respectively U+0131 and U+0237) are not listed in the table
> 
>   http://www.w3.org/TR/2003/REC-MathML2-20031021/italic.html
> 
> but such two letters are actually available in the Unicode charts
> (respectively U+1D6A4 and U+1D6A5). Maybe such two entries should be
> listed in the next MathML revision.
> 
> --luca
>

Received on Tuesday, 21 August 2007 17:32:50 UTC