- From: Richard Kaye <R.W.Kaye@bham.ac.uk>
- Date: Tue, 21 Aug 2007 18:32:23 +0100
- To: Luca Padovani <lpadovan@cs.unibo.it>
- Cc: www-math@w3.org
Dear All,

I had independently noticed this particular inconsistency in the MathML entities. The MathML DTDs provide a long list of character entities, and it's the (few, but some) errors here that make me concerned about their status. Suppose I write a MathML 2.0 document using these entities. In a few years' time new Unicode characters may be introduced, or the wrinkles in the MathML DTDs may be ironed out, and these entities changed. Then my document suddenly doesn't display correctly with the new version of MathML, and the required changes are highly non-obvious. This wouldn't be a good state of affairs, especially as the use of XML was supposed to protect the lifespan of documents. Of course, a MathML 2.0 document must always be validated against the MathML 2.0 DTD (so presumably the errors there can never be corrected). But someone or something must be able to convert MathML 2.0 to MathML 9.0 or whatever it will be, and at the very least the entity names are a serious source of potential error.

Secondly, I would like to use these names in a MathML-authoring application. Rather than re-typing everything, what source of data should I use? There are two versions of "unicode.xml" on the W3C web site:

http://www.w3.org/Math/characters/unicode.xml
http://www.w3.org/2003/entities/xml/unicode.xml

(maybe others; these two are not particularly well advertised). Or should I derive my work from the DTD itself (more difficult to parse, but possibly more authoritative)?

(BTW, I would also like this table to say whether a character is "normally" used as an identifier, a number, an operator or something else. For some characters this is tricky, but for the vast majority it is quite easy to define; sadly, it's not present in the data. Am I asking too much?)

It would be nice to provide this data in some useful manner with MathML 3.0. But actually I'm not even convinced I want MathML to provide a long list of character entity names.
These slow down processing considerably and may not be such an advantage. (My documents all go through a preprocessor that removes them anyway :) And if names are provided at all, users may be more familiar with LaTeX names, and there are alternatives, such as the (incompatible) list of names in Unicode TR25, http://www.unicode.org/reports/tr25/, or the names in the STIX tables. If names are a good thing, perhaps they could be offered as an option (with different sets of names to choose from depending on the user's taste)?

Richard

On Tue, 2007-07-24 at 17:19 +0200, Luca Padovani wrote:
> Hi all,
>
> It appears that the italic variants of dotless i and dotless j
> (respectively U+0131 and U+0237) are not listed in the table
>
> http://www.w3.org/TR/2003/REC-MathML2-20031021/italic.html
>
> but such two letters are actually available in the Unicode charts
> (respectively U+1D6A4 and U+1D6A5). Maybe such two entries should be
> listed in the next MathML revision.
>
> --luca
>
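The preprocessing step mentioned above (removing entity references before the document is parsed), combined with a user-selectable set of names, might look roughly like this minimal Python sketch. It is an illustration under stated assumptions, not a description of any existing tool: the `NAME_SETS` tables are hypothetical three-entry examples (not extracted from the MathML DTDs, TR25 or the STIX tables), and `expand_entities` is an illustrative name. Each named reference is rewritten as a numeric character reference, which any XML parser resolves without a DTD.

```python
import re

# Hypothetical, deliberately tiny name tables for illustration only; a real
# tool would load full tables from whichever data source is chosen
# (for example one of the unicode.xml files mentioned above).
NAME_SETS = {
    "mathml": {"alpha": 0x03B1, "rarr": 0x2192, "imath": 0x0131},
    "latex": {"alpha": 0x03B1, "rightarrow": 0x2192, "imath": 0x0131},
}

# The five entities predefined by XML itself are left untouched.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches simple alphanumeric entity names such as &rarr; or &frac12;.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text, name_set="mathml"):
    """Replace named entity references with numeric character references.

    An unknown name raises KeyError, so a stale or mistyped entity is
    reported instead of being silently passed through.
    """
    table = NAME_SETS[name_set]

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        return "&#x%04X;" % table[name]

    return ENTITY_REF.sub(replace, text)
```

For example, `expand_entities("<mi>&imath;</mi><mo>&rarr;</mo>")` yields `<mi>&#x0131;</mi><mo>&#x2192;</mo>`, and `expand_entities("<mo>&rightarrow;</mo>", "latex")` yields `<mo>&#x2192;</mo>`. Writing numeric references keeps the processed document independent of any particular entity table, at the cost of readability in the source.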
Received on Tuesday, 21 August 2007 17:32:50 UTC