- From: David Carlisle <davidc@nag.co.uk>
- Date: Fri, 13 Jun 2014 16:41:29 +0100
- To: Shervin Afshar <shervinafshar@gmail.com>
- CC: <www-math@w3.org>
- Message-ID: <539B1BA9.3030107@nag.co.uk>
On 13/06/2014 15:52, Shervin Afshar wrote:
>
> On Fri, Jun 13, 2014 at 1:49 AM, David Carlisle <davidc@nag.co.uk> wrote:
>
>
> Also, a general question comes to mind; what is the policy of MathML
> regarding character entities which are currently not encoded in UCD,
> but might be in the future?
>
> Basically it's too late (the only ones missing now are essentially
> the non combining multiple dot accents) the bulk of the math
> characters added in Unicode 3.1 -6 were added due to the STIX
> proposal aided and abetted by the needs of this entity set. (bold
> digamma and dotless j in particular added specifically for this set),
> but now even if Unicode did add a non combining triple dot accent I
> don't think we could change tdot any more than we can change race.
>
>
> Does this imply that no new entities can be added in the future ever?
> For example I was thinking to propose addition of Kulkarni-Nomizu
> Product[1] to MathML and later propose it to UTC as a NamedSequence
> of U+22C0 (N-ARY LOGICAL AND) and U+20DD (COMBINING ENCLOSING
> CIRCLE).
>
> [1]: https://en.wikipedia.org/wiki/Kulkarni%E2%80%93Nomizu_product
>

I think it's highly unlikely that we will add any more. Only one name has been added since 1998, and that was asympeq, which was added because HTML gave a different definition to asymp than every XML-based entity set for Unicode (MathML1, DocBook, TEI, ....)

Entities are really fragile in XML, as any fragment that uses them needs to have a <!DOCTYPE declaration and so becomes a complete document. You cannot cut and paste fragments and keep them well formed without some work adding and removing <!DOCTYPE.

In a pure XML world at least you point to a specific DTD, so if you have an updated public DTD or a local DTD with extra definitions it all works. In an HTML world the DTD is implicit (even if there is apparently a DTD referenced after <!DOCTYPE html PUBLIC ...>, the HTML parser does not read it).
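As a minimal sketch of the fragility described above (an illustration, not part of any spec), the following Python snippet feeds three small fragments to the standard library's `xml.etree.ElementTree` parser. The helper name `parse_text` and the sample `<mo>` fragments are hypothetical; the entity is declared by hand in an internal DTD subset, using U+224D, the code point that the HTML entity list gives for asympeq.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_text):
    """Return the root element's text, or None if the input is not well formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# A bare fragment using a named entity: no DTD declares it, so parsing fails.
bare_fragment = "<mo>&asympeq;</mo>"

# The same fragment wrapped in a DOCTYPE whose internal subset declares the
# entity by hand (illustrative declaration, mapping asympeq to U+224D).
with_doctype = '<!DOCTYPE mo [<!ENTITY asympeq "&#x224D;">]><mo>&asympeq;</mo>'

# A numeric character reference needs no declaration at all.
numeric_reference = "<mo>&#x224D;</mo>"

for label, source in [
    ("bare named entity", bare_fragment),
    ("named entity with DOCTYPE", with_doctype),
    ("numeric reference", numeric_reference),
]:
    print(label, "->", repr(parse_text(source)))
```

The first fragment is rejected as not well formed, while the other two parse to the same single character. Only the numeric form survives being cut and pasted into another document without carrying a DOCTYPE along with it.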
That means that getting a new entity agreed means getting it into the HTML spec and getting every browser manufacturer and mobile phone and TV and ... to build in the new entity. It's not impossible for that to happen, but it's a lot easier just to use the character directly or to use numeric references. They don't require a software update (even for new characters, except for font updates) and don't have the problem that XML fragments using them are not well formed.

This is why, for example, the entities for double struck (blackboard bold), such as Aopf, only cover A-Z even though Unicode has a-z and 0-9 in this style. The cost of adding names for the lower case letters is simply too high.

There is a background plan to generalise the list of names for other uses, such as TeX macro names, names in editor character menus, etc. There are far fewer constraints there and a standard set would make sense, but as Frédéric Wang observed on this list the other day, the TeX names part of unicode.xml has suffered from a lack of attention and needs updating and fixing before we add new names.

So, returning to your question, I think there is scope for having an extended set of "standard short names for Unicode characters used in mathematical sciences", but I suspect that the existing set of XML entity names is essentially frozen. If a name were added to HTML it would be added here to keep in sync, so HTML is the controlling factor now (and I have no control over that :-)

David
Received on Friday, 13 June 2014 15:42:06 UTC