Re: [XML-Entities] Question regarding some Unicode sequences from David Carlisle on 2014-06-13 (www-math@w3.org from June 2014)

From: David Carlisle <davidc@nag.co.uk>
Date: Fri, 13 Jun 2014 16:41:29 +0100
To: Shervin Afshar <shervinafshar@gmail.com>
CC: <www-math@w3.org>
Message-ID: <539B1BA9.3030107@nag.co.uk>
On 13/06/2014 15:52, Shervin Afshar wrote:
>
 > On Fri, Jun 13, 2014 at 1:49 AM, David Carlisle <davidc@nag.co.uk
 > <mailto:davidc@nag.co.uk>> wrote:
 >
 >
 > Also, a general question comes to mind; what is the policy of MathML
 > regarding character entities which are currently not encoded in UCD,
 > but might be in the future?
 >
 > Basically it's too late (the only ones missing now are essentially
 > the non combining multiple dot accents) the bulk of the math
 > characters added in Unicode 3.1 -6 were added due to the STIX
 > proposal aided and abetted by the needs of this entity set. (bold
 > digamma and dotless j in particular added specifically for this set),
 > but now even if Unicode did add a non combining triple dot accent I
 > don't think we could change tdot any more than we can change race.
 >
 >
 > Does this imply that no new entities can be added in the future ever?
 > For example I was thinking to propose addition of Kulkarni-Nomizu
 > Product[1] to MathML and later propose it to UTC as a NamedSequence
 > of U+22C0 (N-ARY LOGICAL AND) and U+20DD (COMBINING ENCLOSING
 > CIRCLE).
 >
 > [1]: https://en.wikipedia.org/wiki/Kulkarni%E2%80%93Nomizu_product
 >


I think it's highly unlikely that we will add any more. Only one name has
been added since 1998, and that was asympeq, which was added because HTML
gave asymp a different definition from every XML-based entity set for
Unicode (MathML1, DocBook, TEI, ...).

Entities are really fragile in XML: any fragment that uses one needs a
<!DOCTYPE, and so becomes a complete document. You cannot cut and paste
fragments and keep them well formed without some work adding and removing
the <!DOCTYPE.
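
A minimal sketch of that fragility, assuming Python's standard
xml.etree.ElementTree parser. The element name mi and the alpha entity are
only illustrative choices, and parses() is a hypothetical helper, not part
of any library:

```python
import xml.etree.ElementTree as ET


def parses(fragment: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# A bare fragment using a named entity: no declaration is in scope.
named = "<mi>&alpha;</mi>"

# The same fragment turned into a complete document with a <!DOCTYPE
# whose internal subset declares the entity.
with_doctype = '<!DOCTYPE mi [<!ENTITY alpha "&#x3B1;">]>' + named

# The same fragment using a numeric character reference instead.
numeric = "<mi>&#x3B1;</mi>"

print(parses(named))         # False: undefined entity
print(parses(with_doctype))  # True
print(parses(numeric))       # True
```

Only the first form fails, and it is the one you end up with when you
cut the element out of a document and leave the <!DOCTYPE behind.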

In a pure XML world you at least point to a specific DTD, so if you have
an updated public DTD or a local DTD with extra definitions it all works.
In an HTML world the DTD is implicit: even if there is apparently a DTD
referenced after <!DOCTYPE html PUBLIC ...>, the HTML parser does not read
it. That means getting a new entity agreed means getting it into the HTML
spec and getting every browser manufacturer and mobile phone and TV and ...
to build in the new entity.
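
As a rough illustration of a built-in name table, here is a sketch using
Python's standard html module, which ships its own copy of the HTML named
character references and consults no DTD. It is not a browser parser, and
the name kulnom is a made-up placeholder for a name that is not in the
table:

```python
import html
from html.entities import html5

# Names known to the built-in table resolve with no DTD in sight.
asymp = html.unescape("&asymp;")      # U+2248 in the HTML definition
asympeq = html.unescape("&asympeq;")  # U+224D

# A name missing from the table is simply left as literal text;
# nothing in the input document can add it.
unknown = html.unescape("&kulnom;")

print(f"U+{ord(asymp):04X}", f"U+{ord(asympeq):04X}", unknown)
print("kulnom;" in html5)  # False
```

Supporting a new name here means shipping a new version of the table
inside the software, which is the point made above about browsers.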

It's not impossible for that to happen, but it's a lot easier just to use
the character directly or to use numeric references. They don't require
software updates (even for new characters, apart from font updates) and
don't have the problem that XML fragments using them are not well formed.
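
One way to move existing content over to numeric references is sketched
below. It assumes Python's html.entities.html5 table as the source of
names; to_numeric and XML_PREDEFINED are hypothetical helper names, and
the regular expression is a simplification that only handles references
ending in a semicolon:

```python
import re
import xml.etree.ElementTree as ET
from html.entities import html5

# The five names XML itself predefines; these are safe to leave alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def to_numeric(fragment: str) -> str:
    """Rewrite known named references as hexadecimal numeric references."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            # Unknown name: leave it untouched rather than guess.
            return match.group(0)
        # Some names expand to more than one code point.
        return "".join(f"&#x{ord(c):X};" for c in chars)

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, fragment)


converted = to_numeric("<mi>&alpha;</mi>")
print(converted)  # <mi>&#x3B1;</mi>

# The converted fragment parses as XML without any <!DOCTYPE.
print(ET.fromstring(converted).text == "\u03b1")  # True
```

Names that stand for a sequence of characters come out as several numeric
references in a row, so no new name or software update is needed for them
either.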

This is why, for example, the entities for double-struck (blackboard bold)
letters such as Aopf only cover A-Z, even though Unicode has a-z and 0-9 in
this style. The cost of adding names for the lower case letters is simply
too high.

There is a background plan to generalise the list of names for other uses,
such as TeX macro names, names in editor character menus, etc. There are
far fewer constraints there and a standard set would make sense, but as
Frédéric Wang observed on this list the other day, the TeX names part of
unicode.xml has suffered from a lack of attention and needs updating and
fixing before we add new names.

So returning to your question, I think there is scope for having an 
extended set of
"standard short names for Unicode characters used in mathematical sciences"
but I suspect that the existing set of XML entity names is essentially 
frozen.

If a name was added to HTML it would be added here to keep in sync, so HTML
is the controlling factor now (and I have no control over that:-)

David
Received on Friday, 13 June 2014 15:42:06 UTC