Re: [author-guide] Character Entity References Chart

Date: Thu, 28 Aug 2008 14:37:39 +0100
Just stumbled on this thread via the archives.

Interesting, we should probably compare notes at some point:-)

> The HTML is generated from this XML file, which includes a lot of the 
> character metadata.
> http://www.w3.org/2003/entities/2007xml/unicode.xml

I've been gradually adding more and more of the UCD data to that file
over the years, and keep experimenting with different formats for
displaying the data in HTML. If something else needs adding, then let me
know.

The charts at and around


which are of course generated from the same file, have overlapping
functionality, giving the Unicode name on mouseover and in that case
linking to a separate file whenever there is a defined entity name.

Ideas for new functionality always welcome.

One problem I always have is finding good glyph images. I keep hoping
STIX will come out of beta, which would fill some of my "gaps". It would
be nice (and easy) to be able to grab images from the Unicode PDF charts
but unfortunately licence restrictions forbid that. Which fonts did you
use? My tables require a 32x32 PNG of each character, without any other
text in the image, so I can't just steal your images:-)

>     - U+200B is one of the worst with these 5 long names:
>      &ZeroWidthSpace; &NegativeVeryThinSpace; &NegativeThinSpace;
>      &NegativeMediumSpace; &NegativeThickSpace;

Nice, isn't it:-) Although all the negative ones are deprecated and
shouldn't be used, so you can probably just lose those.
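
As a rough illustration of how several names pile up on one code point,
here is a minimal Python sketch. It assumes the HTML5 entity table that
ships with Python (html.entities.html5) as a stand-in for unicode.xml,
so it is not how the charts are actually generated, and the helper name
entity_names_for is made up for this example.

```python
import unicodedata
from html.entities import html5


def entity_names_for(char):
    """Return the sorted entity names that expand to exactly `char`.

    Uses Python's bundled HTML5 table as a stand-in data source
    (an assumption; the charts discussed here are built from unicode.xml).
    Only the forms ending in ';' are kept, so legacy duplicates without
    the semicolon are not counted twice.
    """
    return sorted(
        name for name, value in html5.items()
        if value == char and name.endswith(";")
    )


if __name__ == "__main__":
    zwsp = "\u200b"
    # The Unicode name is what a chart might show on mouseover.
    print(unicodedata.name(zwsp))
    for name in entity_names_for(zwsp):
        print("&" + name)
```

Dropping the deprecated names from a chart would then just be a matter
of filtering out the ones that start with "Negative".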

The negative space characters were part of the original STIX submission
to Unicode, and because of that were in MathML 1 (1998) and had entity
names assigned. However Unicode declined to add them, so they have been
deprecated in all subsequent versions of MathML, which is getting on for 10
years now. As an undefined entity is a non-recoverable fatal error in
XML, I'm not prepared to take any entity names out of the MathML DTD, so
they need to be defined to something, and zero width space ended up
being that something.
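
To make the point about undefined entities concrete, here is a small
sketch using Python's xml.etree.ElementTree. The two documents and the
helper is_well_formed are invented for the example, and the one-line
internal subset only stands in for the real MathML DTD. With the entity
declared as U+200B the document parses; with the declaration removed the
parser gives up entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal subset declares the deprecated
# name as zero width space (U+200B), standing in for the MathML DTD.
WITH_DECL = (
    '<!DOCTYPE mtext [<!ENTITY NegativeThinSpace "&#x200B;">]>'
    "<mtext>a&NegativeThinSpace;b</mtext>"
)

# The same content with no declaration for the entity at all.
WITHOUT_DECL = "<mtext>a&NegativeThinSpace;b</mtext>"


def is_well_formed(document):
    """Return True if the parser accepts `document`, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # An undefined entity reference is fatal: parsing stops here.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(WITH_DECL))
    print(is_well_formed(WITHOUT_DECL))
    # The declared entity expands to the zero width space character.
    print(repr(ET.fromstring(WITH_DECL).text))
```

So removing a name from the DTD would turn any old document that still
uses it from valid into unparseable, which is why the names stay defined.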


