
Re: [author-guide] Character Entity References Chart

From: David Carlisle <davidc@nag.co.uk>
Date: Thu, 28 Aug 2008 14:37:39 +0100
Message-Id: <200808281337.m7SDbdRE024154@edinburgh.nag.co.uk>
To: public-html@w3.org



Just stumbled on this thread via the archives.

Interesting; we should probably compare notes at some point :-)

> The HTML is generated from this XML file, which includes a lot of the 
> character metadata.
> 
> http://www.w3.org/2003/entities/2007xml/unicode.xml

I've been gradually adding more and more of the UCD data to that file
over the years, and keep experimenting with different formats for
displaying the data in HTML. If something else needs adding, let me
know.

The charts at and around

http://www.w3.org/2003/entities/2007doc/000.html

which are of course generated from the same file, have overlapping
functionality: they give the Unicode name on mouseover and, whenever a
character has a defined entity name, link to a separate file.

Ideas for new functionality always welcome.

One problem I always have is finding good glyph images. I keep hoping
STIX will come out of beta, which would fill some of my "gaps". It would
be nice (and easy) to be able to grab images from the Unicode PDF charts,
but unfortunately licence restrictions forbid that. Which fonts did you
use? My tables require a 32x32 PNG of each character, without any other
text in the image, so I can't just steal your images :-)


> - U+200B is one of the worst with these 5 long names:
>   &ZeroWidthSpace; &NegativeVeryThinSpace; &NegativeThinSpace;
>   &NegativeMediumSpace; &NegativeThickSpace;


Nice, isn't it :-) All the negative ones are deprecated and shouldn't
be used, though, so you can probably just drop those.
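
For anyone wanting to check which names share a code point, here is a
minimal sketch in Python. It is an assumption on my part to use the
standard library's html.entities.html5 table here; that table is a list
of HTML named character references, not the unicode.xml file discussed
in this thread, and names_for is a hypothetical helper written only for
this illustration.

```python
from html.entities import html5


def names_for(char):
    """Return the sorted entity names (without the trailing ';')
    that the html5 table maps to the given character."""
    return sorted(
        name.rstrip(";")
        for name, value in html5.items()
        if name.endswith(";") and value == char
    )


# U+200B ZERO WIDTH SPACE, the character discussed above.
zwsp_names = names_for("\u200b")
for name in zwsp_names:
    print(f"&{name};")
```

The same helper can be pointed at any other character to see how many
names it has collected.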

The negative space characters were part of the original STIX submission
to Unicode, and because of that were in MathML 1 (1998) and had entity
names assigned. However, Unicode declined to add them, so they have been
deprecated in all subsequent versions of MathML, which is getting on for
10 years now. As an undefined entity is a non-recoverable fatal error in
XML, I'm not prepared to take any entity names out of the MathML DTD, so
they need to be defined as something, and zero width space ended up
being that something.
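
The XML behaviour described above can be seen with a small Python
sketch. The assumptions are mine: it uses the standard library's
xml.etree.ElementTree parser, a made-up element name "m", and an
internal DTD subset standing in for the real MathML DTD; try_parse is a
hypothetical helper.

```python
import xml.etree.ElementTree as ET

# No declaration for the entity: a conforming parser must reject this.
undefined = "<m>&NegativeThinSpace;</m>"

# Same document, but the entity is declared as U+200B in an internal
# DTD subset (a stand-in for the MathML DTD, for illustration only).
defined = (
    '<!DOCTYPE m [<!ENTITY NegativeThinSpace "&#x200B;">]>'
    "<m>&NegativeThinSpace;</m>"
)


def try_parse(doc):
    """Return the root element's text, or an 'error: ...' string
    if the parser refuses the document."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as err:
        return f"error: {err}"


result_undefined = try_parse(undefined)
result_defined = try_parse(defined)
print(result_undefined)
print(repr(result_defined))
```

The first document fails to parse at all, while the second yields a
single zero width space, which is why removing a name from the DTD would
break existing documents that still use it.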

David

Received on Thursday, 28 August 2008 13:38:15 GMT
