Re: several messages about New Vocabularies in text/html

From: David Carlisle <davidc@nag.co.uk>
Date: Sat, 5 Apr 2008 21:40:55 +0100
Message-Id: <200804052040.m35KetCN024085@edinburgh.nag.co.uk>
To: ian@hixie.ch
Cc: public-html@w3.org, www-math@w3.org

> Is there some permanent URI from which the absolute latest unicode.xml 
> file from which that document is created can always be found? (I don't 
> mind if it's not in w3.org space, in case you edit the document elsewhere 
> where the document would be more up to date, it's just a reliably up to 
> date URI that I'm looking for.)

The one linked to from the document at 
is (always) the latest version of the file (and of the stylesheets used
to extract information from other sources into that file, and from that
file into the document's tables and DTD entity declarations).

Like most (all?) of the W3C site it is under CVS control and the
public view on the web just always reflects the HEAD of the CVS
repository. I assume you have (or could have) W3C cvs access in which
case you could check it out from $CVSROOT/WWW/2003/entities/2007xml/
and see the cvs logs etc. if you wish, but the URI above is always the
latest version.

> I notice that there are entities even for many ASCII characters such as 
> the colon ":", is that really necessary?

Entities aren't really necessary:-) Really the only reason for
maintaining these entity definitions is to help transition legacy
documents. Colon is defined in ISONUM; that is, it's been around since
the original ISO 8879 standard defining SGML in 1986, if not before. I
get requests to drop certain characters and (more often) requests to add
some new names, but doing either causes interoperability problems, as
fragments often move around without keeping their correct DTD reference.
The set of names (especially the ISO ones) is inconsistent, and
sometimes downright cryptic, but they are what they are and I don't plan
on changing any of them; I'm just trying to keep a sane mapping from
that set of names to Unicode.
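
As a small illustration of such a name-to-Unicode mapping, here is a
minimal sketch using Python's standard html module. That module and its
table of named character references are an assumption brought in for the
example (they are not the entity files discussed in this message); the
sketch only shows that a legacy name such as "colon" resolves to the
plain ASCII character.

```python
import html
from html.entities import html5


def resolve(name: str) -> str:
    """Return the character(s) that the named reference &name; maps to.

    Uses Python's built-in table of named character references,
    which is a stand-in here, not the unicode.xml data itself.
    """
    return html5[name + ";"]


# The legacy name "colon" maps to the ordinary ASCII colon, U+003A.
colon = resolve("colon")
print(repr(colon), hex(ord(colon)))

# html.unescape applies the same table to a whole string.
print(html.unescape("a&colon;b"))
```

Dropping a name like this from the table would leave any fragment that
still uses &colon; unresolved, which is the interoperability problem
described above.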

That said, I think it's vitally important that the names are consistent
with (x)html (many of the names would be mapped differently if it were
not for html compatibility), but it's less important that _all_ of them
go into a larger html+mathml set.

The "combined" file that I referred to lists all the entities defined in
that set, including some of the ISO entities that are not included in
MathML (ISOGRK1 and ISOGRK3, for textual rather than mathematical
Greek usage, for example, are not in the MathML DTD). See
      <group name="mathml">
in unicode.xml for a list of the entity sets MathML currently uses.
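
A minimal sketch of reading such a group programmatically, using
Python's xml.etree.ElementTree. Only the <group name="mathml"> element
comes from the message above; the enclosing elements, the <set>
children, their name attributes, the second group and all the set names
in the sample are assumptions made up for illustration, so check them
against the real unicode.xml before relying on this.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in data, NOT the real contents of unicode.xml.
# Element and attribute names other than <group name="mathml"> are assumed.
SAMPLE = """\
<unicode>
  <entitygroups>
    <group name="mathml">
      <set name="isonum"/>
      <set name="isoamsa"/>
    </group>
    <group name="example-other">
      <set name="isonum"/>
      <set name="isogrk1"/>
      <set name="isogrk3"/>
    </group>
  </entitygroups>
</unicode>
"""


def sets_in_group(xml_text: str, group_name: str) -> list:
    """List the entity-set names under the group with the given name."""
    root = ET.fromstring(xml_text)
    for group in root.iter("group"):
        if group.get("name") == group_name:
            # Assumed layout: one <set name="..."/> child per entity set.
            return [s.get("name") for s in group.findall("set")]
    return []  # no group with that name


print(sets_in_group(SAMPLE, "mathml"))
```

For the real file, the same function could be given the text of
unicode.xml once the assumed element names have been verified.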


Received on Saturday, 5 April 2008 20:41:30 UTC
