Re: missing letter gcedil in isolat2 from David Carlisle on 2007-10-23 (www-math@w3.org from October 2007)

From: David Carlisle <davidc@nag.co.uk>
Date: Tue, 23 Oct 2007 10:24:44 +0100
To: Paul.Bijnens@xplanation.com
Cc: www-math@w3.org
Message-Id: <200710230924.l9N9OiSg014939@edinburgh.nag.co.uk>

Thank you for reporting this.

> Seems like an error, because the uppercase G-cedilla, just as
> any other letter for Latvian *is* included.
> 
> Really strange that nobody noticed that error, since the file dates
> from 2003 

Actually it goes further back than that: ISOLat2 dates from the original
SGML specification, ISO 8879, published in 1986. As far as I can see, the
ISO spec still (despite a couple of amendments since then) only specifies
the upper case G with cedilla. (Before replying I checked various online
sources, and Goldfarb's printed SGML Handbook.)

This is the same problem as the mathematical characters, many of which
have no entity name, or inappropriate entity names. It is tempting to
"fix" this by just adding the entity, but many systems use catalogs or
other similar systems that mean that a reference to a latin2 entity file
is intercepted and a local (or internalised) file is used rather than
the specified DTD file being read. In theory the exact form of the FPI in
the public identifier would uniquely identify a new variant and systems
would detect that, but theory and real life don't always agree.

If a document uses &gcedil; and the DTD that is used does not define
this, then it is not well formed and most likely the entire document
will be rejected with a fatal parse error. This is a rather bad default
behaviour, so it's really safer in most cases to use the character
directly, or to use a numeric reference, &#x123; which will always work,
or to use &gcedil; but define it in the document's local subset, so you
don't rely on an updated latin2 file.
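
The options above can be sketched as follows. This is only a minimal
illustration using the parser in Python's standard library
(xml.etree.ElementTree), not anything taken from the entity files
themselves; the sample documents and the helper names text_of and parses
are made up for the example.

```python
import xml.etree.ElementTree as ET

# Numeric character reference: needs no entity declaration at all.
NUMERIC = "<doc>&#x123;</doc>"

# Named entity declared in the document's own internal (local) subset,
# so no external latin2 entity file has to define it.
LOCAL = '<!DOCTYPE doc [ <!ENTITY gcedil "&#x123;"> ]><doc>&gcedil;</doc>'

# Named entity with no declaration anywhere.
UNDECLARED = "<doc>&gcedil;</doc>"


def text_of(source):
    """Parse the document and return the text of its root element."""
    return ET.fromstring(source).text


def parses(source):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(repr(text_of(NUMERIC)))
    print(repr(text_of(LOCAL)))
    print(parses(UNDECLARED))
```

With this parser, the numeric reference and the locally declared entity
both yield U+0123, while the undeclared reference is rejected outright.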


The entity files are (slowly) being updated for MathML 3 (and hopefully
a synchronised update of ISO 9573) but my current thoughts are to keep
all the names in all the entity sets exactly as before (for the reasons
given above) but just correct the assignments to Unicode where
appropriate. There is though the possibility of having a new "extra"
entity set of previously unnamed characters, and gcedil should clearly
(now you have pointed it out) be a candidate for inclusion in any such
set.

I'm sorry this is a rather unsatisfactory answer. What makes editing the
entity sets challenging is that the existing names/definitions are so
hard to justify by any rational principle, but it is also hard to
justify any change to a set of names that have been in use for
over 20 years, if there is any chance of any change breaking any
(unknown!) existing uses....


David




Received on Tuesday, 23 October 2007 09:25:00 UTC