Re: missing letter gcedil in isolat2 from Henri Sivonen on 2008-01-07 (www-math@w3.org from January 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 7 Jan 2008 15:02:59 +0200
To: Paul Bijnens <Paul.Bijnens@xplanation.com>
Cc: www-math@w3.org
Message-Id: <FF0520A6-650A-4756-8234-E74FF5E98299@iki.fi>

On Dec 24, 2007, at 12:17, Paul Bijnens wrote:

> Just one small argument pro for adding the gcedil to the ISOlat2 list,
> Even though the above webpage is not some official standard, it is
> hosted by the Unicode Consortium.
> The contra argument, that "&gcedil;" was not there in the original
> SGML list in 1986, is still valid of course.

I think there are several reasons not to add any entities at all to any
W3C DTD:

  * Browsers don't really load DTDs dynamically.
  * It isn't reasonable to expect browsers to load DTDs dynamically in  
the future. http://hsivonen.iki.fi/no-dtd/
  * The early decision to hard-wire certain public IDs to abridged  
DTDs in Gecko already causes grief to WebKit and Opera.  
http://annevankesteren.nl/2007/12/xml-entities  
In practice these DTDs will need to be treated as grandfathered, but  
no more should be added.
  * Deploying a page that contains an entity and/or a public ID that  
Gecko's catalog hack doesn't know about triggers the Yellow Screen of  
Death (YSoD). (I'm not saying that this is good. I'm just saying that  
the legacy has shipped and is out there.)
  * MathML is already so complex that it isn't written by hand.  
Instead, it is generated by tools like TeX4ht, iTeX2MML, OpenOffice  
and Mathematica.
  * A character entity never expands the expressiveness of an XML  
language. You could always use UTF-8 directly. Leaking text input  
problems to the client side is bad design.

Therefore, I think it would be a mistake and Bad-for-the-Web if any WG  
of the W3C tried to push a DTD change or a new DTD for Web deployment.  
I find http://www.w3.org/TR/2007/WD-xml-entity-names-20071214/ very  
alarming if the intent is to serve those entities over the wire.

Consider the alternatives:
  * Express a character in UTF-8: it either works or fails gracefully  
at the rendering layer.
OR
  * Express a character as an entity reference: it either works or  
fails catastrophically, even in cases where the first option would  
have worked.

It should be obvious that the former makes sense and the latter does  
not.
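
To make the two alternatives concrete, here is a minimal sketch in
Python. It assumes that the standard library's non-validating XML
parser is a fair stand-in for a browser's XML parser that does not
know the entity; the helper name try_parse is made up for this
illustration.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def try_parse(doc: str) -> str:
    """Parse an XML document and report whether it is well-formed.

    Hypothetical helper for illustration only. An undeclared entity
    reference is a well-formedness error, so the parser gives up on
    the whole document instead of degrading gracefully.
    """
    try:
        root = ET.fromstring(doc)
    except ET.ParseError as err:
        return f"fatal: {err}"
    return "ok: " + "".join(root.itertext())


# Same character, two spellings: an entity reference with no DTD that
# declares it, and the literal character U+0123 (g with cedilla).
with_entity = f'<math xmlns="{MATHML_NS}"><mi>&gcedil;</mi></math>'
with_utf8 = f'<math xmlns="{MATHML_NS}"><mi>\u0123</mi></math>'

if __name__ == "__main__":
    print(try_parse(with_entity))  # parsing aborts with an error
    print(try_parse(with_utf8))    # parses; a font may or may not have the glyph
```

The first document never reaches the rendering layer at all. The
second always does, and the worst case there is a missing glyph.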

Mnemonic character input should be between the author and his/her  
MathML converter. What goes over the wire to the browser should be  
unescaped UTF-8.
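
As a rough sketch of what I mean by keeping mnemonics on the author's
side, a converter could expand the names just before serializing. The
table and function names below are hypothetical and the table is
deliberately tiny; a real tool would carry its own, much larger one.

```python
import re

# Hypothetical author-side mnemonic table (illustrative entries only).
MNEMONICS = {
    "gcedil": "\u0123",  # LATIN SMALL LETTER G WITH CEDILLA
    "Gcedil": "\u0122",  # LATIN CAPITAL LETTER G WITH CEDILLA
    "alpha": "\u03b1",   # GREEK SMALL LETTER ALPHA
}

# The five entities every XML parser knows without a DTD; leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Named references only; numeric references such as &#x123; do not match.
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_mnemonics(markup: str) -> str:
    """Replace named entity references with the characters they stand for."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        if name not in MNEMONICS:
            # Fail at conversion time, on the author's machine.
            raise ValueError(f"unknown mnemonic: &{name};")
        return MNEMONICS[name]

    return _ENTITY_REF.sub(replace, markup)


def serialize_for_wire(markup: str) -> bytes:
    """Expand mnemonics, then encode as UTF-8 for delivery."""
    return expand_mnemonics(markup).encode("utf-8")


if __name__ == "__main__":
    print(serialize_for_wire("<mi>&gcedil;</mi>"))
```

An unknown name is then an error the author sees when converting, not
an error the reader sees when loading the page.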

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 7 January 2008 13:03:22 UTC