- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 7 Jan 2008 15:02:59 +0200
- To: Paul Bijnens <Paul.Bijnens@xplanation.com>
- Cc: www-math@w3.org
On Dec 24, 2007, at 12:17, Paul Bijnens wrote:

> Just one small argument pro for adding the gcedil to the ISOlat2 list,
> Even though the above webpage is not some official standard, it is
> hosted by the Unicode Consortium.
> The contra argument, that "&gcedil;" was not there in the original
> SGML list in 1986, is still valid of course.

I think there's a reason not to add any entities at all to any W3C DTD:

 * Browsers don't really load DTDs dynamically.

 * It isn't reasonable to expect browsers to load DTDs dynamically in
   the future.
   http://hsivonen.iki.fi/no-dtd/

 * The early decision to hard-wire certain public IDs to abridged DTDs
   in Gecko already causes grief to WebKit and Opera.
   http://annevankesteren.nl/2007/12/xml-entities
   In practice these DTDs will need to be treated as grandfathered but
   no more should be added.

 * Deploying a page that contains an entity and/or a public ID that
   Gecko's catalog hack doesn't know about causes the YSoD. (I'm not
   saying that this is good. I'm just saying that the legacy has
   shipped and is out there.)

 * MathML is already so complex that it isn't written by hand.
   Instead, it is generated by tools like TeX4ht, iTeX2MML, OpenOffice
   and Mathematica.

 * A character entity never expands the expressiveness of an XML
   language. You could always use UTF-8 directly. Leaking text input
   problems to the client side is bad design.

Therefore, I think it would be a mistake and Bad-for-the-Web if any WG
of the W3C tried to push a DTD change or a new DTD for Web deployment.
I find http://www.w3.org/TR/2007/WD-xml-entity-names-20071214/ very
alarming if the intent is to serve those entities over the wire.

Consider the alternatives:

 * Express a character in UTF-8: works or fails gracefully on the
   rendering layer.

OR

 * Express a character as an entity reference: works or fails
   catastrophically even in cases where the above option would have
   worked.

It should be obvious that the former makes sense and the latter does
not.
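To make the two alternatives concrete, here is a minimal sketch. It
uses Python's standard xml.etree.ElementTree purely as a stand-in for
an XML parser that does not load a DTD; it is not a model of any
particular browser. The two sample MathML fragments and the function
name parses_without_dtd are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample fragments; both are meant to carry a Greek alpha.
AS_UTF8 = f'<math xmlns="{MATHML_NS}"><mi>\u03b1</mi></math>'
AS_ENTITY = f'<math xmlns="{MATHML_NS}"><mi>&alpha;</mi></math>'


def parses_without_dtd(document: str) -> bool:
    """Return True if a parser that loads no DTD accepts the document."""
    try:
        # Serialize to UTF-8 bytes, as the document would travel over the wire.
        ET.fromstring(document.encode("utf-8"))
        return True
    except ET.ParseError:
        # An undeclared entity is a fatal well-formedness error:
        # the parser produces no tree at all.
        return False


if __name__ == "__main__":
    print(parses_without_dtd(AS_UTF8))    # True
    print(parses_without_dtd(AS_ENTITY))  # False
```

In the UTF-8 case the parser always yields a tree, so the worst outcome
is a missing glyph at rendering time. In the entity case the whole
document is rejected unless the parser happens to know the name.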
Mnemonic character input should be between the author and his/her
MathML converter. What goes over the wire to the browser should be
unescaped UTF-8.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 7 January 2008 13:03:22 UTC