- From: David Carlisle <davidc@nag.co.uk>
- Date: Fri, 1 Nov 2002 10:05:22 GMT
- To: pgrosso@arbortext.com
- CC: www-math@w3.org
Paul,

Thanks for the pointer. I replied to this document on xml-dev and include a copy below. I should probably note here that this is a personal response, not a Working Group one (although obviously my personal views are somewhat flavoured by working on the MathML DTD and also by my failure to persuade any end users that they would rather write &#x2192; than &rarr;).

David

Date: 1 Nov 2002 09:22:22 +0000
From: David Carlisle <davidc@nag.co.uk>
CC: xml-dev@lists.xml.org
In-reply-to: <200210302155.QAA19553@mail2.reutershealth.com> (message from John Cowan on Wed, 30 Oct 2002 16:41:46 -0500 (EST))
Subject: Re: [xml-dev] Character Entities: An XML Core WG View

Comments on "Character Entities: An XML Core WG View"

While the comments on character entities are (mostly) technically correct, they fail to acknowledge the real problems that authors currently face when trying to use character entities alongside other aspects of current XML technologies.

The general flavour is "existing mechanisms suffice", but that may be compared to comments that "XML isn't needed as SGML could do everything needed in that area". Technically it is true, but it misses the point. XML has proved to have many advantages over SGML. Reasonable schema languages using XML instance syntax may prove to have real advantages over DTDs, and it may yet prove to be the case that a different approach to entity definition has real advantages. Unfortunately the document as published does not address the issues at all and just states the obviously true fact that entities already have a definition mechanism in DTDs.

Acknowledging the usability problems with the current mechanisms and investigating the possibilities for alternatives should not commit anyone to adding any mechanism in a future XML 2 (if there were ever to be such a version). As already seen in the XML 1.1 debates, the costs of any version increment are high, and it may be a reasonable position to take that the changes required would be too great.
However, unless there is an acceptance of the desirability of new functionality, and some rough idea of what changes could be made to meet that requirement, it is impossible to weigh the benefits against the costs of a version change.

Responses to individual (quoted) points are contained below.

> The existing mechanism, DTDs, is entirely adequate to the purpose.
> Although some subsets of XML have outlawed DTDs in the name of
> interoperability, all conforming XML processors (parsers) must be able
> to recognize at least some DTD information,

If you are using an XML application that forbids (at the application level) the use of <!DOCTYPE, the fact that this is allowed by the XML spec does not really help. SOAP is probably the main example of this, although SOAP is probably not so often used with hand-authored documents. However, given the pressure from some quarters to move from DTDs to schema languages of one sort or another, this is likely to become more rather than less common.

> At worst, then, the character entities actually used in a given
> document (generally a small subset of those available) can be declared
> in the internal subset, and are 100% interoperable across processors.

As noted above, this facility may not be available at all. Even when it is, it is only barely usable for hand-authored documents (which, as you comment in the introduction, is a main use case for entities of this form). The idea that every time you use a character by name you have to (a) know the required definition and (b) go up to the top of the document to add the entity declaration has severe usability problems.

> However, different XML applications such as XHTML and MathML do not
> need to declare differently named entities for the same
> characters. Most character names have already been standardized by
> ISO, and these names should be and are used wherever possible.

"Most" characters have not had names standardised by ISO (or anyone else), unless you are thinking solely of characters used in common European languages. Also, XHTML is incompatible with the usual ISO definitions (asymp and circ, for example), which causes some problems for MathML, which tries to be in agreement with both.

In addition, Unicode/ISO chose not to support the full set of characters that have ISO entity names, even in the additions in Unicode 3.x, so several so-called "standard" names have wildly different definitions in common XML DTDs, depending on whether the DTD author chose to pick something "close" or to mark the character as unsupported. DocBook, for example, maps several characters to #FFFD
http://www.oasis-open.org/docbook/specs/wd-docbook-xmlcharent-0.3.html#d0e184
whereas MathML attempts to map all of these names to some more (or less) suitable character.

At the very least, the W3C XML Activity could agree a standard set of entity names, together with a mapping to Unicode, for use across W3C specs (which may possibly just be a matter of rubber stamping the MathML ones
http://www.w3.org/Math/characters
and moving them out of the math area). There has been some interest expressed previously in coordinating such a set with ISO and/or OASIS.

> There is no need for such a facility, because of the Unicode Private
> Use Area (PUA).

I would agree with this, but it is worth noting that the W3C I18N group takes a very hard line against any public use of the PUA.

David Carlisle
Received on Friday, 1 November 2002 05:06:20 UTC
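
For readers unfamiliar with the internal-subset workaround debated in the message above, here is a minimal sketch of what it asks of an author. It assumes Python's standard xml.etree.ElementTree parser as a stand-in for "a conforming XML processor"; the element name doc, the sample text, and the helper parse_text are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-authored document. The author has gone "up to the top of
# the document" and declared the one named character they use.
WITH_DECLARATION = (
    '<!DOCTYPE doc [\n'
    '  <!ENTITY rarr "&#x2192;">\n'
    ']>\n'
    '<doc>x &rarr; y</doc>'
)

# The same content without a DOCTYPE, as required by applications that
# forbid <!DOCTYPE at the application level.
WITHOUT_DECLARATION = '<doc>x &rarr; y</doc>'


def parse_text(source):
    """Return the root element's text, or None if parsing fails."""
    try:
        return ET.fromstring(source).text
    except ET.ParseError:
        # An undeclared entity reference makes the document unparseable.
        return None


declared = parse_text(WITH_DECLARATION)
undeclared = parse_text(WITHOUT_DECLARATION)
```

With the declaration present, the reference is expanded to the arrow character; without it, the parser rejects the document, which is the situation an author is left in when the DOCTYPE is not allowed.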