- From: Tim Bray <tbray@textuality.com>
- Date: Tue, 25 Mar 1997 14:13:22 -0800
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
[various folks are edging towards various arcane methods to solve the "problem" of the Big 5 predeclared entities].

I have *got* to disagree. The spec is totally crystalline on this. If you need a '<' in character data or attribute value, use '&lt;'. What could be clearer? In implementing Lark, there is [after compilation] a state in the automaton that recognizes &lt; - this causes a '<' to be put in the data for eventual delivery to the app. Idiotically simple. So on technical grounds there are no problems in XML.

On cultural grounds, it is de facto the case that the use of &amp; and &lt; is nearly universal, which greatly promotes interoperability, and is a fact of life we should be glad of.

If we no longer predeclare these, then my minimal XML parser has to learn how to read and interpret entity declarations. Kiss the DPH goodbye.

If I want to process an XML doc with Full SGML, all I need to do is declare the entities. So maybe we need a rule saying that a declaration of these entities with any other value than those given by XML makes a document non-well-formed and thus non-XML.

IN ALL XML DOCUMENTS, &amp; SHOULD MEAN '&' AND NOTHING ELSE, EVER.
IN ALL XML DOCUMENTS, &lt; SHOULD MEAN '<' AND NOTHING ELSE, EVER.
IN ALL XML DOCUMENTS, &gt; SHOULD MEAN '>' AND NOTHING ELSE, EVER.
IN ALL XML DOCUMENTS, &quot; SHOULD MEAN '"' AND NOTHING ELSE, EVER.
IN ALL XML DOCUMENTS, &apos; SHOULD MEAN "'" AND NOTHING ELSE, EVER.

Can someone explain in simple terms what the problem is that is causing us to consider these measures that will greatly increase the difficulty of minimal XML parsing and the amount of explanation necessary in the spec? I think I've been paying attention, and I certainly haven't seen anything raised that comes close to being serious enough to justify these measures, which would simultaneously increase complexity and decrease interoperability.

- Tim
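
[To make the "idiotically simple" point concrete, here is a rough sketch of how a minimal parser might expand the five predeclared entities from a fixed table, without ever reading an entity declaration. It is not Lark's code: the names PREDEFINED and expand_predefined are made up for illustration, and character references such as &#60; are deliberately left out of scope.]

```python
# Hypothetical sketch, not Lark's implementation: the five predeclared
# entities are hard-wired, so no entity declarations need to be parsed.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}


def expand_predefined(text):
    """Replace &amp; &lt; &gt; &quot; &apos; with their fixed characters."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            end = text.find(";", i + 1)
            if end == -1:
                raise ValueError("unterminated entity reference")
            name = text[i + 1:end]
            if name not in PREDEFINED:
                # A minimal parser with no declaration support can only reject.
                raise ValueError("undeclared entity: " + name)
            out.append(PREDEFINED[name])
            i = end + 1  # resume after the ';' so output is never re-scanned
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(expand_predefined("if a &lt; b &amp;&amp; c"))
```

[Because the scan resumes after each ';', the replacement text is never re-examined, so &amp;lt; comes out as the four characters &lt; rather than as '<'. If the entities stopped being predeclared, the fixed table would have to be replaced by a declaration parser, which is exactly the extra cost described above.]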
Received on Tuesday, 25 March 1997 17:14:32 UTC