- From: Paul Prescod <paul@prescod.net>
- Date: Mon, 10 May 1999 17:32:01 -0500
- To: www-xml-schema-comments@w3.org, xml-dev <xml-dev@ic.ac.uk>
XML Schema Part 1 seems to import a mistake from SGML and XML. This is the idea that content models must either be text-containing, "mixed" or element-containing, and that the former sort of model must not constrain the ordering of elements and text nodes.

"A content model for mixed content provides for mixing elements with character data in document instances. The allowed element types are named, but neither their order or their number of occurrences are constrained."

SGML had a separation between mixed and text-containing nodes, but it did not have this restriction: nothing in it forbade constraining the order and occurrence of text nodes and element nodes. #PCDATA was just a token and you could use it where you wanted. What SGML did have was a massive bug in its parsing algorithm that made these "constrained" mixed content models impossible to use. The bug had nothing to do with validation -- it was a parser problem. There sprang up a superstition that these mixed content models were evil, when the truth is that the particular bug in SGML was the real problem.

Before it was clear that we could change SGML, XML adopted a ridiculously confusing rule about the use of mixed content. It didn't occur to me (or probably anyone else) that it would have been better to just fix the bug. We probably didn't know at that point that we had that option.

Now this rule has been imported into XML Schema. The rule is even more out of place in XML Schema than it was in XML itself. Then we had the opportunity to fix the bug. Today the bug is not even relevant -- XML Schema works on the result of the parse; it does not influence the parse. #PCDATA is just a data type that is unconstrained. You should be able to mix data type refs, #PCDATA and element type refs in content models with impunity (barring real parsing ambiguity).
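To make the "works on the result of the parse" point concrete, here is a rough sketch in Python, not anything taken from the XML Schema draft. It parses a document first and only afterwards checks the children of an element against a content model in which #PCDATA is an ordinary token. The helper names (child_tokens, matches), the regex encoding of the model and the SECTION sample documents are all assumptions made for illustration. It also simplifies by treating whitespace-only text as absent and by requiring a #PCDATA token to correspond to non-empty text.

```python
import re
import xml.etree.ElementTree as ET


def child_tokens(elem):
    """Flatten an element's children into tokens, in document order.

    Non-whitespace text becomes "#PCDATA"; child elements become their tag name.
    (Assumption: whitespace-only text is ignored.)
    """
    tokens = []
    if elem.text and elem.text.strip():
        tokens.append("#PCDATA")
    for child in elem:
        tokens.append(child.tag)
        # Text that follows a child element is stored in its .tail
        if child.tail and child.tail.strip():
            tokens.append("#PCDATA")
    return tokens


def matches(elem, model_regex):
    """Check the token sequence against a content model.

    The model is written as a regex over space-terminated tokens
    (a hypothetical encoding chosen for this sketch).
    """
    seq = "".join(token + " " for token in child_tokens(elem))
    return re.fullmatch(model_regex, seq) is not None


# Hypothetical encoding of the content model (#PCDATA, P+)
SECTION_MODEL = r"#PCDATA (P )+"

# The parse happens first; the model is only consulted afterwards.
ok = ET.fromstring("<SECTION>Intro text<P>one</P><P>two</P></SECTION>")
bad = ET.fromstring("<SECTION><P>one</P>trailing text</SECTION>")

print(matches(ok, SECTION_MODEL))
print(matches(bad, SECTION_MODEL))
```

The parser here never sees the content model, so the model cannot change how the document is tokenized. Ordering and occurrence of text are checked the same way as for elements.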
Using old syntax:

<!ELEMENT SECTION (#PCDATA, P+)>
<!ELEMENT FIG (#PCDATA|IMG)>
<!ELEMENT HTML (TITLE,(#PCDATA|P)+)>

You can handle any of these with wrappers but I claim that the instinct to wrap these things arises more from exposure to the superstition than from fundamental design considerations. We can make XSchema more uniform by removing the concept of "mixed content" and by introducing a PCDATA content token type.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

And so, in one of history's little ironies, the global triumph of bad software in the age of the PC was reversed by a surprising combination of forces: the social transformation initiated by the network, a long-discarded European theory of political economy, and a small band of programmers throughout the world mobilized by a single simple idea.
- http://old.law.columbia.edu/my_pubs/anarchism.html
Received on Monday, 10 May 1999 19:33:00 UTC