- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Thu, 05 Oct 2000 19:23:34 -0600
- To: altheim@eng.sun.com
- Cc: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
Dear Murray:

The W3C XML Schema Working Group has spent the last several months working through the comments received from the public on the last-call draft of the XML Schema specification. We thank you for the comments you made on our specification during our last-call comment period, and want to make sure you know that all comments received during the last-call comment period have been recorded in our last-call issues list (http://www.w3.org/2000/05/12-xmlschema-lcissues).

Among other issues, you raised the point registered as issue LC-172, which suggests (at least implicitly) that we drop the ANY wildcards for child elements and attributes, as being too complex for a version 1.0 language. In the course of dealing with the last-call comments, the WG considered this issue and asked me to convey to you their thanks for the suggestion and their reasons for not accepting it.

The case for eliminating wildcards lies in their complexity, and in the simplification to the spec which would result from eliminating them. The case for retaining them is that without wildcards of some kind, XML Schemas are incapable of defining the kinds of languages many schema authors would like to be able to define, or of modeling the kinds of extensibility exhibited by, say, existing HTML browsers.

The ANY keyword of SGML and XML 1.0 admits only elements whose types are themselves declared, so it does not allow elements to be defined which accept arbitrary blocks of well-formed XML as their content; it is thus impossible, in XML 1.0, to define a DTD for (say) a protocol-oriented envelope which carries arbitrary XML as its payload. Our wildcards do that, and thus make it possible to have schemas with 'black-box' areas. This is one of the things Schema does which is clearly more expressive than DTDs, and it is essential to allow the 'X' of XML to be meaningful not only in cases where validation is forgone, but also in cases where document types are formally defined and validated.
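To make the contrast between ANY and a black-box wildcard concrete, here is a toy Python model. It is a sketch under stated assumptions, not a validator and not part of any W3C specification: it assumes a hypothetical envelope whose payload uses element types unknown to the envelope's DTD or schema, and the function name envelope_ok and the mode strings "dtd-any" and "black-box" are illustrative inventions. Under XML 1.0 ANY every element in a valid document must be declared, so an undeclared payload element makes the document invalid; a black-box wildcard accepts any payload the parser could read as well-formed.

```python
import xml.etree.ElementTree as ET


def envelope_ok(envelope: ET.Element, declared: set[str], mode: str) -> bool:
    """Toy check of an envelope's payload (hypothetical, not a real validator).

    mode="dtd-any": models XML 1.0 ANY. Every element in a valid document
        needs a declaration, so every element inside the payload, at any
        depth, must appear in `declared`.
    mode="black-box": models a wildcard whose contents are skipped. Parsing
        already guaranteed the payload is well-formed, so nothing further
        is checked.
    """
    if mode == "dtd-any":
        # Walk each child of the envelope and all of its descendants.
        return all(e.tag in declared for child in envelope for e in child.iter())
    if mode == "black-box":
        return True
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical envelope carrying a payload the envelope's DTD knows nothing about.
doc = ET.fromstring("<envelope><invoice><total>12</total></invoice></envelope>")

print(envelope_ok(doc, {"envelope"}, "dtd-any"))    # False: invoice, total undeclared
print(envelope_ok(doc, {"envelope"}, "black-box"))  # True: payload is a black box
```

Parsing with ET.fromstring already rejects malformed input (it raises ET.ParseError), which is why the black-box branch has nothing left to check: the only requirement on the payload is well-formedness.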
The use of wildcards also makes it much easier to apply different schemas to the same document instance: a schema for (say) tables can say that within a cell, any non-table element (i.e. any element in a namespace other than the table namespace) can occur. This is in effect what the SGML Open table model is intended to allow, with the difference that the SGML Open DTD fragment is compelled to say this in a comment, and to provide a parameter-entity hook which a hostile user can easily misuse and defeat. The XML Schema formulation allows the same basic idea to be expressed in the schema proper, which is (I submit to you) where it belongs.

You ask about the interaction of the ANY wildcards with the various levels of schema validation (strict, skip, and lax). Let me try to summarize the situation, imagining for concreteness that we are talking about a 'cell' element in a table module whose content is zero or more ANY-other-namespace wildcards.

- with STRICT validation, the ANY wildcard effectively allows any element outside the table namespace for which a declaration is provided. If the cell element has mixed content, this is almost the same as an SGML/XML ANY keyword. (Since we have excluded the elements of the table namespace, it's not quite the same; if we did not exclude them, it would be exactly the same.)

- with SKIP validation, the ANY wildcard effectively says that each child element within the table cell is a black box of well-formed XML, which is not to be looked at for validation purposes. This is not the way I would define a table module myself, but if we know that tables never nest, then this behavior resembles that of some table processors I have heard about. SKIP validation with ANY wildcards is exactly suited, however, to the definition of document envelopes which can take anything as their payload, including another envelope.
The processor for the top-level envelope should pay attention only to the top-level envelope, not to any nested envelopes; SKIP validation expresses this approach from the point of view of validation. (Why skip validation of the envelope contents? Perhaps I want to send you, in an envelope, an envelope which I know to be invalid, in order to ask you "Why is this envelope not valid?")

- with LAX validation, the ANY wildcard effectively says "any element can go here -- but if the schema includes a declaration for that element, it should be validated in the normal way." This is what some people call 'opportunistic' validation; it might be used (for example) to check the structure of all the tables in a document, including nested tables, and nothing else: construct a schema containing just the table declarations, and validate.

Existing systems take three approaches to the extensibility of markup languages, which amount to three approaches to handling unknown elements.

(1) They can say "an unknown element is an error" -- that is, in effect, what STRICT validation specifies.

(2) They can say "the start- and end-tags of an unknown element are skipped, and the content is processed in the normal way" -- that is, in effect, what LAX validation does. As the experience of the Web has shown, this approach to extensibility is just what is needed to allow peaceful coexistence of old software with certain kinds of extensions to the markup language. Effectively, the rule to ignore tags for unknown element types amounts to saying "treat undeclared elements as if declared with mixed content and zero or more ANY wildcards, and perform lax validation, i.e. validate any children for which you have declarations". As you know, this works with some but not all kinds of extensions to a markup language: sometimes what you need to say is "undeclared elements should be skipped in their entirety."
(3) They can say "unknown elements are to be skipped" regardless of their contents -- that is what SKIP validation does.

I have not discussed the ANY-ATTRIBUTE wildcard, but I believe you will see how it can be used.

Since the desire to be able to declare a 'well-formedness slot' in a markup language is one of the most common requests for improvements on the capabilities of DTDs, I think the WG was right to design the ANY wildcards into the language, and I hope the discussion above helps persuade you that the WG did the right thing in retaining them despite your invitation to remove them.

It would be helpful to us to know whether you are satisfied with the decision taken by the WG on this issue, or wish your dissent from the WG's decision to be recorded for consideration by the Director of the W3C.

best,
Michael
Received on Thursday, 5 October 2000 15:25:37 UTC