- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: 07 Mar 2004 19:36:35 +0000
- To: Mik Lernout <mik@futurestreet.org>
- Cc: Michael Kay <mhk@mhk.me.uk>, 'Lingzhi Zhang' <lzhang@cse.ogi.edu>, 'dev xmlschema' <xmlschema-dev@w3.org>
On Sun, 2004-03-07 at 12:10, Mik Lernout wrote:

> I do agree with Stephen/Lingzhi Zhang here: in a normal use case of
> XMLSchema you will want to confine validation to only one valid
> root-element.

I agree that this is a normal use case. It is perhaps important to
point out, however, that the opposite is equally normal; historically
there are certainly examples of document type definitions or schemas
designed to allow multiple choices of root elements. (The Text
Encoding Initiative, to name one concrete example, defines both a
"TEI.2" element and a "teiCorpus.2" element. The XHTML Modularization
specification defines numerous HTML modules which are intended to be
independently usable.)

In not identifying a single root element, a schema conforming to W3C
XML Schema resembles not so much a context-free document grammar as
the set of vocabularies and production rules which make up part of
such a grammar.

For what it's worth, this was a conscious design choice on the part
of the working group. The analogy with document type definitions
seemed more relevant than the analogy with context-free grammars
defined as a tuple of terminal vocabulary, non-terminal vocabulary,
start symbol, and set of production rules.

> The only reason to register multiple global elements would
> be to be able to use them when importing/including the schema, when
> referring to the element from within the schema, ... There is a big
> security / application integrity aspect that is touched here as well: it
> is pretty typical for applications to use XMLSchema for validation and
> it would be very easy to bypass this validation completely by using a
> root element that is also registered globally but not "intended" to be
> used as a root element.

If it is important to start with a specific root, it should be
possible to invoke the schema processor in such a way as to specify
the element declaration which must match the root element, as
described in section 5.2 of part 1 of the schema spec.
Security concerns are certainly one reason one might wish to invoke
the processor in such a way.

> Michael: I agree with you that a schema should be able to match more
> than one instance document, but I do also believe that it should only be
> able to match only one "type" of instance document. If you have a look
> at the "Purchase Order Schema" in the primer spec, do you think it is
> the intention of the schema writer to be able to validate
> "<comment>abc</comment>" as a valid instance of this schema? Or that the
> application that will validate this will be constructed to be able to
> cope with this instance?

Let us hope that the purchase-order application knows it's looking
for a valid purchaseOrder element, not just any element valid against
the schema.

Formally, I'm not sure the XML Schema spec defines the term "X is a
valid instance of schema Y"; to the extent that it is the Working
Group which is responsible for the sample purchase order
specification, I can say I don't think the WG has any problem with a
document with root element of 'comment' which is valid against the
schema. It won't be a very interesting document, but the schema can
certainly be used to validate it.

Perhaps the tutorial should mention the fact, in order to make people
aware that they do need to check the type of the root element.

> Maybe I am completely off-base/confused here and this kind of
> "unpredictable behaviour" is intended by the creators of the spec, but
> then don't we have a serious communication problem in how the spec is
> being read by the people who are writing XMLSchema validators and
> applications? If this would be the case it would seem for example
> logically to be able to mark, when validating, the root-element you wish
> to validate against like in: validator.validate(po.xsd,
> 'purchaseOrder'). Why isn't this the case?

I believe it is the case with at least some processors; to the extent
that it's not the case with others, I suspect that not enough paying
customers have made clear they want the capacity to specify any of
the options outlined in section 5.2: where to start validation
(doesn't need to be at the root), what element declaration to start
with, what complex type definition to start with.

Just my two cents,

-C. M. Sperberg-McQueen
 World Wide Web Consortium / MIT CSAIL
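
Where a processor offers no way to name the starting element
declaration, the application can check the root element itself before
accepting a validation result. A minimal sketch in Python, assuming
only the standard library; the function name root_matches and its
parameters are illustrative and not part of any schema processor's
API. It inspects only the name of the root element and does not
perform schema validation.

```python
import xml.etree.ElementTree as ET


def root_matches(xml_text, expected_local_name, expected_namespace=None):
    """Return True if the root element has the expected name.

    Hypothetical helper: it checks the root element's namespace and
    local name only; schema validation is a separate step.
    """
    root = ET.fromstring(xml_text)
    tag = root.tag
    # ElementTree writes namespaced names as "{namespace-uri}local".
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
    else:
        namespace, local_name = None, tag
    return (local_name == expected_local_name
            and namespace == expected_namespace)
```

An application expecting a purchase order would call
root_matches(document, "purchaseOrder") and turn away
"<comment>abc</comment>" even though that document is valid against
the schema.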
Received on Sunday, 7 March 2004 21:38:24 UTC