- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Fri, 30 Jun 2000 12:35:17 -0600
- To: Murray Altheim <altheim@eng.sun.com>, <www-xml-schema-comments@w3.org>
Murray -

In your review of Part I of the last-call draft of XML Schema, you commented among other things on the rules governing XML Schema validation and conformance.

>Sec. 6.1 Layer 1: Summary of the schema-validation core
>
>Another instance of befuddlement. How can this be considered
>acceptable? (hilighting mine):
>
>The obligation of a schema-aware processor as far as the
>schema-validation core is concerned is to implement the definitions of
>schema-valid given below in Schema Validation of Documents (§7.2).
>Neither the choice of element information item to be
>schema-validated, nor which of three means of initiating validation
>are used, is within the scope of this specification.

 ...

>Sec. 7.9 Missing Sub-components
>
>I've tried three or four times to write up something about this
>section. Because of my incomplete understanding of the rest of the
>spec it's difficult to confidently summarize, but my reaction in
>general is one of mild shock. I long for the days of 'draconian' error
>handling, and can only attempt to imagine a Web where §7.9 becomes the
>norm for XML processing.

These comments have been included in the XML Schema last-call issues list [LCI] and assigned issue number LC-177 for tracking purposes.

The XML Schema WG has discussed issue LC-177 this week, and I have been asked to reply to you, explaining the rationale for the rules as they exist. Our review has confirmed that the rules as they are specified do reflect the consensus of the WG. The rules, and our reasons for them, are as follows:

A Within a document, the schemaLoc attribute can be used on any element to provide a suggestion for where to locate a (not 'the') schema for a particular namespace.

(Rationale: there may be any number of documents with a claim to be the normative definition of a namespace: prose documentation in various languages, formal specifications in DTD, XML Schema, RDF Schema, or other syntax, and so on.
There may be multiple formalizations of the same namespace -- HTML is a well-known example. Some believe that proper support for content negotiation in servers and clients would allow all of these resources to be retrievable from the URI which identifies the namespace, but content negotiation is currently implemented only imperfectly and incompletely by software and incompletely understood by the average user. For these and other reasons, it is not possible -- and in the view of some, not desirable -- to guarantee that when one dereferences a namespace name the result will be an XML Schema document. It is therefore useful to have a safety valve for cases where the namespace name cannot be dereferenced, or does not yield an XML Schema document when dereferenced.)

B The schemaLoc attribute is, formally, a *hint*, not an instruction. It may be taken as a claim that a schema for the namespace in question may be found at the location indicated. The schema validator is not required to take the hint. The exact method by which a schema validator finds a schema is out of scope and system dependent. We expect schema validators to use mechanisms like command-line options and arguments, menus, environment variables, and any other user-interface mechanism implementors think their users will find helpful.

(Rationale: if I am receiving data from you, either I trust you or I validate the data. If I don't trust your claim that the document is valid, how on earth can I be expected to trust your claim that the schema at a given URI is the one we agreed to validate against? I can't be. So I need to have the right to tell the schema processor, "I don't care what the other guy said is a good schema, the schema *I* trust for this namespace is right *here*."
Since the authoritative word must come from the user, not the document, and since we don't want to interfere with user interface design, it would be a huge mistake to prescribe a particular approach to allowing the user to say where to find schemas. Obviously, a processor can provide a 'trust the schemaLoc' option which will work in many cases.)

C The schemaLoc attribute also constitutes a claim that the relevant parts of the document conform to that schema for the namespace in question.

(Rationale: there is a range of opinion about the degree to which claims about validity should be expressed, or expressible, in the document itself; the view expressed here is a compromise between a position which advocates that the document instance be interpreted as making somewhat stronger claims, and a position which advocates that all such claims be expressed outside the document itself and that the meaning of schemaLoc be limited to what is described above in item B. The claim that a document is valid vis-a-vis a given schema document for a particular namespace is logically distinct from a request to validate the document, or from a request that the particular schema document be used to validate the elements from that namespace in the document: whether the document is validated, and if so which schema documents are used, may vary from circumstance to circumstance.)

D The presence of a schemaLoc attribute does *not* constitute a request for validation.

(Rationale: there are many situations in which a document should be read, possibly by a processor which understands how to validate it, but does not need to be, or SHOULD NOT be, validated. A request for validation is a transaction between a user and a piece of software, or between two pieces of software. It is not a declarative fact about a document. It is best left to a user interface.)
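
To make rules B and D a little more concrete, here is a minimal sketch in Python of how a processor might decide where to look for a schema. Nothing in it comes from the spec: the function name, the two mappings, and the trust_hints flag are all hypothetical, and taking the first of several hints is an arbitrary choice made only for illustration.

```python
from typing import Mapping, Optional, Sequence


def choose_schema_location(
    namespace: str,
    document_hints: Mapping[str, Sequence[str]],
    user_overrides: Mapping[str, str],
    trust_hints: bool = False,
) -> Optional[str]:
    """Pick a schema location for `namespace` (hypothetical policy, not spec text)."""
    # The user's word is authoritative: an explicit override always wins.
    if namespace in user_overrides:
        return user_overrides[namespace]
    # schemaLoc values are only hints; consult them if the user opted in.
    if trust_hints:
        hints = document_hints.get(namespace, ())
        if hints:
            # Arbitrary choice for this sketch: take the first hint offered.
            return hints[0]
    # No trusted source for this namespace; the caller decides what to do next.
    return None
```

Note that finding a location says nothing about whether validation should run at all; that request would still have to come from the user or from the calling software.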
E If more than one schema location is suggested for a particular namespace, it is not an error, but no particular priority is assigned among them.

(Rationale: they are HINTS, right?)

F A validation process may start at any element in the document and work down.

(Rationale: Launching a validation process is taken to be a matter between a user and a piece of software, or between two pieces of software. It may sometimes be important to validate the entire document; sometimes only certain parts of the document need to be validated. Since the presence of a schemaLoc attribute does not constitute a request for validation (and its absence cannot be taken as a binding request *not* to validate), the user is free to select any point as the starting point. It may be expected that some schema validators will, by default, start at the top of the document. But it is important that they are not REQUIRED to do so.)

G A validation process may work in strict mode, lax mode, or skip mode. In checking the schema-validity of the document, the processor must switch from mode to mode on the basis of the {process contents} property on the relevant schema component.

(Rationale: For some applications, it's essential to check every element and every attribute, and to insist that they be declared, roughly as in a DTD. This is strict mode.

For some applications (black-box applications), it's essential to be able to specify that the schema applies only to some outer envelope, which contains well-formed XML as a payload, and that the payload does not need to conform to the schema and should be skipped entirely. Think of defining an information retrieval protocol like Z39.50 as a set of XML messages going back and forth. The envelope needs to conform to the schema, but the payload does not need to conform, and it would normally be a waste of cycles to try to validate the payload. This is skip mode.
For some applications (white-box applications), there may be a payload which need not be validated, and the elements in it need not be declared, but if elements are encountered for which declarations *are* available, they should be validated. In a template in an XSL stylesheet, for example, I may not care about validating the elements in the target namespace. (In fact, it is highly unlikely that I *can* validate them without writing a specialized schema for them: the target schema is unlikely to allow <xsl:value-of> elements in the right places.) But if I see another XSL element inside a target element, I probably do want to validate it. This is 'lax' mode (known informally as 'opportunistic validation').

So strict, skip, and lax are each necessary, because each describes a plausible approach to validation and to coexistence of schemas and namespaces.)

H In checking schema validity, a validation process must be guided by the {process contents} property on the relevant schema components, but it NEED NOT restrict itself to checking schema-validity only. For example, a processor may offer an option to check all elements strictly, even if the schema only requires lax processing.

(Rationale: the schema may have been devised for skip-processing, but for my purposes I may insist on lax or strict processing. My business partners may not care about the contents of the payload, but for my purposes I want to know that if the payload contains anything that claims to be a purchase order, then it jolly well conforms to my schema for purchase orders.)

I If in the schema the relevant {process contents} property has the value 'strict' or 'lax' or 'skip', this may be interpreted as a declarative statement that documents which conform to this schema must have no errors when processed in the specified mode.
It follows that if a schema processor processes a black-box payload (declared with processContents='skip') in lax mode, and finds an error, the error in question is not a schema-validity error.

(Rationale: all schema processors should give the same results, as regards schema validity. If the schema says something should be skip-conformant, you do have the right to check it in strict or lax mode, but you and your processor do not have the right to call failure to conform to the rules of strict or lax mode a schema validity error. Put in other terms: you can define your *own* validation property, say [strict validity], and get your processor to compute it, but you can't produce a PSV Infoset that records strict validity in the [validity] property -- the XML Schema spec defines what that property means, and you can't change that. As long as the processor distinguishes between failure to conform with the restrictions laid out in the schema, and other failures, all is well. You might also want a processor to check to make sure the document is in ASCII, not UTF-8 or UTF-16. That's your right, and it's OK. But the processor is not allowed to claim that a UTF-16 document is ill formed on that account.)

I believe that you were mostly surprised and unhappy over rules B and F; I have included the others partly because I think they help make the picture more complete, and partly because some of them are becoming hobbyhorses of mine.

I hope this description explains both why the rules are as they are, and why the WG does not feel they should be changed in response to your desire for stricter rules. The strict behavior you wish can be achieved: the user merely needs to specify that the entire document must be validated using strict validation. Requiring that all documents be validated in their entirety, and in the same strict mode, would replicate the shortcomings of DTDs for describing extensible markup languages.
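
As a rough illustration of rules G through I above, the following Python sketch walks a toy element tree, switches among strict, lax, and skip modes, and keeps the schema's own verdict separate from a stricter check the user asked for. It is a deliberately simplified model, not the algorithm in the spec: the Element class, the find_errors and report functions, the override parameter, and the idea of a schema as a dict mapping each declared element name to the mode for its children (None meaning "keep the current mode") are all assumptions made for the example. Content models are not checked at all; only whether an element is declared.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Element:
    name: str
    children: List["Element"] = field(default_factory=list)


def find_errors(elem: Element, schema: Dict[str, Optional[str]],
                mode: str = "strict", override: Optional[str] = None) -> List[str]:
    """Return names of undeclared elements that count as errors (toy model)."""
    # A user-requested override replaces whatever mode the schema asked for.
    active = override or mode
    if active == "skip":
        return []  # black box: do not look inside at all
    found: List[str] = []
    if elem.name not in schema:
        if active == "strict":
            found.append(elem.name)  # strict mode insists on a declaration
        child_mode = active  # lax mode keeps looking for declared descendants
    else:
        # The declaration may switch the mode for this element's children.
        child_mode = schema[elem.name] or active
    for child in elem.children:
        found.extend(find_errors(child, schema, child_mode, override))
    return found


def report(doc: Element, schema: Dict[str, Optional[str]]) -> Dict[str, bool]:
    """Keep the schema's verdict apart from the user's own stricter property."""
    return {
        "validity": not find_errors(doc, schema),
        "strict_validity": not find_errors(doc, schema, override="strict"),
    }
```

With an envelope whose payload is declared skip, an undeclared element deep in the payload leaves "validity" true while the user's own "strict_validity" comes out false, which is the distinction rule I insists on.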
Please let me know whether this sufficiently addresses your concerns about the conformance rules of XML Schema.

best regards,

Michael Sperberg-McQueen

--
****************************************************
* C. M. Sperberg-McQueen                           *
* Research Staff, World Wide Web Consortium        *
* Route 1, Box 380A, Española NM 87532-9765        *
* (that's Espanola with an n-tilde)                *
* cmsmcq@acm.org, fax: +1 (505) 747-1424           *
****************************************************
Received on Friday, 30 June 2000 15:02:41 UTC