- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Fri, 30 Jun 2000 12:35:17 -0600
- To: Murray Altheim <altheim@eng.sun.com>, <www-xml-schema-comments@w3.org>
Murray -
In your review of Part I of the last-call draft of XML Schema, you
commented among other things on the rules governing XML Schema
validation and conformance.
>Sec. 6.1 Layer 1: Summary of the schema-validation core
>
>Another instance of befuddlement. How can this be considered
>acceptable? (hilighting mine):
>
>The obligation of a schema-aware processor as far as the
>schema-validation core is concerned is to implement the definitions of
>schema-valid given below in Schema Validation of Documents (§7.2).
>Neither the choice of element information item to be
>schema-validated, nor which of three means of initiating validation
>are used, is within the scope of this specification.
...
>Sec. 7.9 Missing Sub-components
>
>I've tried three or four times to write up something about this
>section. Because of my incomplete understanding of the rest of the
>spec it's difficult to confidently summarize, but my reaction in
>general is one of mild shock. I long for the days of 'draconian' error
>handling, and can only attempt to imagine a Web where §7.9 becomes the
>norm for XML processing.
These comments have been included in the XML Schema last-call issues
list [LCI] and assigned issue number LC-177 for tracking purposes.
The XML Schema WG has discussed issue LC-177 this week, and I have
been asked to reply to you, explaining the rationale for the rules as
they exist. Our review has confirmed that the rules as they are
specified do reflect the consensus of the WG.
The rules, and our reasons for them, are as follows:
A Within a document, the schemaLoc attribute can be used on any
element to provide a suggestion for where to locate a (not 'the')
schema for a particular namespace.
(Rationale: there may be any number of documents with a claim to
be the normative definition of a namespace: prose documentation in
various languages, formal specifications in DTD, XML Schema, RDF
Schema, or other syntax, and so on. There may be multiple
formalizations of the same namespace -- HTML is a well known
example. Some believe that proper support for content negotiation
in servers and clients would allow all of these resources to be
retrievable from the URI which identifies the namespace, but
content negotiation is currently implemented only imperfectly and
incompletely by software and incompletely understood by the
average user. For these and other reasons, it is not possible --
and in the view of some, not desirable -- to guarantee that when
one dereferences a namespace name the result will be an XML Schema
document. It is therefore useful to have a safety valve for cases
where the namespace name cannot be dereferenced, or does not yield
an XML Schema document when dereferenced.)
B The schemaLoc attribute is, formally, a *hint*, not an
instruction. It may be taken as a claim that a schema for the
namespace in question may be found at the location indicated. The
schema validator is not required to take the hint. The exact
method by which a schema validator finds a schema is out of scope
and system dependent. We expect schema validators to use
mechanisms like command-line options and arguments, menus,
environment variables, and any other user-interface mechanism
implementors think their users will find helpful.
(Rationale: if I am receiving data from you, either I trust you or
I validate the data. If I don't trust your claim that the
document is valid, how on earth can I be expected to trust your
claim that the schema at a given URI is the one we agreed to
validate against? I can't be. So I need to have the right to
tell the schema processor, "I don't care what the other guy said
is a good schema, the schema *I* trust for this namespace is right
*here*." Since the authoritative word must come from the user,
not the document, and since we don't want to interfere with user
interface design, it would be a huge mistake to prescribe a
particular approach to allowing the user to say where to find
schemas. Obviously, a processor can provide a 'trust the
schemaLoc' option which will work in many cases.)
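To make rule B concrete, here is a minimal sketch of how a processor might choose a schema location. Everything in it is an assumption made for illustration: the function name resolve_schema, the trust_hints flag, and the two dictionaries are hypothetical and appear nowhere in the spec, which deliberately leaves this mechanism out of scope. Taking the first hint when several are offered is an arbitrary choice, since hints carry no priority.

```python
from typing import Optional


def resolve_schema(
    namespace: str,
    user_schemas: dict[str, str],
    doc_hints: dict[str, list[str]],
    trust_hints: bool = False,
) -> Optional[str]:
    """Pick a schema location for a namespace (hypothetical helper)."""
    # The user's word is authoritative: an explicit choice always wins.
    if namespace in user_schemas:
        return user_schemas[namespace]
    # The document's schemaLoc values are only hints; they are consulted
    # solely when the user has opted in to trusting them.
    if trust_hints and doc_hints.get(namespace):
        # Arbitrary pick: no priority is defined among multiple hints.
        return doc_hints[namespace][0]
    # No trusted source for this namespace.
    return None
```

With a user-supplied entry for a namespace, the document's hint for that namespace is ignored even when trust_hints is on; without one, the hint is used only if the user asked for it.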
C The schemaLoc attribute also constitutes a claim that the relevant
parts of the document conform to that schema for the namespace in
question.
(Rationale: there is a range of opinion about the degree to which
claims about validity should be expressed, or expressible, in the
document itself; the view expressed here is a compromise between a
position which advocates that the document instance be interpreted
as making somewhat stronger claims, and a position which advocates
that all such claims be expressed outside the document itself and
that the meaning of schemaLoc be limited to what is described
above in item B.
The claim that a document is valid vis-a-vis a given schema
document for a particular namespace is logically distinct from a
request to validate the document, or from a request that the
particular schema document be used to validate the elements from
that namespace in the document: whether the document is validated,
and if so which schema documents are used, may vary from
circumstance to circumstance.)
D The presence of a schemaLoc attribute does *not* constitute a
request for validation.
(Rationale: there are many situations in which a document should
be read, possibly by a processor which understands how to validate
it, but does not need to be, or SHOULD NOT be, validated. A
request for validation is a transaction between a user and a piece
of software, or between two pieces of software. It is not a
declarative fact about a document. It is best left to a user
interface.)
E If more than one schema location is suggested for a particular
namespace, it is not an error, but no particular priority is
assigned to any of them.
(Rationale: they are HINTS, right?)
F A validation process may start at any element in the document and
work down.
(Rationale: Launching a validation process is taken to be a matter
between a user and a piece of software, or between two pieces of
software. It may sometimes be important to validate the entire
document; sometimes only certain parts of the document need to be
validated. Since the presence of a schemaLoc attribute does not
constitute a request for validation (and its absence cannot be
taken as a binding request *not* to validate), the user is free to
select any point as the starting point. It may be expected that
some schema validators will, by default, start at the top of the
document. But it is important that they are not REQUIRED to do
so.)
G A validation process may work in strict mode, lax mode, or skip
mode. In checking the schema-validity of the document, the
processor must switch from mode to mode on the basis of the
{process contents} property on the relevant schema component.
(Rationale: For some applications, it's essential to check every
element and every attribute, and to insist that they be declared,
roughly as in a DTD. This is strict mode.
For some applications (black-box applications), it's essential to
be able to specify that the schema applies only to some outer
envelope, which contains well-formed XML as a payload, and that
the payload does not need to conform to the schema and should be
skipped entirely. Think of defining an information retrieval
protocol like Z39.50 as a set of XML messages going back and
forth. The envelope needs to conform to the schema, but the
payload does not need to conform, and it would normally be a waste
of cycles to try to validate the payload. This is skip mode.
For some applications (white-box applications), there may be a
payload which need not be validated, and the elements in it need
not be declared, but if elements are encountered for which
declarations *are* available, they should be validated. In a
template in an XSL stylesheet, for example, I may not care about
validating the elements in the target namespace. (In fact, it is
highly unlikely that I *can* validate them without writing a
specialized schema for them: the target schema is unlikely to
allow <xsl:value-of> elements in the right places.) But if I see
another XSL element inside a target element, I probably do want to
validate it. This is 'lax' mode (known informally as
'opportunistic validation').
So strict, skip, and lax are each necessary, because each
describes a plausible approach to validation and to coexistence of
schemas and namespaces.)
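The three modes can be illustrated with a toy walker. This is a sketch under heavy simplifying assumptions, not an implementation of the spec: "validating" an element is reduced to checking that its name is in a set of declared names, and the hypothetical process_contents mapping stands in for the {process contents} property by naming the mode to apply to an element's children. The names check, run, DOC, and DECLARED are invented for the example.

```python
import xml.etree.ElementTree as ET


def check(elem, declared, process_contents, mode="strict", errors=None):
    """Collect the names of elements that fail the toy declaration check."""
    if errors is None:
        errors = []
    if mode == "skip":
        # Skip mode: the subtree is not examined at all.
        return errors
    if elem.tag in declared:
        # A declaration is available, so the element is checked; its own
        # (toy) process-contents setting governs its children.
        child_mode = process_contents.get(elem.tag, "strict")
    elif mode == "strict":
        # Strict mode: an undeclared element is an error.
        errors.append(elem.tag)
        child_mode = "strict"
    else:
        # Lax mode: an undeclared element is tolerated, and its children
        # are examined opportunistically.
        child_mode = "lax"
    for child in elem:
        check(child, declared, process_contents, child_mode, errors)
    return errors


# An envelope whose body carries a payload the schema does not declare.
DOC = ET.fromstring(
    "<envelope><body><payload>"
    "<header><junk/></header><other/>"
    "</payload></body></envelope>"
)
DECLARED = {"envelope", "body", "header"}


def run(body_mode):
    """Check DOC with the given mode applied to the children of <body>."""
    return check(DOC, DECLARED, {"body": body_mode})
```

Here run("skip") reports nothing, run("lax") reports only junk (found inside the declared header element), and run("strict") reports payload, junk, and other.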
H In checking schema validity, a validation process must be guided
by the {process contents} property on the relevant schema
components, but it NEED NOT restrict itself to checking
schema-validity only. For example, a processor may offer an
option to check all elements strictly, even if the schema only
requires lax processing.
(Rationale: the schema may have been devised for skip-processing,
but for my purposes I may insist on lax or strict processing. My
business partners may not care about the contents of the payload,
but for my purposes I want to know that if the payload contains
anything that claims to be a purchase order, then it jolly well
conforms to my schema for purchase orders.)
I If in the schema the relevant {process contents} property has the
value 'strict' or 'lax' or 'skip', this may be interpreted as a
declarative statement that documents which conform to this schema
must have no errors when processed in the specified mode. It
follows that if a schema processor processes a black-box payload
(declared with processContents='skip') in lax mode, and finds an
error, the error in question is not a schema-validity error.
(Rationale: all schema processors should give the same results, as
regards schema validity. If the schema says something should be
skip-conformant, you do have the right to check it in strict or
lax mode, but you and your processor do not have the right to call
failure to conform to the rules of strict or lax mode a schema
validity error. Put in other terms: you can define your *own*
validation property, say [strict validity], and get your processor
to compute it, but you can't produce a PSV Infoset that records
strict validity in the [validity] property -- the XML Schema spec
defines what that property means, and you can't change that.
As long as the processor distinguishes between failure to conform
with the restrictions laid out in the schema, and other failures,
all is well. You might also want a processor to check to make
sure the document is in ASCII, not UTF-8 or UTF-16. That's your
right, and it's OK. But the processor is not allowed to claim
that a UTF-16 document is ill formed on that account.)
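The distinction in rule I can be sketched as a report that keeps the two kinds of failure apart. The class and field names below are hypothetical, and the two-valued result is a simplification; the point is only that a stricter, user-requested check feeds a separate property and never changes the one the schema defines.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Toy validation report (hypothetical; not a PSV Infoset)."""

    # Failures found while processing in the mode the schema specifies.
    schema_errors: list = field(default_factory=list)
    # Failures found only by a stricter mode the user asked for.
    extra_errors: list = field(default_factory=list)

    @property
    def validity(self) -> str:
        # Stands in for [validity]: determined by the schema's rules alone.
        return "invalid" if self.schema_errors else "valid"

    @property
    def strict_validity(self) -> str:
        # A user-defined property: also counts the extra failures.
        failed = self.schema_errors or self.extra_errors
        return "invalid" if failed else "valid"
```

A payload declared with processContents='skip' that fails a user-requested lax or strict check would then yield validity "valid" and strict_validity "invalid".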
I believe that you were mostly surprised and unhappy over rules B and
F; I have included the others partly because I think they help make
the picture more complete, and partly because some of them are
becoming hobbyhorses of mine.
I hope this description explains both why the rules are as they are,
and why the WG does not feel they should be changed in response to
your desire for stricter rules. The strict behavior you wish can be
achieved: the user merely needs to specify that the entire document
must be validated using strict validation. Requiring that all
documents be validated in their entirety, and in the same strict mode,
would replicate the shortcomings of DTDs for describing extensible
markup languages.
Please let me know whether this sufficiently addresses your concerns
about the conformance rules of XML Schema.
best regards,
Michael Sperberg-McQueen
--
****************************************************
* C. M. Sperberg-McQueen *
* Research Staff, World Wide Web Consortium *
* Route 1, Box 380A, Española NM 87532-9765 *
* (that's Espanola with an n-tilde) *
* cmsmcq@acm.org, fax: +1 (505) 747-1424 *
****************************************************
Received on Friday, 30 June 2000 15:02:41 UTC