- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 28 Feb 2008 21:00:26 +0000
- To: www-xml-schema-comments@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5164

------- Comment #5 from cmsmcq@w3.org 2008-02-28 21:00 -------

[Again, speaking for myself, not the WG. I apologize in advance for the length of this comment.]

Thank you, I think, for inducing me to look at this topic again more carefully. I confess that I had mostly regarded "schema-validity assessment" as merely a term we had invented during the initial work on XSD 1.0, in order to avoid using the term "validation", since "validation" seemed at the time to be tied tightly in people's minds with DTDs. (As I understood it, the new term also had the beneficial side effect of replacing a familiar four-syllable word with an unfamiliar nine-syllable phrase meaning essentially the same thing.)

As time has gone by, it has become clear that the term "validation" is not now so tightly connected to DTDs as to be confusing when used in connection with other schema languages, and I (for one) have simply started saying "validation" instead of "schema-validity assessment" because it's shorter and clearer.

At the face-to-face meeting in Florida last month, the WG declined to accept the proposition that "validation" and "schema-validity assessment" should be regarded (and described) as synonyms. Instead, the WG reaffirmed the view that (if I am reconstructing our thinking correctly) "validation" is to be narrowly construed as calculating the [validity] property, while "assessment" is to be understood as the process which results in the full PSVI. The term "validation" may also possibly convey the idea that ONLY the [validity] property of the validation root is of interest, and that the [validity] of its descendants is calculated only in the service of coming up with the result for the root.

I believe that this distinction is essentially the one your bug report urges the WG to take more seriously and use more consistently in our wording.
For myself, I continue to have occasional difficulties with this distinction, since neither [validity] nor the full PSVI can be calculated without calculating the other -- the full PSVI includes the [validity] property, and there is nothing in the PSVI (unless I have forgotten something, in which case I'll fall back to saying "nothing MUCH in the PSVI") that does not play a role, however indirect, in the definition, and thus in the correct calculation, of the [validity] property of the root -- so [validity] cannot be calculated correctly without incidentally calculating the entire PSVI. Still, even if the two terms are extensionally equivalent, in that no one can perform validation without performing assessment, and vice versa, still they can be distinct in their intension / connotation, with one focusing on a simple ternary property and the other on a more elaborate information structure. (I shall try thinking of 'assessment' as a different way of saying 'annotation', and see whether that helps.)

Having now spent a few hours looking at the spec and trying to align its usage with the distinction just outlined, I have begun to fear that making the spec consistent and clear on this matter doesn't look likely to be easy. The more I look at it, the less clear I am on (a) what distinctions section 2.1 is trying to draw, (b) what distinctions are actually made in the usage of the terms in the spec, and (c) what distinctions the spec SHOULD be drawing, in order to have useful terminology, and what its usage SHOULD be. Of these, (c) seems the most important, but any discussion of (c) is going to entail at least some clarification of, or bitter argument over, (a) and (b).

I have begun to feel unsure, as a reader of the spec, whether the discussion in 2.1 is trying to define two distinct terms, or three.
The two-term interpretation is the one I've tried to outline above:

- validation = calculation of the [validity] property, more or less equivalent formally to what is done with DTDs and RelaxNG and other languages
- schema-validity assessment (or "assessment" for short) = calculation of the full PSVI, thus a process which provides much more information than the Boolean or ternary value produced by validation

In this interpretation, the text's association of "validation" specifically with LOCAL validity is slightly puzzling but assumed to be of not great consequence. (I doubt very much that the text actually uses 'validation' and related terms ONLY with regard to local validity -- the name of the [validity] property is one counter-example, to start with.) Ditto the odd inclusion of "schema-validity assessment" as one of three things included in the definition of "assessment"; if the one term is just a short form of the other, this looks like a circular definition.

The three-term interpretation takes section 2.1 as trying to distinguish, and provide terms for, three distinct ideas:

- validation = calculation of local validity (only); for XSD 1.1 we can say calculation of the [local validity] property
- schema-validity assessment = calculation of the [validity] property. Recall that the [validity] of an item is a function of both the [local validity] of that item and the [validity] of its dependents. Note, then, that schema-validity assessment, so defined, entails validation, but not vice versa
- assessment = validation + schema-validity assessment + infoset augmentation.

This interpretation seems closer to what is actually said in 2.1 than the two-term interpretation, so for purposes of topic (a) I lean toward it. But the distinctions drawn and the terminology proposed seem problematic to me.
(As a member of the WG that produced XSD 1.0, I am of course jointly responsible with others for what's in 2.1, but from where I now stand it looks as if I / we didn't do a very good job here.)

First, since schema-validity assessment entails validation, it seems odd to list them both as if they were separable.

Second, the augmentation of the input information set is a natural and unavoidable consequence of either of the first two, so listing augmentation as a separate item in the definition of 'assessment' also seems odd.

The attitude toward "infoset augmentation" here looks, in fact, like a relic of the view (never openly acknowledged but pervasively smuggled into the text of XSD 1.0) that "the post-schema-validation infoset" is not a set of information automatically generated by validation / assessment, but a sort of API or data structure. We have done a lot to eliminate this error from the spec, but there is more work to be done in section 2.1, if we are to get the definitions of validation and assessment clear.

An example may help make the point clearer. Consider an element, validated against a governing type definition in the course of validation / schema-validity assessment. In the "infoset as API" view, information like the identity of that governing type definition may or may not be part of the PSVI, depending on whether it is or is not present in the information presented by the validator to its invoker. Providing that information augments the set of information available to the caller. In the "infoset as set of information" view, the identity of the governing type definition is always and necessarily part of the PSVI, because it is a piece of information always and necessarily present when the element is validated. Whether that part of the PSVI is exposed by the validator through an API or through messages or through a data structure or by other means is relevant to any description or use of the validator, but not to the definition of the PSVI.
So I lean toward the view that while the current text of 2.1 is trying to define three distinct terms for three distinct concepts, both the distinctions between concepts and the choice of terms for the concepts are problematic at best.

The central point from which this bug report started -- that the [validity] property on the root is often NOT what users of XSD will need to care about -- is a good one, as is the suggestion in comment #4 that we define IYFNTH or some other term for (a reasonable approximation of) what people trying to use XSD typically mean when they say "valid document". And so, for that matter, is the suggestion that if the XSD spec is going to claim to make a distinction between the terms "validation" and "assessment", the usage of the words should reflect that distinction.

I seem once more to be uncertain (1) how to connect the validation / assessment distinction to the concept of IYFNTH, and (2) how best to define and use the terms in the text of the spec. I'll have to spend more time thinking about this bug. In the meantime, comments from John, or from anyone reading the Bugzilla entry, may be helpful.
Received on Thursday, 28 February 2008 21:00:42 UTC