[Bug 5164] validation vs assessment

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5164





------- Comment #5 from cmsmcq@w3.org  2008-02-28 21:00 -------
[Again, speaking for myself, not the WG.  I apologize in advance for
the length of this comment.]

Thank you, I think, for inducing me to look at this topic again more
carefully.

I confess that I had mostly regarded "schema-validity assessment" as
merely a term we had invented during the initial work on XSD 1.0, in
order to avoid using the term "validation", since "validation" seemed
at the time to be tied tightly in people's minds with DTDs.  (As I
understood it, the new term also had the beneficial side effect of
replacing a familiar four-syllable word with an unfamiliar
nine-syllable phrase meaning essentially the same thing.)  As time has
gone by, it has become clear that the term "validation" is not now so
tightly connected to DTDs as to be confusing when used in connection
with other schema languages, and I (for one) have simply started
saying "validation" instead of "schema-validity assessment" because
it's shorter and clearer.

At the face to face meeting in Florida last month, the WG declined to
accept the proposition that "validation" and "schema-validity
assessment" should be regarded (and described) as synonyms. Instead,
the WG reaffirmed the view that (if I am reconstructing our thinking
correctly) "validation" is to be narrowly construed as calculating the
[validity] property, while "assessment" is to be understood as the
process which results in the full PSVI.  The term "validation" may
also possibly convey the idea that ONLY the [validity] property of the
validation root is of interest, and that the [validity] of its
descendants is calculated only in the service of coming up with the
result for the root.

I believe that this distinction is essentially the one your bug report
urges the WG to take more seriously and use more consistently in our
wording.

For myself, I continue to have occasional difficulties with this
distinction, since neither [validity] nor the full PSVI can be
calculated without calculating the other -- the full PSVI includes the
[validity] property, and there is nothing in the PSVI (unless I have
forgotten something, in which case I'll fall back to saying "nothing
MUCH in the PSVI") that does not play a role, however indirect, in the
definition, and thus in the correct calculation, of the [validity]
property of the root -- so [validity] cannot be calculated correctly
without incidentally calculating the entire PSVI.

Still, even if the two terms are extensionally equivalent, in that no
one can perform validation without performing assessment, and vice
versa, they can be distinct in their intension / connotation,
with one focusing on a simple ternary property and the other on a more
elaborate information structure.  (I shall try thinking of
'assessment' as a different way of saying 'annotation', and see
whether that helps.)

Having now spent a few hours looking at the spec and trying to align
its usage with the distinction just outlined, I have begun to fear
that making the spec consistent and clear on this matter doesn't look
likely to be easy.  The more I look at it, the less clear I am on 

 (a) what distinctions section 2.1 is trying to draw, 
 (b) what distinctions are actually made in the usage of the terms 
     in the spec, and 
 (c) what distinctions the spec SHOULD be drawing, in order to have 
     useful terminology, and what its usage SHOULD be.  

Of these, (c) seems the most important, but any discussion of (c) is
going to entail at least some clarification of, or bitter argument
over, (a) and (b).

I have begun to feel unsure, as a reader of the spec, whether the
discussion in 2.1 is trying to define two distinct terms, or three.
The two-term interpretation is the one I've tried to outline above:

  - validation = calculation of the [validity] property, more or
    less equivalent formally to what is done with DTDs and RelaxNG and
    other languages

  - schema-validity assessment (or "assessment" for short) =
    calculation of the full PSVI, thus a process which provides much
    more information than the Boolean or ternary value produced by
    validation

In this interpretation, the text's association of "validation"
specifically with LOCAL validity is slightly puzzling but assumed to
be of no great consequence.  (I doubt very much that the text
actually uses 'validation' and related terms ONLY with regard to local
validity -- the name of the [validity] property is one
counter-example, to start with.)  Ditto the odd inclusion of
"schema-validity assessment" as one of three things included in the
definition of "assessment"; if the one term is just a short form of
the other, this looks like a circular definition.

The three-term interpretation takes section 2.1 as trying to
distinguish, and provide terms for, three distinct ideas:

  - validation = calculation of local validity (only); for XSD 1.1
    we can say calculation of the [local validity] property

  - schema-validity assessment = calculation of the [validity]
    property.  Recall that the [validity] of an item is a function of
    both the [local validity] of that item and the [validity] of its
    dependents. Note, then, that schema-validity assessment, so
    defined, entails validation, but not vice versa

  - assessment = validation + schema-validity assessment + infoset
    augmentation.  
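
As a rough illustration of the dependency described in the second
item, here is a minimal Python sketch.  It is not the normative rule
from the spec: the Item class, its field names, and the simplified
three-valued combination rule are all assumptions made for the
example.  It only shows that [validity] cannot be computed without
consulting [local validity] at every item, while the reverse does not
hold.

```python
from dataclasses import dataclass, field
from typing import List

# The three values a validity property can take.
VALID, INVALID, NOT_KNOWN = "valid", "invalid", "notKnown"


@dataclass
class Item:
    """Hypothetical stand-in for an element or attribute information item."""
    name: str
    local_validity: str  # result of "validation" in the three-term reading
    dependents: List["Item"] = field(default_factory=list)


def validity(item: Item) -> str:
    """Simplified [validity]: combines the item's own [local validity]
    with the [validity] of its dependents (an assumed rule, not the
    spec's exact wording)."""
    results = [validity(d) for d in item.dependents]
    if item.local_validity == INVALID or INVALID in results:
        return INVALID
    if item.local_validity == NOT_KNOWN or NOT_KNOWN in results:
        return NOT_KNOWN
    return VALID


# A locally valid root whose grandchild is locally invalid.
leaf = Item("leaf", INVALID)
root = Item("root", VALID, [Item("child", VALID, [leaf])])
print(root.local_validity, validity(root))
```

In this sketch the root is locally valid, yet its [validity] is
dragged down by a dependent, which is the sense in which
schema-validity assessment entails validation but not vice versa.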

This interpretation seems closer to what is actually said in 2.1 than
the two-term interpretation, so for purposes of topic (a) I lean
toward it.  But the distinctions drawn and the terminology proposed
seem problematic to me.  (As a member of the WG that produced XSD 1.0,
I am of course jointly responsible with others for what's in 2.1, but
from where I now stand it looks as if I / we didn't do a very good job
here.)

First, since schema-validity assessment entails validation, it seems
odd to list them both as if they were separable.

Second, the augmentation of the input information set is a natural and
unavoidable consequence of either of the first two, so listing
augmentation as a separate item in the definition of 'assessment' also
seems odd.  The attitude toward "infoset augmentation" here looks, in
fact, like a relic of the view (never openly acknowledged but
pervasively smuggled into the text of XSD 1.0) that "the
post-schema-validation infoset" is not a set of information
automatically generated by validation / assessment, but a sort of API
or data structure.  We have done a lot to eliminate this error from
the spec, but there is more work to be done in section 2.1, if we are
to get the definitions of validation and assessment clear.

An example may help make the point clearer.  Consider an element,
validated against a governing type definition in the course of
validation / schema-validity assessment.  In the "infoset as API"
view, information like the identity of that governing type definition
may or may not be part of the PSVI, depending on whether it is or is
not present in the information presented by the validator to its
invoker.  Providing that information augments the set of information
available to the caller.  In the "infoset as set of information" view,
the identity of the governing type definition is always and
necessarily part of the PSVI, because it is a piece of information always
and necessarily present when the element is validated.  Whether that
part of the PSVI is exposed by the validator through an API or through
messages or through a data structure or by other means is relevant to
any description or use of the validator, but not to the definition of
the PSVI.
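
The contrast between the two views can be put in a small Python
sketch.  Everything in it is hypothetical (the function names assess
and expose, the dictionary keys, the example type name); no real
validator API is being described.  The point is only that what a
validator chooses to expose is a projection of the PSVI, not the PSVI
itself.

```python
def assess(element_name, governing_type, validity):
    """Return everything established by validating the element.

    In the "infoset as set of information" view this whole record IS
    the PSVI contribution for the element, whether or not any caller
    ever sees it.
    """
    return {
        "element": element_name,
        "type definition": governing_type,
        "validity": validity,
    }


def expose(psvi, properties):
    """Hypothetical API layer: hand the caller only some properties."""
    return {k: v for k, v in psvi.items() if k in properties}


psvi = assess("purchaseOrder", "PurchaseOrderType", "valid")
view = expose(psvi, {"validity"})

# The governing type is absent from what the caller receives ...
print("type definition" in view)
# ... but it is still part of what validation determined.
print("type definition" in psvi)
```

Under the "infoset as API" view one would be tempted to call view the
PSVI; under the view argued for here, psvi is the PSVI and view is
merely one validator's way of reporting part of it.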

So I lean toward the view that while the current text of 2.1 is trying
to define three distinct terms for three distinct concepts, both the
distinctions between concepts and the choice of terms for the concepts
are problematic at best.

The central point from which this bug report started -- that the
[validity] property on the root is often NOT what users of XSD will
need to care about -- is a good one, as is the suggestion in comment
#4 that we define IYFNTH or some other term for (a reasonable
approximation of) what people trying to use XSD typically mean when
they say "valid document". And so, for that matter, is the suggestion
that if the XSD spec is going to claim to make a distinction between
the terms "validation" and "assessment", the usage of the words should
reflect that distinction.

I seem once more to be uncertain (1) how to connect the validation /
assessment distinction to the concept of IYFNTH, and (2) how best
to define and use the terms in the text of the spec.

I'll have to spend more time thinking about this bug.  In the
meantime, comments from John, or from anyone reading the Bugzilla
entry, may be helpful.

Received on Thursday, 28 February 2008 21:00:42 UTC