Re: Schema validation terminology

Michael Anderson <michael@research.canon.com.au> writes:

> I'm trying to get my head around how a schema can be "valid" and would
> like to start with the terminology.  Here is how I think it is, bearing
> in mind this is in regard to the _Schema_.

The spec. doesn't talk about schemas being valid.  'valid' is a term
defined in XML 1.0, where it relates to a DTD, and should be avoided in
discussions of XML Schema.

Let's be really careful about terminology: schema _documents_ are
distinct from schemas.  There are three things the spec. requires:

  1) Any schema documents involved must be schema-valid wrt the schema
     for schemas;
  2) They must satisfy the additional Schema Representation
     Constraints;
  3) The schema they correspond to must satisfy the Constraints on
     Schemas.

Failure to satisfy these conditions is an error, and processing with
that schema should stop (except to provide additional feedback on
other errors).
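
As a rough sketch of that "report everything, then stop" behaviour
(Python, purely illustrative: SchemaError, check_schema and the shape
of the check functions are names I've made up here, not anything the
spec. defines), each check returns a list of problem strings and
nothing proceeds if any list is non-empty:

```python
class SchemaError(Exception):
    """Raised when one or more of the three conditions is not met."""

    def __init__(self, problems):
        super().__init__("; ".join(problems))
        self.problems = list(problems)


def check_schema(documents, checks):
    """Run every check, collect every problem, then refuse to continue.

    `checks` is a list of (label, function) pairs; each function takes
    the schema documents and returns a list of problem descriptions.
    The functions themselves are hypothetical placeholders.
    """
    problems = []
    for label, check in checks:
        # Keep going after a failure so the user sees all the errors.
        problems.extend(f"{label}: {p}" for p in check(documents))
    if problems:
        # Processing with this schema stops here.
        raise SchemaError(problems)
    return documents
```

In real life the third check needs the schema the documents correspond
to, which may not be constructible if the first two fail, so treat the
flat list of checks as a simplification.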
 
> 1. Simple case first - If there is simply something wrong with the
> schema ( ie has minOccurs = "-3" ) then the schema is "invalid".

Not what the spec. says -- either the schema document is not
schema-valid (because 'minOccurs' is declared as nonNegativeInteger)
or the schema (e.g. if constructed by hand) itself is in error,
because {min occurs} must be a non-negative integer.
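
For illustration only, here is a minimal Python sketch of the lexical
check that rules out minOccurs="-3".  It assumes the usual reading of
nonNegativeInteger (optional sign, decimal digits, a minus sign only
allowed on zero); the function name is mine, and a real processor
would get this from its datatype library rather than a regex:

```python
import re

# Optional '+' and digits, or a '-' only in front of a zero value.
_NON_NEGATIVE_INTEGER = re.compile(r"(\+?[0-9]+|-0+)")


def is_non_negative_integer(lexical: str) -> bool:
    """Return True if `lexical` looks like a nonNegativeInteger."""
    # Integer types collapse surrounding whitespace before checking.
    trimmed = lexical.strip(" \t\r\n")
    return _NON_NEGATIVE_INTEGER.fullmatch(trimmed) is not None


# The example from the question is rejected, ordinary values are not.
rejected = is_non_negative_integer("-3")   # False
accepted = is_non_negative_integer("3")    # True
```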

> 2. If the schema has nothing wrong with it and all definitions and
> declarations (I'll call these components) _do_not_ reference any other
> component then the schema is "valid"

Again, not what the spec. says.  Please don't make up your own
terminology which conflicts with existing usage.  You are right that
having a term to describe schemas which have no unresolved references
would be useful -- how about "complete"?

> 3. If the schema has nothing wrong with it, but it contains components
> that _do_ reference other components then there are three
> possibilities.  For the three possibilities consider the declaration:
>     <element ref = "food:WeetBix" />
>     3.1 The WeetBix element in the "food" namespace _can_ be resolved
> and there is nothing wrong with this WeetBix element - Then the schema
> is still "valid"
>     3.2 The WeetBix element in the "food" namespace _can_ be resolved
> but there is something wrong with it - Then the schema is now "invalid"
>     3.3 The WeetBix element in the "food" namespace _can_not_ be
> resolved. - Then the schema is now "partial".

I like "partial" -- it goes with "complete".  I think the distinction
between (1) and (2) is misleading at best, because references are not
resolvable/non-resolvable in the abstract: in any given validation
episode there is some schema == some set of components, probably
derived from one or more schema documents, and either those components
have unresolved references or they don't.

Note that just because a schema is partial, in our new terminology,
doesn't mean you can't legitimately use it for schema-based
processing.  Just as for DTDs, if the gaps aren't hit during
processing, no problem arises.
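
To make "complete" vs. "partial" concrete, here is a small Python
sketch that treats a schema as nothing more than a set of components
and asks whether every reference lands on a component in that set.
Everything in it is an assumption for illustration: the Component
class, the function names, and the namespace URI standing in for the
"food" prefix.  Real components are also keyed by kind (element, type,
etc.), which this ignores:

```python
from dataclasses import dataclass

FOOD_NS = "http://example.org/food"  # hypothetical namespace name


@dataclass(frozen=True)
class Component:
    name: tuple                    # (namespace name, local name)
    refs: frozenset = frozenset()  # names this component refers to


def unresolved_references(components):
    """Return the referenced names that no component in the set has."""
    defined = {c.name for c in components}
    missing = set()
    for c in components:
        missing |= c.refs - defined
    return missing


def is_complete(components):
    """A schema is 'complete' if it has no unresolved references."""
    return not unresolved_references(components)


weetbix = Component((FOOD_NS, "WeetBix"))
breakfast = Component((FOOD_NS, "Breakfast"),
                      frozenset({(FOOD_NS, "WeetBix")}))

partial_schema = {breakfast}            # WeetBix not among the components
complete_schema = {breakfast, weetbix}  # same reference, now resolved
```

The point of the two example sets is that the same reference is
unresolved in one validation episode and resolved in another; it is a
property of the set of components, not of the reference.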

> Is this right? Are there three levels of validity for a  _Schema_?
> Invalid, Partial and Valid?

I'd say there are two separate issues:

  1) Is this a schema, or not?  If any of my (1) -- (3) above fail, you
  haven't _got_ a schema, just something that falsely purports to be
  one.  I _strongly_ resist using the word 'invalid' for this, for the
  reasons given above;

  2) Is this schema complete or partial?

Hope this helps.

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2001, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/

Received on Thursday, 7 December 2000 03:44:36 UTC