Re: [XML Schema 1.1] Does defaultOpenContent allow me to add extension elements before and after the root element?

On 4 Jun 2009, at 13:49, Costello, Roger L. wrote:


 > Hi Folks,

 > Consider this schema, which uses <defaultOpenContent> to make the
 > entire schema open:

[... example snipped ...]


 > Can I add extension elements before and after the root element
 > (BookStore)?

Various answers to this are possible (including 'yes' and 'no' and
'maybe'); which one applies depends on aspects of the validation
episode you haven't specified.

 > Is this instance document legal (I have wrapped the root element
 > with an extension element):

[... example snipped ...]

I'm sorry if this sounds pedantic, but "legality" isn't a property or
term defined by the XSD spec.  To the extent that your question can be
paraphrased as "does this document abide by the agreement between
sender and receiver?", the answer is: it depends.  You haven't told us
what that agreement is.  We may assume that it involves the use of the
schema document you describe, and that it wants the [validity]
property of the validation root not to be 'invalid', but (a) neither
of those is necessarily the case and (b) by themselves they don't
suffice to determine an answer.

I think you mean "Does the presence of an xsd:defaultOpenContent
element in the schema document have any effect on the validity
conditions of the parent, if any, of r:BookStore?"  The answer is no.
The defaultOpenContent element in the schema document determines how
the types defined in this schema document behave.  It does not affect
types defined in other schema documents, and it does not affect
xsd:anyType.
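The scoping rule can be sketched as a toy model in Python. Everything here is a hypothetical illustration, not any processor's API. The classes, the document names bookstore.xsd and other.xsd, and the mode value are assumptions. The model also ignores details such as the appliesToEmpty attribute of xsd:defaultOpenContent and derivation. It shows two things. First, a type picks up the default only from the schema document that defines it. Second, xsd:anyType belongs to no schema document, so it never picks up the default.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SchemaDocument:
    # Hypothetical model of one schema document.
    name: str
    default_open_content: Optional[str] = None  # mode of <xsd:defaultOpenContent>, or None

@dataclass
class ComplexType:
    name: str
    defined_in: Optional[SchemaDocument]    # None for built-ins such as xsd:anyType
    own_open_content: Optional[str] = None  # mode of an <xsd:openContent> child, if any

def effective_open_content(t: ComplexType) -> Optional[str]:
    """Return the open-content mode that applies to t, or None if t is closed."""
    if t.own_open_content is not None:
        # An explicit <xsd:openContent> wins; mode="none" means closed.
        return None if t.own_open_content == "none" else t.own_open_content
    if t.defined_in is not None:
        # Otherwise only the default of the document that defines t applies.
        return t.defined_in.default_open_content
    # Built-in types belong to no schema document, so no default reaches them.
    return None

bookstore_xsd = SchemaDocument("bookstore.xsd", default_open_content="interleave")
other_xsd = SchemaDocument("other.xsd")  # hypothetical: declares r:MyFavoriteBookStore

types = [
    ComplexType("BookStore type", bookstore_xsd),
    ComplexType("MyFavoriteBookStore type", other_xsd),
    ComplexType("xsd:anyType", None),
]
for t in types:
    print(t.name, "->", effective_open_content(t))
```

Only the first line of output shows "interleave". The other two types are closed, whatever bookstore.xsd says.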

Readers primarily interested in the effect of defaultOpenContent can
stop reading now.  The rest of this note is an explanation of why the
question "is this instance document legal" can almost never be
answered without more information than provided in this case, and
what factors may determine the answer.

....

Any conforming schema document defines some schema components, which
can be used in validation.  But a collection of components is not by
itself enough to determine the result of validation, any more than a
DTD file is by itself enough to determine the result of validation
using DTDs.

In order to validate the instance document you give, the person or
agent invoking the validator needs to specify:

   - What schema is to be used?  Is it the schema corresponding to
     the schema document you specified, with no additional components?
     Or might the schema document you specified be combined with
     another one which provides a declaration for r:MyFavoriteBookStore?

     If the schema used for validation contains an element declaration
     for r:MyFavoriteBookStore, then whether that element is valid
     against that declaration or not depends on what that declaration
     says.  The open content specified in your schema document affects
     the type definitions given in your schema document, not others.

     Given a particular schema, whether the document is legal
     (satisfies the agreement between sender and recipient) or not
     depends on what validation assessment you request the validator to
     perform, and on details of the agreement.

   - Where does validation start? At the document root? At element
     //Date[2]? Somewhere else?  There are lots of possibilities.

   - Which validation mode is used to start?  Possible answers include:

     . element-driven: the invoker specifies an element declaration
       in the schema and the validation root is validated against that
       declaration.

     . type-driven: the invoker specifies a type definition in the
       schema and the validation root is validated against that
       definition.

     . lax wildcard validation: the invoker doesn't specify a
       declaration or definition.  Instead, the validator looks for a
       top-level element declaration (and possibly, if the validation
       root has an xsi:type attribute, for a top-level type
       definition), uses what it finds, and doesn't complain if it
       doesn't find one.

     . strict wildcard validation: like lax validation, but if no
       element declaration or type definition is found for the
       validation root, there's a problem.
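Here is a deliberately simplified Python model of the four starting modes. It computes only the [validity] of the validation root, and it is a sketch built on assumptions, not the algorithm in the XSD spec. A "type" is reduced to a predicate over child element names. Descendants and xsi:type are ignored. When strict wildcard validation finds no declaration, the resulting "problem" is modeled as [validity]=invalid. The BookStore rule ("at least one r:Book child") is a hypothetical simplification of the snipped schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Toy stand-in for a type definition: a predicate over child element names.
TypeDef = Callable[[List[str]], bool]

@dataclass
class Element:
    name: str
    children: List[str] = field(default_factory=list)

@dataclass
class ElementDecl:
    name: str
    type_def: TypeDef

def against_decl(e: Element, decl: ElementDecl) -> str:
    # The name must match the declaration and the content must satisfy its type.
    ok = e.name == decl.name and decl.type_def(e.children)
    return "valid" if ok else "invalid"

def start(e: Element, mode: str, global_decls: Dict[str, ElementDecl],
          component: Any = None) -> str:
    """Toy [validity] of the validation root; descendants are not examined."""
    if mode == "element-driven":   # invoker names an element declaration
        return against_decl(e, component)
    if mode == "type-driven":      # invoker names a type; the name plays no role
        return "valid" if component(e.children) else "invalid"
    if mode in ("lax", "strict"):  # look up a top-level declaration by name
        decl = global_decls.get(e.name)
        if decl is not None:
            return against_decl(e, decl)
        # Nothing found: lax shrugs; strict reports a problem (modeled as invalid).
        return "notKnown" if mode == "lax" else "invalid"
    raise ValueError(f"unknown mode: {mode!r}")

def bookstore_type(children: List[str]) -> bool:
    return "r:Book" in children  # hypothetical: "at least one r:Book child"

def any_type(children: List[str]) -> bool:
    return True  # xsd:anyType accepts any content

DECLS = {"r:BookStore": ElementDecl("r:BookStore", bookstore_type)}
doc_root = Element("r:MyFavoriteBookStore", ["r:BookStore"])

for mode, component in [("element-driven", DECLS["r:BookStore"]),
                        ("type-driven", bookstore_type),
                        ("type-driven", any_type),
                        ("lax", None), ("strict", None)]:
    print(mode, start(doc_root, mode, DECLS, component))
```

Under these assumptions the loop prints invalid, invalid, valid, notKnown and invalid. The same document, the same schema, and five different invocations give three different answers.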

Not all validators provide options for all of these possibilities; in
the extreme case, the invoker specifies a particular schema and a
particular mode for starting validation by using a particular
processor that only supports one way of constructing the schema (or
only one schema) and only one invocation mode.  If you want the choice
to lie with you and not with your software, examine the functionality
offered by your validator.

The XSD spec allows you a great deal of flexibility in defining what
classes of document you want to accept or reject, that is, in saying
what you want to be legal.  The flip side of that is that you are
responsible for saying what you mean.  If you want the instance to be
legal, you can certainly specify a schema-based agreement between
sender and recipient that makes it legal.  There are several ways to
achieve that result: which one you choose depends on *why* you want it
to be legal.  Similarly, if you want the instance to be illegal, you
can achieve that, too, and again your choice of methods depends on why
you want it to be illegal.

So, to illustrate what I said about the possible answers to your
question, consider the following cases.

(1) Suppose the agreement between sender and receiver is that (a)
validation starts at the document's root element in (b) element-driven
mode, with the element declaration /schemaElement::r:BookStore, and
(c) in the PSVI, the validation root should have [validity]=valid.

Result: not legal.  The r:MyFavoriteBookStore element doesn't match
the prescribed element declaration, so it isn't valid.

(2) Suppose the agreement between sender and receiver is that (a)
validation starts at the first r:BookStore element in the document, in
(b) element-driven mode, with the element declaration
/schemaElement::r:BookStore, and (c) in the PSVI, the result should
have [validity]=valid and [validation attempted] = partial or full.

Result: legal (I think; I just eyeballed the instance and schema and
didn't see anything wrong beyond the use of xsd:string for what is
apparently intended to be natural-language data, which is a poor
design choice but doesn't make the document invalid).
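The schema plays no part in choosing the validation root; the invoker chooses it. For example, case (2) starts at the first r:BookStore element in document order. Python's standard xml.etree.ElementTree module can locate that element before it is handed to whatever validator is in use. The namespace URI and the small document below are placeholders, because the real ones were in the snipped example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the thread's real one was in the snipped example.
NS = "urn:example:bookstore"

DOC = f"""<r:MyFavoriteBookStore xmlns:r="{NS}">
  <r:BookStore>
    <r:Book><r:Title>Example</r:Title></r:Book>
  </r:BookStore>
</r:MyFavoriteBookStore>"""

def first_validation_root(xml_text: str, local_name: str):
    """Return the first element (document order) named {NS}local_name, or None.

    Element.iter() yields the element itself before its descendants, so a
    document whose root is already r:BookStore is handled as well.
    """
    root = ET.fromstring(xml_text)
    return next(root.iter(f"{{{NS}}}{local_name}"), None)

validation_root = first_validation_root(DOC, "BookStore")
print(validation_root.tag)  # {urn:example:bookstore}BookStore
```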

(3) Suppose the agreement between sender and receiver is that (a)
validation starts at the document root, in (b) type-driven mode, with
the type definition /schemaElement::r:BookStore/type::*, and (c) in
the PSVI, the result should have [validity]=valid or
[validity]=notKnown and [validation attempted] = partial or full.

Result: not legal. The type requires at least one r:Book element, but
the only child is named r:BookStore.

(4) Suppose the agreement between sender and receiver is that (a)
validation starts at the document root, in (b) type-driven mode, with
the type definition /type::xsd:anyType, and (c) in the PSVI, the
result should have [validity]=valid or [validity]=notKnown and
[validation attempted] = partial or full.

Result: legal.

(5) Suppose we specify (a) lax wildcard mode, and (b) the resulting
PSVI has [validity]=valid on the validation root.

Result: not legal.  The schema you describe has no element declaration
for r:MyFavoriteBookStore, so the PSVI has [validity]=notKnown.

(6) Suppose we specify (a) strict wildcard mode, and (b) the resulting
PSVI has [validity]=valid on the validation root.

Result: not legal.  The schema you describe has no element declaration
for r:MyFavoriteBookStore, so strict wildcard validation finds nothing
to validate the validation root against, and the PSVI has
[validity]=invalid.

(7) Suppose we specify (a) lax wildcard mode, and (b) the resulting
PSVI has [validity]=valid or [validity]=notKnown (i.e. does NOT have
[validity]=invalid) on the validation root.

Result: legal.  The schema you describe has no element declaration for
r:MyFavoriteBookStore, so it's laxly assessed and the PSVI has
[validity]=notKnown.  Note that the document would be legal even if
you replaced the third r:Book element with

   <Book>This is not a legal book: it has character data where it
   shouldn't, and it lacks the required Title, Author, Date, ISBN, and
   Publisher children.</Book>

When the validation root is laxly assessed, it is never invalid:
invalidity bubbles up from child to parent only for elements with
declarations.
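The bubbling rule can be sketched as a recursive post-order pass over a tree. This is a simplification of the spec's rules for [validity]. It ignores attributes and other details, and the Node class is hypothetical. An element with a declaration is invalid if it fails its own declaration or if any child is invalid; otherwise it is valid. An element without a declaration is notKnown, whatever its children are. The usage models the modified document from case (7) and assumes three r:Book children.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    has_decl: bool              # True if a declaration was found (strictly assessed)
    locally_valid: bool = True  # satisfies its own declaration? (unused without one)
    children: List["Node"] = field(default_factory=list)
    validity: str = ""          # set by assess(): "valid", "invalid" or "notKnown"

def assess(n: Node) -> str:
    """Post-order pass: assess the children first, then the element itself."""
    child_results = [assess(c) for c in n.children]
    if not n.has_decl:
        # Laxly assessed with no declaration: never invalid, whatever the children are.
        n.validity = "notKnown"
    elif not n.locally_valid or "invalid" in child_results:
        # Invalidity bubbles up only into elements that have declarations.
        n.validity = "invalid"
    else:
        n.validity = "valid"
    return n.validity

# The modified document of case (7), assuming three r:Book children.
bad_book = Node("Book", has_decl=True, locally_valid=False)
store = Node("r:BookStore", has_decl=True,
             children=[Node("r:Book", True), Node("r:Book", True), bad_book])
root = Node("r:MyFavoriteBookStore", has_decl=False, children=[store])

assess(root)
print(root.validity, store.validity, bad_book.validity)  # notKnown invalid invalid
```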

(8) Suppose we specify (a) lax wildcard mode, and (b) the resulting
PSVI has [validity]=valid or [validity]=notKnown (i.e. does NOT have
[validity]=invalid) on every element and attribute (i.e. there are no
invalid elements or attributes).

Result: legal.  The schema you describe has no element declaration for
r:MyFavoriteBookStore, so it's laxly assessed and the PSVI has
[validity]=notKnown.  The r:BookStore child is valid, as are all of
its children.  And if they weren't (for example, because you had replaced
the third r:Book element with the invalid element given above), the document
would not be legal, because the r:BookStore element would be invalid.
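Cases (7) and (8) differ only in the acceptance criterion. That criterion is part of the agreement, not of the schema. The Python sketch below uses a hypothetical Agreement class and reduces the PSVI to a mapping from element paths to [validity] values. It shows the two criteria disagreeing on the modified document. The paths and values are transcribed from the discussion above, assuming three r:Book children and omitting their descendants for brevity. They are not output from a real validator.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass(frozen=True)
class Agreement:
    # [validity] values acceptable on the validation root.
    accepted_root_validity: FrozenSet[str]
    # True for case (8): no element anywhere may be invalid.
    forbid_invalid_anywhere: bool

def is_legal(agreement: Agreement, psvi: Dict[str, str], root_path: str) -> bool:
    if psvi[root_path] not in agreement.accepted_root_validity:
        return False
    if agreement.forbid_invalid_anywhere and "invalid" in psvi.values():
        return False
    return True

NOT_INVALID = frozenset({"valid", "notKnown"})
CASE_7 = Agreement(NOT_INVALID, forbid_invalid_anywhere=False)
CASE_8 = Agreement(NOT_INVALID, forbid_invalid_anywhere=True)

ROOT = "/r:MyFavoriteBookStore"
STORE = ROOT + "/r:BookStore"
original = {ROOT: "notKnown", STORE: "valid",
            STORE + "/r:Book[1]": "valid", STORE + "/r:Book[2]": "valid",
            STORE + "/r:Book[3]": "valid"}
# Same document with the third r:Book replaced by the invalid <Book> shown earlier.
modified = dict(original)
modified[STORE] = "invalid"
modified[STORE + "/r:Book[3]"] = "invalid"

for name, psvi in [("original", original), ("modified", modified)]:
    print(name, is_legal(CASE_7, psvi, ROOT), is_legal(CASE_8, psvi, ROOT))
```

This prints "original True True" and "modified True False". The root's notKnown satisfies both agreements, but only case (8) looks below the root.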


I hope this helps.


-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************

Received on Thursday, 4 June 2009 23:34:05 UTC