Re: [XML Schema 1.1] Two ways to create interleaved, any-order content ... are they identical? from C. M. Sperberg-McQueen on 2009-10-13 (xmlschema-dev@w3.org from October 2009)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 12 Oct 2009 18:02:10 -0600
To: "Costello, Roger L." <costello@mitre.org>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-Id: <EE87BC3C-D0CB-4226-A72F-6882830C51BA@blackmesatech.com>

On 12 Oct 2009, at 11:00 , Costello, Roger L. wrote:

>
> Hi Folks,
>
> Below are two ways to declare a <Book> element.
>
> Both versions use <all>, to permit the elements within <Book> to  
> occur in any order.
>
> The first version uses an unbounded <any>. The second version uses  
> interleaved open content.
>
> Are these two versions identical?

You may mean "Do they accept the same inputs as valid instances
of element Book?" Yes, I think they do.
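
The two declarations under discussion are not quoted in this reply. As a
minimal sketch of what they might look like, assuming XSD 1.1, xs:string
children, and hypothetical type names (only the child element names come
from this thread):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Version 1 (assumed form): unbounded lax wildcard inside the all-group.
       Wildcards in xs:all are allowed in XSD 1.1 only. -->
  <xs:complexType name="BookWithWildcard">
    <xs:all>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Author" type="xs:string"/>
      <xs:element name="Date" type="xs:string"/>
      <xs:element name="ISBN" type="xs:string"/>
      <xs:element name="Publisher" type="xs:string"/>
      <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
    </xs:all>
  </xs:complexType>

  <!-- Version 2 (assumed form): interleaved open content around the all-group. -->
  <xs:complexType name="BookWithOpenContent">
    <xs:openContent mode="interleave">
      <xs:any processContents="lax"/>
    </xs:openContent>
    <xs:all>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Author" type="xs:string"/>
      <xs:element name="Date" type="xs:string"/>
      <xs:element name="ISBN" type="xs:string"/>
      <xs:element name="Publisher" type="xs:string"/>
    </xs:all>
  </xs:complexType>

</xs:schema>
```

In both forms the five named children may appear in any order, with
arbitrary other elements mixed in among them.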

> If so, is there an advantage of one over the other?

Some people may find one formulation clearer or simpler than the
other; they will rightly prefer the one they find clearer.  I
expect different people will have different preferences, depending
on their tastes.

If a schema is designed so that every complex type, or most of them,
has a particular form of open content, then the open content can
be defaulted at the schema document level, which will make most
content models shorter and simpler.  Readers who forget that the
schema document supplies default open content may be surprised and
complain about action at a distance.
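
A minimal sketch of such a schema-level default, assuming XSD 1.1 and the
same illustrative xs:string children (the layout is an assumption, not
taken from the original post):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Applies to the complex types in this schema document that do not
       specify their own open content. -->
  <xs:defaultOpenContent mode="interleave">
    <xs:any processContents="lax"/>
  </xs:defaultOpenContent>

  <!-- No xs:openContent here: the default above supplies it. -->
  <xs:element name="Book">
    <xs:complexType>
      <xs:all>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="Author" type="xs:string"/>
        <xs:element name="Date" type="xs:string"/>
        <xs:element name="ISBN" type="xs:string"/>
        <xs:element name="Publisher" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Nothing in the Book declaration itself hints that extra elements are
allowed, which is the action at a distance mentioned above.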

In the case of all-groups, using an explicit wildcard and using
interleaved open content are roughly similar in complexity of the
declaration.  In other cases, explicit wildcards are much more
verbose and for many schema authors rather error-prone.  See
http://www.w3.org/TR/xmlschema-guide2versioning/ for examples.
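
To illustrate the verbosity outside all-groups, here is a rough sketch,
assuming XSD 1.1 and hypothetical type names, of a two-element sequence
written first with explicit wildcards and then with open content. The two
are meant to be comparable, not guaranteed identical in every corner case:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Explicit wildcards: one is needed at every position where extra
       elements should be tolerated, and it is easy to forget one. -->
  <xs:complexType name="BookSeqExplicit">
    <xs:sequence>
      <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      <xs:element name="Title" type="xs:string"/>
      <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      <xs:element name="Author" type="xs:string"/>
      <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Open content: the wildcard is stated once for the whole model. -->
  <xs:complexType name="BookSeqOpen">
    <xs:openContent mode="interleave">
      <xs:any processContents="lax"/>
    </xs:openContent>
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Author" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

The number of explicit wildcards grows with the number of children, while
the open-content form stays the same size.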

> If they are not identical, how do they differ?

You may mean "Do they produce indistinguishable PSVIs?"

No, not quite; the [match information] property in the PSVI allows
the two to be distinguished, for elements other than Author, Title,
Date, ISBN, and Publisher:  in the one case, those elements will have
[match information] = 'lax' (since they match a lax wildcard in the
content model), and in the other they will have [match information]
= 'open' (since they match open content).  This allows the
downstream application to distinguish the two cases, if it wishes to.

In the usual case, making downstream processing depend on such a subtle
distinction is probably not a good idea, but YMMV and there may be
special circumstances.  Certainly there are some designers who
like the idea of being able to say that if an element in the input
matches an element in the version N content model, then a
version N processor is obligated to process it in a certain
way, and if the element in the input matches a wildcard (or:
matches only open content) in the version N content model, then
a version N processor is obligated to tolerate it, or to ignore
it, or to handle it in some other way.  In such a design,
"matches an element" and "matches a wildcard" are syntax-level
signals for different kinds of processing.  The signal "matches
an open-content wildcard" can fit nicely into this pattern,
either to signal a third kind of processing or to shift the
distinction from element-vs-wildcard to content-model-vs-opencontent.

HTH

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************

Received on Tuesday, 13 October 2009 00:02:41 UTC