
RE: [XML Schema 1.1] Many questions about openContent

From: Costello, Roger L. <costello@mitre.org>
Date: Fri, 29 May 2009 12:32:22 -0400
To: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Message-ID: <9E51F88D5247B648908850C35A3BBB5003FD40C13C@IMCMBX3.MITRE.ORG>
Thanks Noah and Michael. That helps a lot. I appreciate your clear and easy-to-understand explanations.

Noah, you made an interesting choice of words:

> In plain English it says ...

Why can't the specification be written in plain English?

Why is it written so painfully complex?

Surely specifications can be written to be both easy to understand and precise. No?

This is complete gobbledygook:

Validation Rule: Element Sequence Locally Valid (Complex Content)
For a sequence S (possibly empty) of element information items to be locally ·valid· with respect to a Content Type CT, the appropriate case among the following must be true:
1 If CT.{open content} is ·absent·, then S is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).

2 If CT.{open content}.{mode} = suffix, then S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
2.1 S = S1 + S2
2.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
2.3 If S2 is not empty, let E be the first element in S2, then S1 + E does not have a ·path· in CT.{particle}
2.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).

3 otherwise (CT.{open content}.{mode} = interleave) S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
3.1 S is a member of S1 × S2 (where × is the interleave operator, see All-groups (§3.8.4.1.3))
3.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
3.3 For every element E in S2, let S3 be the longest prefix of S1 where members of S3 are before E in S, then S3 + E does not have a ·path· in CT.{particle}
3.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).

Sorry to be such a whiner. Perhaps others don't have any difficulty reading the specification.

/Roger



________________________________
From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
Sent: Friday, May 29, 2009 12:11 PM
To: Costello, Roger L.
Cc: 'xmlschema-dev@w3.org'
Subject: RE: [XML Schema 1.1] Many questions about openContent


Roger Costello writes:

> So the <any> element within an <openContent> is always
> (effectively) minOccurs="1" and maxOccurs="1". Correct?

At the risk of jumping ahead of experts who know how this really works, I think the answer you want is "no, that's not correct."

So far, we've been talking about how the XML markup maps to components, and there the minOccurs/maxOccurs is indeed ignored.  I strongly suspect that what you want to ask is: what does an interleave openContent validate?  For that, see clause 3 of the following [1]:

Validation Rule: Element Sequence Locally Valid (Complex Content)
For a sequence S (possibly empty) of element information items to be locally ·valid· with respect to a Content Type CT, the appropriate case among the following must be true:
1 If CT.{open content} is ·absent·, then S is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).

2 If CT.{open content}.{mode} = suffix, then S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
2.1 S = S1 + S2
2.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
2.3 If S2 is not empty, let E be the first element in S2, then S1 + E does not have a ·path· in CT.{particle}
2.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).

3 otherwise (CT.{open content}.{mode} = interleave) S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
3.1 S is a member of S1 × S2 (where × is the interleave operator, see All-groups (§3.8.4.1.3))
3.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
3.3 For every element E in S2, let S3 be the longest prefix of S1 where members of S3 are before E in S, then S3 + E does not have a ·path· in CT.{particle}
3.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).


Actually, to learn to read this, the suffix case in clause 2 is probably easier.  In plain English it says, if you have suffix open content, then to be valid, your input must start with content (possibly empty) that matches the explicit content (that's S1), and the rest (S2) must be such that >>every element<< in S2 is valid with respect to the open content wildcard.  That "every element" tells you that more than one element is accepted by the wildcard.
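
As a rough illustration of the suffix rule, here is a minimal Python sketch. It assumes a drastically simplified content model (a fixed sequence of required element names, as in the Book example in this thread) and models the wildcard as a plain predicate on element names. The names `BOOK`, `valid_suffix`, and `wildcard_ok` are made up for this sketch; they come from neither the spec nor any real validator.

```python
from typing import Callable, Sequence

# Hypothetical explicit content model: a fixed sequence of required names.
BOOK = ["Title", "Author", "Date", "ISBN", "Publisher"]


def valid_suffix(
    children: Sequence[str],
    required: Sequence[str],
    wildcard_ok: Callable[[str], bool] = lambda name: True,
) -> bool:
    """Simplified check of clause 2 (mode = suffix)."""
    n = len(required)
    # Clause 2.1: S = S1 + S2. For a fixed sequence, S1 must be the first n items.
    s1, s2 = list(children[:n]), list(children[n:])
    # Clause 2.2: S1 must match the explicit content model.
    if s1 != list(required):
        return False
    # Clause 2.3 holds automatically here: once the fixed sequence is complete,
    # no further element has a path in the explicit model.
    # Clause 2.4: every element of S2 (zero or more) must satisfy the wildcard.
    return all(wildcard_ok(name) for name in s2)
```

Under these assumptions, `valid_suffix(BOOK + ["Review", "Rating"], BOOK)` accepts two trailing elements, while an extra element placed between `Title` and `Author` is rejected.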

Now, turning to the interleave case, the spirit is the same.  It's saying that, sprinkled through the content being validated, there must be a sequence of elements (S1) that, taken together, validate against the explicit content model particle.  It then must be the case that >>each of the elements you skipped<< (i.e. every element in S2) is valid with respect to the wildcard.
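
The interleave case can be sketched the same way. This is only an illustration under the same simplifying assumptions (explicit content is a fixed sequence of required names, the wildcard is a predicate on names); `BOOK`, `valid_interleave`, and `wildcard_ok` are hypothetical names invented for the sketch.

```python
from typing import Callable, Sequence

# Hypothetical explicit content model: a fixed sequence of required names.
BOOK = ["Title", "Author", "Date", "ISBN", "Publisher"]


def valid_interleave(
    children: Sequence[str],
    required: Sequence[str],
    wildcard_ok: Callable[[str], bool] = lambda name: True,
) -> bool:
    """Simplified check of clause 3 (mode = interleave)."""
    i = 0  # index of the next name the explicit model expects
    for name in children:
        if i < len(required) and name == required[i]:
            # Clause 3.3: an element the explicit model can accept at this
            # point belongs to S1, not to the skipped elements.
            i += 1
        elif not wildcard_ok(name):
            # Clause 3.4: a skipped element (S2) must satisfy the wildcard.
            return False
    # Clause 3.2: the S1 elements together must complete the explicit model.
    return i == len(required)
```

With the default accept-everything predicate, any number of extra elements may appear before, between, or after the five declared ones, but the declared ones must still all be present and in order.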

So, while there's nothing about occurrence counts in the component model, all the open contents act as if they were (0, unbounded), not (1,1).

FYI: early in the design work, we tried to do open content by adding explicit wildcards to the content models, and it got very messy.  So, while open content looks like a wildcard and leverages a lot of the markup and some mappings from traditional <any>, it's got its own magic validation mechanism.  DFA weenies would call these "spinner states", i.e. states in the automaton that spin, skipping content that the DFA would otherwise not accept.
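
To make the "spinner state" idea concrete, here is a small Python sketch of a DFA runner that stays in its current state whenever it sees an element the automaton has no transition for but the wildcard accepts. It is an assumption-laden toy, not how any particular schema processor is implemented; `run_with_spinners`, `BOOK_DFA`, and `wildcard_ok` are hypothetical names.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

# Hypothetical DFA for the Book sequence: (state, element name) -> next state.
BOOK_DFA: Dict[Tuple[int, str], int] = {
    (0, "Title"): 1,
    (1, "Author"): 2,
    (2, "Date"): 3,
    (3, "ISBN"): 4,
    (4, "Publisher"): 5,
}


def run_with_spinners(
    children: Sequence[str],
    transitions: Dict[Tuple[int, str], int],
    start: int,
    accepting: Set[int],
    wildcard_ok: Callable[[str], bool] = lambda name: True,
) -> bool:
    """Run a DFA in which every state may 'spin' on wildcard-valid input."""
    state = start
    for name in children:
        nxt = transitions.get((state, name))
        if nxt is not None:
            state = nxt      # normal transition in the explicit model
        elif wildcard_ok(name):
            continue         # spin: skip the element, stay in the same state
        else:
            return False     # neither the DFA nor the wildcard accepts it
    return state in accepting
```

Because a state can spin any number of times, including zero, this behaves like (0, unbounded) at every position, which matches the point above about occurrence counts.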

I'm fairly sure I got this right, and I hope it helps.

Noah

[1] http://www.w3.org/TR/xmlschema11-1/#cvc-complex-content

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------






"Costello, Roger L." <costello@mitre.org>
Sent by: xmlschema-dev-request@w3.org

05/29/2009 11:50 AM


        To:        "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
        cc:        (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        RE: [XML Schema 1.1] Many questions about openContent




Thanks Michael.

So the <any> element within an <openContent> is always (effectively) minOccurs="1" and maxOccurs="1". Correct?

That is, it is effectively this:

<element name="Book">
   <complexType>
       <openContent mode="interleave">
           <any minOccurs="1" maxOccurs="1" />
       </openContent>
       <sequence>
           <element name="Title" type="string"/>
           <element name="Author" type="string" />
           <element name="Date" type="string"/>
           <element name="ISBN" type="string"/>
           <element name="Publisher" type="string"/>
       </sequence>
   </complexType>
</element>

Correct?

Now, what does that mean?

Does it mean that 1 new element *must* be inserted into the <sequence> content model? Or, does it mean that:

  Before the <Title> element there *must* be 1 new element, and
  Before the <Author> element there *must* be 1 new element, and
  Before the <Date> element there *must* be 1 new element, and
  Before the <ISBN> element there *must* be 1 new element, and
  Before the <Publisher> element there *must* be 1 new element, and
  After the <Publisher> element there *must* be 1 new element.


And what about mode="suffix":

<element name="Book">
   <complexType>
       <openContent mode="suffix">
           <any />
       </openContent>
       <sequence>
           <element name="Title" type="string"/>
           <element name="Author" type="string" />
           <element name="Date" type="string"/>
           <element name="ISBN" type="string"/>
           <element name="Publisher" type="string"/>
       </sequence>
   </complexType>
</element>

Does this mean that 1 new element must always be placed at the bottom of the <sequence> content model (after the <Publisher> element)?

/Roger



> -----Original Message-----
> From: Michael Kay [mailto:mike@saxonica.com]
> Sent: Friday, May 29, 2009 11:39 AM
> To: Costello, Roger L.; xmlschema-dev@w3.org
> Subject: RE: [XML Schema 1.1] Many questions about openContent
>
>
> On the first question, the schema component model for open content does not
> include a minOccurs and maxOccurs value (the {open content} property is a
> wildcard, not a wildcard particle). As far as I can see, it is permitted to
> specify these values in the XML representation, but they are ignored.
> Perhaps they should not be allowed: I'll raise a bug to propose this.
>
> On the second question, mode="none" is used in the same way as
> use="prohibited" on attributes, to suppress inheritance of openContent in a
> type that would otherwise acquire it automatically.
>
> Regards,
>
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay
>
>
>
> > -----Original Message-----
> > From: xmlschema-dev-request@w3.org
> > [mailto:xmlschema-dev-request@w3.org] On Behalf Of Costello, Roger L.
> > Sent: 29 May 2009 16:23
> > To: 'xmlschema-dev@w3.org'
> > Subject: [XML Schema 1.1] Many questions about openContent
> >
> >
> > Hi Folks,
> >
> > Here is an example of declaring a <Book> element with open content:
> >
> > <element name="Book">
> >     <complexType>
> >         <openContent mode="interleave">
> >             <any minOccurs="..." maxOccurs="..." />
> >         </openContent>
> >         <sequence>
> >             <element name="Title" type="string"/>
> >             <element name="Author" type="string" />
> >             <element name="Date" type="string"/>
> >             <element name="ISBN" type="string"/>
> >             <element name="Publisher" type="string"/>
> >         </sequence>
> >     </complexType>
> > </element>
> >
> > Notice that I left unspecified the value of minOccurs and
> > maxOccurs on the <any> element.
> >
> > If I specify minOccurs="0" and maxOccurs="1" does it mean
> > that 0-1 new elements can be inserted into the <sequence>
> > content model? Or, does it mean that:
> >
> >    Before the <Title> element there can be 0-1 new elements, and
> >    Before the <Author> element there can be 0-1 new elements, and
> >    Before the <Date> element there can be 0-1 new elements, and
> >    Before the <ISBN> element there can be 0-1 new elements, and
> >    Before the <Publisher> element there can be 0-1 new elements, and
> >    After the <Publisher> element there can be 0-1 new elements.
> >
> > If I specify minOccurs="1" and maxOccurs="1" does it mean
> > that 1 new element must be inserted into the <sequence>
> > content model? Or, does it mean that:
> >
> >    Before the <Title> element there must be 1 new element, and
> >    Before the <Author> element there must be 1 new element, and
> >    Before the <Date> element there must be 1 new element, and
> >    Before the <ISBN> element there must be 1 new element, and
> >    Before the <Publisher> element there must be 1 new element, and
> >    After the <Publisher> element there must be 1 new element.
> >
> > If I specify minOccurs="0" and maxOccurs="unbounded" does it
> > mean that 0-unbounded new elements can be inserted into the
> > <sequence> content model? Or, does it mean that:
> >
> >    Before the <Title> element there can be 0-unbounded new elements, and
> >    Before the <Author> element there can be 0-unbounded new elements, and
> >    Before the <Date> element there can be 0-unbounded new elements, and
> >    Before the <ISBN> element there can be 0-unbounded new elements, and
> >    Before the <Publisher> element there can be 0-unbounded new elements, and
> >    After the <Publisher> element there can be 0-unbounded new elements.
> >
> >
> > Next, suppose I change the mode to 'suffix':
> >
> > <element name="Book">
> >     <complexType>
> >         <openContent mode="suffix">
> >             <any minOccurs="..." maxOccurs="..." />
> >         </openContent>
> >         <sequence>
> >             <element name="Title" type="string"/>
> >             <element name="Author" type="string" />
> >             <element name="Date" type="string"/>
> >             <element name="ISBN" type="string"/>
> >             <element name="Publisher" type="string"/>
> >         </sequence>
> >     </complexType>
> > </element>
> >
> > I believe mode="suffix" means that new elements must always
> > be placed at the bottom of the <sequence> content model
> > (after the <Publisher> element). Correct?
> >
> > If I specify minOccurs="0" and maxOccurs="1" does it mean
> > that 0-1 new elements can be inserted at the bottom of the
> > <sequence> content model?
> >
> > If I specify minOccurs="1" and maxOccurs="1" does it mean
> > that 1 new element must be inserted at the bottom of the
> > <sequence> content model?
> >
> > If I specify minOccurs="0" and maxOccurs="unbounded" does it
> > mean that 0-unbounded new elements can be inserted at the
> > bottom of the <sequence> content model?
> >
> >
> > Lastly, suppose I change the mode to 'none':
> >
> > <element name="Book">
> >     <complexType>
> >         <openContent mode="none">
> >             <any minOccurs="..." maxOccurs="..." />
> >         </openContent>
> >         <sequence>
> >             <element name="Title" type="string"/>
> >             <element name="Author" type="string" />
> >             <element name="Date" type="string"/>
> >             <element name="ISBN" type="string"/>
> >             <element name="Publisher" type="string"/>
> >         </sequence>
> >     </complexType>
> > </element>
> >
> > What does mode="none" mean? Does it mean:
> >
> >     You cannot insert new elements into the <sequence> content model.
> >
> > How is it different from this (no openContent specified):
> >
> > <element name="Book">
> >     <complexType>
> >         <sequence>
> >             <element name="Title" type="string"/>
> >             <element name="Author" type="string" />
> >             <element name="Date" type="string"/>
> >             <element name="ISBN" type="string"/>
> >             <element name="Publisher" type="string"/>
> >         </sequence>
> >     </complexType>
> > </element>
> >
> > Are they the same? If they are, why have mode="none"? What's
> > its value?
> >
> > /Roger
>
>
Received on Friday, 29 May 2009 16:32:58 GMT
