RE: [XML Schema 1.1] Many questions about openContent

From: <noah_mendelsohn@us.ibm.com>
Date: Fri, 29 May 2009 12:11:17 -0400
To: "Costello, Roger L." <costello@mitre.org>
Cc: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Message-ID: <OF79A4253F.18BD0DCF-ON852575C5.00581B61-852575C5.0058BD45@lotus.com>

Roger Costello writes:

> So the <any> element within an <openContent> is always 
> (effectively) minOccurs="1" and maxOccurs="1". Correct?

At the risk of jumping ahead of experts who know how this really works, I 
think the answer you want is "no, that's not correct."

So far, we've been talking about how the XML markup maps to components, 
and there minOccurs/maxOccurs are indeed ignored.  I strongly suspect 
that what you want to ask is: what does interleave open content actually 
validate?  For that, see clause 3 of the following [1]:

Validation Rule: Element Sequence Locally Valid (Complex Content)
For a sequence S (possibly empty) of element information items to be 
locally ·valid· with respect to a Content Type CT, the appropriate case 
among the following must be true:

1 If CT.{open content} is ·absent·, then S is ·valid· with respect to 
  CT.{particle}, as defined in Element Sequence Locally Valid (Particle) 
  (§3.9.4.2).

2 If CT.{open content}.{mode} = suffix, then S can be represented as two 
  subsequences S1 and S2 (either can be empty) such that all of the 
  following are true:
  2.1 S = S1 + S2
  2.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element 
      Sequence Locally Valid (Particle) (§3.9.4.2).
  2.3 If S2 is not empty, let E be the first element in S2, then S1 + E 
      does not have a ·path· in CT.{particle}
  2.4 Every element in S2 is ·valid· with respect to the wildcard 
      CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) 
      (§3.10.4.1).

3 otherwise (CT.{open content}.{mode} = interleave) S can be represented 
  as two subsequences S1 and S2 (either can be empty) such that all of the 
  following are true:
  3.1 S is a member of S1 × S2 (where × is the interleave operator, see 
      All-groups (§3.8.4.1.3))
  3.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element 
      Sequence Locally Valid (Particle) (§3.9.4.2).
  3.3 For every element E in S2, let S3 be the longest prefix of S1 where 
      members of S3 are before E in S, then S3 + E does not have a ·path· 
      in CT.{particle}
  3.4 Every element in S2 is ·valid· with respect to the wildcard 
      CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) 
      (§3.10.4.1).


Actually, to learn to read this, the suffix case in clause 2 is probably 
easier.  In plain English it says: if you have suffix open content, then 
to be valid, your input must start with content (possibly empty) that 
matches the explicit content model (that's S1), and >>every element<< in 
the rest (S2) must be valid with respect to the open content wildcard.  
That "every element" tells you that the wildcard accepts any number of 
elements, not just one.
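
As a rough illustration of the suffix rule, here is a minimal Python 
sketch.  It is not the spec's algorithm: the function name split_suffix is 
hypothetical, the content model is simplified to the flat sequence of 
required elements from the Book example, and the wildcard is assumed to 
accept any element name (namespace constraints and processContents are 
ignored).

```python
# Explicit content model of the Book example: a flat sequence of
# required elements (simplifying assumption for this sketch).
BOOK = ["Title", "Author", "Date", "ISBN", "Publisher"]


def split_suffix(children, required=BOOK):
    """Split child element names into (S1, S2) under suffix open content.

    Returns None when no valid split exists.
    """
    children = list(children)
    required = list(required)
    n = len(required)
    s1, s2 = children[:n], children[n:]
    # Clause 2.2: S1 must match the explicit content model.
    if s1 != required:
        return None
    # Clause 2.3 holds trivially here: once the whole sequence has been
    # matched, no further element has a path in the particle.
    # Clause 2.4: every element of S2 must match the wildcard, which this
    # sketch assumes accepts anything, so S2 may have any length.
    return s1, s2
```

With this sketch, BOOK followed by any number of extra elements splits 
successfully, while an extra element placed before Publisher yields None.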

Now, turning to the interleave case, the spirit is the same.  It's saying 
that, sprinkled through the content being validated, there must be a 
sequence of elements (S1) that, taken together, validate against the 
explicit content model particle.  It then must be the case that >>each of 
the elements you skipped<< (i.e., every element in S2) is valid with 
respect to the wildcard.
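
The interleave split can be sketched the same way.  This is a simplified 
Python illustration under stated assumptions, not the spec's algorithm: 
split_interleave is a hypothetical name, the content model is the flat 
sequence of required elements from the Book example, and the wildcard is 
assumed to accept any element name.  For such a model, clause 3.3 forces 
a greedy split: an element that the explicit model expects next cannot be 
handed to the wildcard.

```python
# Explicit content model of the Book example: a flat sequence of
# required elements (simplifying assumption for this sketch).
BOOK = ["Title", "Author", "Date", "ISBN", "Publisher"]


def split_interleave(children, required=BOOK):
    """Split child element names into (S1, S2) under interleave open content.

    Returns None when the explicit content model is not satisfied.
    """
    required = list(required)
    s1, s2 = [], []
    for name in children:
        if len(s1) < len(required) and name == required[len(s1)]:
            # The explicit model expects this element next, so by
            # clause 3.3 it must go to S1.
            s1.append(name)
        else:
            # Skipped element: goes to S2 and is checked against the
            # wildcard, which this sketch assumes accepts anything.
            s2.append(name)
    # Clause 3.2: S1 must match the explicit content model completely.
    if s1 != required:
        return None
    return s1, s2
```

For example, a Book whose children are Note, Title, Author, Note, Note, 
Date, ISBN, Publisher, Note splits into the five declared elements (S1) 
and four Note elements (S2), with no limit on how many appear in any gap.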

So, while there's nothing about occurrence counts in the component model, 
every open content wildcard acts as if it were (0, unbounded), not (1,1).

FYI: early in the design work, we tried to do open content by adding 
explicit wildcards to the content models, and it got very messy.  So, 
while open content looks like a wildcard and leverages a lot of the markup 
and some of the mappings from traditional <any>, it has its own magic 
validation mechanism.  DFA weenies would call these "spinner states", i.e., 
states in the automaton that spin, skipping content that the DFA would 
otherwise not accept.
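
To make the "spinner state" idea concrete, here is a small Python sketch 
of such an automaton.  It is only an illustration under assumptions: the 
accepts function and its mode strings are hypothetical, the content model 
is a flat sequence of required elements, and the wildcard is assumed to 
match any element name.  The state is simply the number of declared 
elements matched so far; in interleave mode every state may spin, while in 
suffix mode only the final state may.

```python
BOOK = ["Title", "Author", "Date", "ISBN", "Publisher"]


def accepts(children, required, mode):
    """Run a tiny automaton over child element names.

    mode is "interleave", "suffix", or "none" (no open content).
    """
    state = 0                 # how many required elements are matched
    final = len(required)     # accepting state
    for name in children:
        if state < final and name == required[state]:
            state += 1        # ordinary transition of the content model
        elif mode == "interleave":
            continue          # spinner: any state may skip the element
        elif mode == "suffix" and state == final:
            continue          # spinner only on the accepting state
        else:
            return False      # element not accepted anywhere
    return state == final
```

Under these assumptions, an extra element before Title is accepted with 
"interleave" but rejected with "suffix", and trailing extras are accepted 
by both.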

I'm fairly sure I got this right, and I hope it helps.

Noah

[1] http://www.w3.org/TR/xmlschema11-1/#cvc-complex-content

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Costello, Roger L." <costello@mitre.org>
Sent by: xmlschema-dev-request@w3.org
05/29/2009 11:50 AM
 
        To:     "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        RE: [XML Schema 1.1] Many questions about openContent


 
Thanks Michael.

So the <any> element within an <openContent> is always (effectively) 
minOccurs="1" and maxOccurs="1". Correct?

That is, it is effectively this:

<element name="Book">
    <complexType>
        <openContent mode="interleave">
            <any minOccurs="1" maxOccurs="1" />
        </openContent>
        <sequence>
            <element name="Title" type="string"/>
            <element name="Author" type="string" />
            <element name="Date" type="string"/>
            <element name="ISBN" type="string"/>
            <element name="Publisher" type="string"/>
        </sequence>
    </complexType>
</element>

Correct?

Now, what does that mean?

Does it mean that 1 new element *must* be inserted into the <sequence> 
content model? Or, does it mean that:

   Before the <Title> element there *must* be 1 new element, and
   Before the <Author> element there *must* be 1 new element, and
   Before the <Date> element there *must* be 1 new element, and
   Before the <ISBN> element there *must* be 1 new element, and
   Before the <Publisher> element there *must* be 1 new element, and
   After the <Publisher> element there *must* be 1 new element.


And what about mode="suffix":

<element name="Book">
    <complexType>
        <openContent mode="suffix">
            <any />
        </openContent>
        <sequence>
            <element name="Title" type="string"/>
            <element name="Author" type="string" />
            <element name="Date" type="string"/>
            <element name="ISBN" type="string"/>
            <element name="Publisher" type="string"/>
        </sequence>
    </complexType>
</element>

Does this mean that 1 new element must always be placed at the bottom of 
the <sequence> content model (after the <Publisher> element)?

/Roger



> -----Original Message-----
> From: Michael Kay [mailto:mike@saxonica.com] 
> Sent: Friday, May 29, 2009 11:39 AM
> To: Costello, Roger L.; xmlschema-dev@w3.org
> Subject: RE: [XML Schema 1.1] Many questions about openContent
> 
> 
> On the first question, the schema component model for open content does
> not include a minOccurs and maxOccurs value (the {open content} property
> is a wildcard, not a wildcard particle). As far as I can see, it is
> permitted to specify these values in the XML representation, but they are
> ignored. Perhaps they should not be allowed: I'll raise a bug to propose
> this.
> 
> On the second question, mode="none" is used in the same way as
> use="prohibited" on attributes, to suppress inheritance of openContent in
> a type that would otherwise acquire it automatically.
> 
> Regards,
> 
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay 
> 
> 
> 
> > -----Original Message-----
> > From: xmlschema-dev-request@w3.org 
> > [mailto:xmlschema-dev-request@w3.org] On Behalf Of Costello, Roger L.
> > Sent: 29 May 2009 16:23
> > To: 'xmlschema-dev@w3.org'
> > Subject: [XML Schema 1.1] Many questions about openContent
> > 
> > 
> > Hi Folks,
> > 
> > Here is an example of declaring a <Book> element with open content:
> > 
> > <element name="Book">
> >     <complexType>
> >         <openContent mode="interleave">
> >             <any minOccurs="..." maxOccurs="..." />
> >         </openContent>
> >         <sequence>
> >             <element name="Title" type="string"/>
> >             <element name="Author" type="string" />
> >             <element name="Date" type="string"/>
> >             <element name="ISBN" type="string"/>
> >             <element name="Publisher" type="string"/>
> >         </sequence>
> >     </complexType>
> > </element>
> > 
> > Notice that I left unspecified the value of minOccurs and 
> > maxOccurs on the <any> element.
> > 
> > If I specify minOccurs="0" and maxOccurs="1" does it mean 
> > that 0-1 new elements can be inserted into the <sequence> 
> > content model? Or, does it mean that:
> > 
> >    Before the <Title> element there can be 0-1 new elements, and
> >    Before the <Author> element there can be 0-1 new elements, and
> >    Before the <Date> element there can be 0-1 new elements, and
> >    Before the <ISBN> element there can be 0-1 new elements, and
> >    Before the <Publisher> element there can be 0-1 new elements, and
> >    After the <Publisher> element there can be 0-1 new elements.
> > 
> > If I specify minOccurs="1" and maxOccurs="1" does it mean 
> > that 1 new element must be inserted into the <sequence> 
> > content model? Or, does it mean that:
> > 
> >    Before the <Title> element there must be 1 new element, and
> >    Before the <Author> element there must be 1 new element, and
> >    Before the <Date> element there must be 1 new element, and
> >    Before the <ISBN> element there must be 1 new element, and
> >    Before the <Publisher> element there must be 1 new element, and
> >    After the <Publisher> element there must be 1 new element.
> > 
> > If I specify minOccurs="0" and maxOccurs="unbounded" does it 
> > mean that 0-unbounded new elements can be inserted into the 
> > <sequence> content model? Or, does it mean that:
> > 
> >    Before the <Title> element there can be 0-unbounded new elements, and
> >    Before the <Author> element there can be 0-unbounded new elements, and
> >    Before the <Date> element there can be 0-unbounded new elements, and
> >    Before the <ISBN> element there can be 0-unbounded new elements, and
> >    Before the <Publisher> element there can be 0-unbounded new elements, and
> >    After the <Publisher> element there can be 0-unbounded new elements.
> > 
> > 
> > Next, suppose I change the mode to 'suffix':
> > 
> > <element name="Book">
> >     <complexType>
> >         <openContent mode="suffix">
> >             <any minOccurs="..." maxOccurs="..." />
> >         </openContent>
> >         <sequence>
> >             <element name="Title" type="string"/>
> >             <element name="Author" type="string" />
> >             <element name="Date" type="string"/>
> >             <element name="ISBN" type="string"/>
> >             <element name="Publisher" type="string"/>
> >         </sequence>
> >     </complexType>
> > </element>
> > 
> > I believe mode="suffix" means that new elements must always 
> > be placed at the bottom of the <sequence> content model 
> > (after the <Publisher> element). Correct?
> > 
> > If I specify minOccurs="0" and maxOccurs="1" does it mean 
> > that 0-1 new elements can be inserted at the bottom of the 
> > <sequence> content model? 
> > 
> > If I specify minOccurs="1" and maxOccurs="1" does it mean 
> > that 1 new element must be inserted at the bottom of the 
> > <sequence> content model? 
> > 
> > If I specify minOccurs="0" and maxOccurs="unbounded" does it 
> > mean that 0-unbounded new elements can be inserted at the 
> > bottom of the <sequence> content model? 
> > 
> > 
> > Lastly, suppose I change the mode to 'none':
> > 
> > <element name="Book">
> >     <complexType>
> >         <openContent mode="none">
> >             <any minOccurs="..." maxOccurs="..." />
> >         </openContent>
> >         <sequence>
> >             <element name="Title" type="string"/>
> >             <element name="Author" type="string" />
> >             <element name="Date" type="string"/>
> >             <element name="ISBN" type="string"/>
> >             <element name="Publisher" type="string"/>
> >         </sequence>
> >     </complexType>
> > </element>
> > 
> > What does mode="none" mean? Does it mean:
> > 
> >     You cannot insert new elements into the <sequence> content model.
> > 
> > How is it different from this (no openContent specified):
> > 
> > <element name="Book">
> >     <complexType>
> >         <sequence>
> >             <element name="Title" type="string"/>
> >             <element name="Author" type="string" />
> >             <element name="Date" type="string"/>
> >             <element name="ISBN" type="string"/>
> >             <element name="Publisher" type="string"/>
> >         </sequence>
> >     </complexType>
> > </element>
> > 
> > Are they the same? If they are, why have mode="none"? What's 
> > its value?
> > 
> > /Roger
> 
> 
Received on Friday, 29 May 2009 16:10:03 GMT
