- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 29 May 2009 14:31:56 -0400
- To: "Costello, Roger L." <costello@mitre.org>
- Cc: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Roger Costello asks:

> Why can't the specification be written in plain English?
>
> Why is it written so painfully complex?
>
> Surely specifications can be written to be both easy to
> understand and precise. No?

That's a permathread I'd rather not reopen. I can only say, we've tried more than once. We could also try re-opening permathread #2: if the language was simpler, then your chances of doing a somewhat precise description in easily accessible English would be improved. True. The language isn't simple. I don't have time or energy to debate whether it should have been.

I think you'll find that writing precise English for these sorts of things can be a lot harder than it looks. Yes, you can write something that sort of looks like it tells you how things work, but whether it really answers questions about the edge cases unambiguously is another question (see example from MK about sum of empty sets). If you have a spare afternoon, I invite you to try taking a bit of XSD you think you understand and writing it precisely :-) (I have, more than once.)

I'll also share some history from SOAP. Everyone loved SOAP 1.1 because the specification was short, in mostly simple English and somewhat informal. Then we went off for a few years and tried to make it more precise in SOAP 1.2. SOAP 1.2 adoption was somewhat slow (in part because we made the mistake of switching namespaces, I think), but guess what happened? There were so many interop problems with SOAP 1.1 that the WS-I group popped up primarily to solve them. And how did they solve them? Almost entirely by pointing (explicitly) to more precise clauses in the SOAP 1.2 specification, saying "Do that", but just keep the old namespace. Yes, the informal spec helped a lot of people get going and kept them from being scared of SOAP. Then we needed a whole new organization plus another working group to make it precise enough to actually use.

Some languages have the nice characteristic that the rigorous specification is also an accessible introduction for novices and everyday users. Lots of languages we use don't have that characteristic. McCarthy's original LISP paper [1] is often touted as a shining example of a beautiful, terse language exposition. In 33 pages, it sets out the rigorous formalism, and to some extent gives the correct interpretation of the language in terms of itself. I would argue that it's not a successful primer for novices, and the LISP world would be a lot poorer if LISP had been documented in English (I too have trouble with some of the math, but this is a great, great paper. Anyone interested in CS and computer science should study this!).

Java is a language that a lot of people have learned to use. How many people learn it from the official Java Language Specification [2]? That specification is overall very well written, but as with XSD, there are just so many details to get right that it's tough going unless you're into languages. Picking a section at random from the JLS, take a look at [3]. Does it feel that different from the XSD Recommendation? Look at a phrase like: "The erasures (§4.6) of all constituent types of a bound must be pairwise different, or a compile-time error occurs." Know any Java programmers who'd recognize that one? I'm not justifying obscurity, but folks like Guy Steele are among the best around in doing language specs, and they wind up writing stuff like this too. We use their languages all the time.

FWIW, I actually find the section I quoted from the XSD 1.1 spec to be tighter and much more comprehensible than similar presentations in XSD 1.0. Yes, you have to manage a bit of math to figure out things like interleave, but what you have there is a 3 line specification of the validation of the wildcard, and it was sufficiently precise for me to use it to answer your question.
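
As a rough illustration of how little machinery that rule needs, clauses 2 and 3 of the validation rule quoted further down in this thread can be sketched in a few lines of Python. This is only a sketch under strong assumptions, not what any schema processor actually does: the content model is assumed to be a plain sequence of required elements (like the Book example), the wildcard is reduced to a predicate on the element name (processContents and namespaces are ignored), and the names `valid_suffix`, `valid_interleave`, `BOOK` and `any_name` are invented for the illustration.

```python
from typing import Callable, Sequence

# Assumption: a wildcard is modelled as a simple yes/no test on an element name.
Wildcard = Callable[[str], bool]


def valid_suffix(children: Sequence[str], required: Sequence[str],
                 wildcard: Wildcard) -> bool:
    """Clause 2 (suffix): S = S1 + S2, S1 matches the explicit content,
    and every element of S2 is accepted by the wildcard."""
    n = len(required)
    s1, s2 = list(children[:n]), children[n:]
    return s1 == list(required) and all(wildcard(name) for name in s2)


def valid_interleave(children: Sequence[str], required: Sequence[str],
                     wildcard: Wildcard) -> bool:
    """Clause 3 (interleave): the explicit content is sprinkled through S,
    and every skipped element is accepted by the wildcard."""
    matched = 0  # how many required elements have been consumed (length of S3)
    for name in children:
        if matched < len(required) and name == required[matched]:
            # The explicit content model takes priority (clause 3.3).
            matched += 1
        elif not wildcard(name):
            # A skipped element must be valid against the wildcard (clause 3.4).
            return False
    return matched == len(required)  # S1 must be complete (clause 3.2)


BOOK = ["Title", "Author", "Date", "ISBN", "Publisher"]


def any_name(name: str) -> bool:
    """Hypothetical wildcard that accepts every element name."""
    return True


print(valid_suffix(BOOK + ["Note", "Note"], BOOK, any_name))   # True
print(valid_suffix(["Note"] + BOOK, BOOK, any_name))           # False
print(valid_interleave(["Note", "Title", "Author", "Note", "Note",
                        "Date", "ISBN", "Publisher"], BOOK, any_name))  # True
print(valid_interleave(BOOK, BOOK, any_name))                  # True
```

Note that both functions accept zero extra elements as well as many, which is the "(0, unbounded)" behaviour described in the earlier reply below. The loop in `valid_interleave` is exact only because of the required-sequence assumption; a general content model would need a real automaton.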

Just as with Java and many other languages, I think the right way to learn XSD for most people is from primers, books like Priscilla Walmsley's, etc. The spec. has to be precise, and it has to cover lots of details that are of limited interest to typical users. (Think classloaders in Java, or PSVI for schema.)

Noah

[1] http://www-formal.stanford.edu/jmc/recursive.html
[2] http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html
[3] http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.4

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Costello, Roger L." <costello@mitre.org>
Sent by: xmlschema-dev-request@w3.org
05/29/2009 12:32 PM

To: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: [XML Schema 1.1] Many questions about openContent

Thanks Noah and Michael. That helps a lot. I appreciate your clear and easy-to-understand explanations.

Noah, you made an interesting choice of words:

> In plain English it says ...

Why can't the specification be written in plain English?

Why is it written so painfully complex?

Surely specifications can be written to be both easy to understand and precise. No?

This is complete gobbledygook:

Validation Rule: Element Sequence Locally Valid (Complex Content)

For a sequence S (possibly empty) of element information items to be locally ·valid· with respect to a Content Type CT, the appropriate case among the following must be true:

1 If CT.{open content} is ·absent·, then S is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
2 If CT.{open content}.{mode} = suffix, then S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
  2.1 S = S1 + S2
  2.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
  2.3 If S2 is not empty, let E be the first element in S2, then S1 + E does not have a ·path· in CT.{particle}
  2.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).
3 otherwise (CT.{open content}.{mode} = interleave) S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
  3.1 S is a member of S1 × S2 (where × is the interleave operator, see All-groups (§3.8.4.1.3))
  3.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
  3.3 For every element E in S2, let S3 be the longest prefix of S1 where members of S3 are before E in S, then S3 + E does not have a ·path· in CT.{particle}
  3.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).

Sorry to be such a whiner. Perhaps others don't have any difficulty reading the specification.

/Roger

From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
Sent: Friday, May 29, 2009 12:11 PM
To: Costello, Roger L.
Cc: 'xmlschema-dev@w3.org'
Subject: RE: [XML Schema 1.1] Many questions about openContent

Roger Costello writes:

> So the <any> element within an <openContent> is always
> (effectively) minOccurs="1" and maxOccurs="1". Correct?

At the risk of jumping ahead of experts who know how this really works, I think the answer you want is "no, that's not correct." So far, we've been talking about how the XML markup maps to components, and there the minOccurs/maxOccurs is indeed ignored.
I strongly suspect that what you want to ask is: what does an interleave openContent validate? For that, see clause 3 of the following [1]:

Validation Rule: Element Sequence Locally Valid (Complex Content)

For a sequence S (possibly empty) of element information items to be locally ·valid· with respect to a Content Type CT, the appropriate case among the following must be true:

1 If CT.{open content} is ·absent·, then S is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
2 If CT.{open content}.{mode} = suffix, then S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
  2.1 S = S1 + S2
  2.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
  2.3 If S2 is not empty, let E be the first element in S2, then S1 + E does not have a ·path· in CT.{particle}
  2.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).
3 otherwise (CT.{open content}.{mode} = interleave) S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
  3.1 S is a member of S1 × S2 (where × is the interleave operator, see All-groups (§3.8.4.1.3))
  3.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
  3.3 For every element E in S2, let S3 be the longest prefix of S1 where members of S3 are before E in S, then S3 + E does not have a ·path· in CT.{particle}
  3.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).

Actually, to learn to read this, the suffix case in clause 2 is probably easier.
In plain English it says, if you have suffix open content, then to be valid, your input must start with content (possibly empty) that matches the explicit content (that's S1) and the rest (S2) must be such that >>every element<< in S2 must be valid with respect to the open content wildcard. That "every element" tells you that more than one element is accepted by the wildcard.

Now, turning to the interleave case, the spirit is the same. It's saying that, sprinkled through the content being validated must be a sequence of elements (S1) that, taken together, validate against the explicit content model particle. It then must be the case that >>each of the elements you skipped<< (i.e., every element in S2) is valid with respect to the wildcard.

So, while there's nothing about occurrence counts in the component model, all the open contents act as if they were (0, unbounded), not (1,1).

FYI: early in the design work, we tried to do open content by adding explicit wildcards to the content models, and it got very messy. So, while open content looks like a wildcard and leverages a lot of the markup and some mappings from traditional <any>, it's got its own magic validation mechanism. DFA weenies would call these "spinner states", i.e., states in the automaton that spin skipping content that the DFA would otherwise not accept.

I'm fairly sure I got this right, and I hope it helps.

Noah

[1] http://www.w3.org/TR/xmlschema11-1/#cvc-complex-content

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Costello, Roger L." <costello@mitre.org>
Sent by: xmlschema-dev-request@w3.org
05/29/2009 11:50 AM

To: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: [XML Schema 1.1] Many questions about openContent

Thanks Michael.

So the <any> element within an <openContent> is always (effectively) minOccurs="1" and maxOccurs="1".
Correct? That is, it is effectively this:

    <element name="Book">
        <complexType>
            <openContent mode="interleaved">
                <any minOccurs="1" maxOccurs="1" />
            </openContent>
            <sequence>
                <element name="Title" type="string"/>
                <element name="Author" type="string" />
                <element name="Date" type="string"/>
                <element name="ISBN" type="string"/>
                <element name="Publisher" type="string"/>
            </sequence>
        </complexType>
    </element>

Correct?

Now, what does that mean? Does it mean that 1 new element *must* be inserted into the <sequence> content model? Or, does it mean that:

    Before the <Title> element there *must* be 1 new element, and
    Before the <Author> element there *must* be 1 new element, and
    Before the <Date> element there *must* be 1 new element, and
    Before the <ISBN> element there *must* be 1 new element, and
    Before the <Publisher> element there *must* be 1 new element, and
    After the <Publisher> element there *must* be 1 new element.

And what about mode="suffix":

    <element name="Book">
        <complexType>
            <openContent mode="suffix">
                <any />
            </openContent>
            <sequence>
                <element name="Title" type="string"/>
                <element name="Author" type="string" />
                <element name="Date" type="string"/>
                <element name="ISBN" type="string"/>
                <element name="Publisher" type="string"/>
            </sequence>
        </complexType>
    </element>

Does this mean that 1 new element must always be placed at the bottom of the <sequence> content model (after the <Publisher> element)?

/Roger

> -----Original Message-----
> From: Michael Kay [mailto:mike@saxonica.com]
> Sent: Friday, May 29, 2009 11:39 AM
> To: Costello, Roger L.; xmlschema-dev@w3.org
> Subject: RE: [XML Schema 1.1] Many questions about openContent
>
> On the first question, the schema component model for open content does not
> include a minOccurs and maxOccurs value (the {open content} property is a
> wildcard, not a wildcard particle). As far as I can see, it is permitted to
> specify these values in the XML representation, but they are ignored.
> Perhaps they should not be allowed: I'll raise a bug to propose this.
>
> On the second question, mode="none" is used in the same way as
> use="prohibited" on attributes, to suppress inheritance of openContent in a
> type that would otherwise acquire it automatically.
>
> Regards,
>
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay
>
> > -----Original Message-----
> > From: xmlschema-dev-request@w3.org
> > [mailto:xmlschema-dev-request@w3.org] On Behalf Of Costello, Roger L.
> > Sent: 29 May 2009 16:23
> > To: 'xmlschema-dev@w3.org'
> > Subject: [XML Schema 1.1] Many questions about openContent
> >
> > Hi Folks,
> >
> > Here is an example of declaring a <Book> element with open content:
> >
> >     <element name="Book">
> >         <complexType>
> >             <openContent mode="interleaved">
> >                 <any minOccurs="..." maxOccurs="..." />
> >             </openContent>
> >             <sequence>
> >                 <element name="Title" type="string"/>
> >                 <element name="Author" type="string" />
> >                 <element name="Date" type="string"/>
> >                 <element name="ISBN" type="string"/>
> >                 <element name="Publisher" type="string"/>
> >             </sequence>
> >         </complexType>
> >     </element>
> >
> > Notice that I left unspecified the value of minOccurs and
> > maxOccurs on the <any> element.
> >
> > If I specify minOccurs="0" and maxOccurs="1" does it mean
> > that 0-1 new elements can be inserted into the <sequence>
> > content model? Or, does it mean that:
> >
> >     Before the <Title> element there can be 0-1 new elements, and
> >     Before the <Author> element there can be 0-1 new elements, and
> >     Before the <Date> element there can be 0-1 new elements, and
> >     Before the <ISBN> element there can be 0-1 new elements, and
> >     Before the <Publisher> element there can be 0-1 new elements, and
> >     After the <Publisher> element there can be 0-1 new elements.
> >
> > If I specify minOccurs="1" and maxOccurs="1" does it mean
> > that 1 new element must be inserted into the <sequence>
> > content model?
> > Or, does it mean that:
> >
> >     Before the <Title> element there must be 1 new element, and
> >     Before the <Author> element there must be 1 new element, and
> >     Before the <Date> element there must be 1 new element, and
> >     Before the <ISBN> element there must be 1 new element, and
> >     Before the <Publisher> element there must be 1 new element, and
> >     After the <Publisher> element there must be 1 new element.
> >
> > If I specify minOccurs="0" and maxOccurs="unbounded" does it
> > mean that 0-unbounded new elements can be inserted into the
> > <sequence> content model? Or, does it mean that:
> >
> >     Before the <Title> element there can be 0-unbounded new elements, and
> >     Before the <Author> element there can be 0-unbounded new elements, and
> >     Before the <Date> element there can be 0-unbounded new elements, and
> >     Before the <ISBN> element there can be 0-unbounded new elements, and
> >     Before the <Publisher> element there can be 0-unbounded new elements, and
> >     After the <Publisher> element there can be 0-unbounded new elements.
> >
> > Next, suppose I change the mode to 'suffix':
> >
> >     <element name="Book">
> >         <complexType>
> >             <openContent mode="suffix">
> >                 <any minOccurs="..." maxOccurs="..." />
> >             </openContent>
> >             <sequence>
> >                 <element name="Title" type="string"/>
> >                 <element name="Author" type="string" />
> >                 <element name="Date" type="string"/>
> >                 <element name="ISBN" type="string"/>
> >                 <element name="Publisher" type="string"/>
> >             </sequence>
> >         </complexType>
> >     </element>
> >
> > I believe mode="suffix" means that new elements must always
> > be placed at the bottom of the <sequence> content model
> > (after the <Publisher> element). Correct?
> >
> > If I specify minOccurs="0" and maxOccurs="1" does it mean
> > that 0-1 new elements can be inserted at the bottom of the
> > <sequence> content model?
> >
> > If I specify minOccurs="1" and maxOccurs="1" does it mean
> > that 1 new element must be inserted at the bottom of the
> > <sequence> content model?
> >
> > If I specify minOccurs="0" and maxOccurs="unbounded" does it
> > mean that 0-unbounded new elements can be inserted at the
> > bottom of the <sequence> content model?
> >
> > Lastly, suppose I change the mode to 'none':
> >
> >     <element name="Book">
> >         <complexType>
> >             <openContent mode="none">
> >                 <any minOccurs="..." maxOccurs="..." />
> >             </openContent>
> >             <sequence>
> >                 <element name="Title" type="string"/>
> >                 <element name="Author" type="string" />
> >                 <element name="Date" type="string"/>
> >                 <element name="ISBN" type="string"/>
> >                 <element name="Publisher" type="string"/>
> >             </sequence>
> >         </complexType>
> >     </element>
> >
> > What does mode="none" mean? Does it mean:
> >
> >     You cannot insert new elements into the <sequence> content model.
> >
> > How is it different from this (no openContent specified):
> >
> >     <element name="Book">
> >         <complexType>
> >             <sequence>
> >                 <element name="Title" type="string"/>
> >                 <element name="Author" type="string" />
> >                 <element name="Date" type="string"/>
> >                 <element name="ISBN" type="string"/>
> >                 <element name="Publisher" type="string"/>
> >             </sequence>
> >         </complexType>
> >     </element>
> >
> > Are they the same? If they are, why have mode="none"? What's
> > its value?
> >
> > /Roger
/> > > </openContent> > > <sequence> > > <element name="Title" type="string"/> > > <element name="Author" type="string" /> > > <element name="Date" type="string"/> > > <element name="ISBN" type="string"/> > > <element name="Publisher" type="string"/> > > </sequence> > > </complexType> > > </element> > > > > What does mode="none" mean? Does it mean: > > > > You cannot insert new elements into the <sequence> > content model. > > > > How is it different from this (no openContent specified): > > > > <element name="Book"> > > <complexType> > > <sequence> > > <element name="Title" type="string"/> > > <element name="Author" type="string" /> > > <element name="Date" type="string"/> > > <element name="ISBN" type="string"/> > > <element name="Publisher" type="string"/> > > </sequence> > > </complexType> > > </element> > > > > Are they the same? If they are, why have mode="none"? What's > > its value? > > > > /Roger > > -------------------------------------- Noah Mendelsohn IBM Corporation One Rogers Street Cambridge, MA 02142 1-617-693-4036 -------------------------------------- "Costello, Roger L." <costello@mitre.org> Sent by: xmlschema-dev-request@w3.org 05/29/2009 12:32 PM To: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org> cc: (bcc: Noah Mendelsohn/Cambridge/IBM) Subject: RE: [XML Schema 1.1] Many questions about openContent Thanks Noah and Michael. That helps a lot. I appreciate your clear and easy-to-understand explanations. Noah, you made an interesting choice of words: > In plain English it says ... Why can't the specification be written in plain English? Why is it written so painfully complex? Surely specifications can be written to be both easy to understand and precise. No? 
This is complete gobbledygook:

    Validation Rule: Element Sequence Locally Valid (Complex Content)

    For a sequence S (possibly empty) of element information items to be locally ·valid· with respect to a Content Type CT, the appropriate case among the following must be true:

    1 If CT.{open content} is ·absent·, then S is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
    2 If CT.{open content}.{mode} = suffix, then S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
      2.1 S = S1 + S2
      2.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
      2.3 If S2 is not empty, let E be the first element in S2, then S1 + E does not have a ·path· in CT.{particle}
      2.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).
    3 otherwise (CT.{open content}.{mode} = interleave) S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
      3.1 S is a member of S1 × S2 (where × is the interleave operator, see All-groups (§3.8.4.1.3))
      3.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
      3.3 For every element E in S2, let S3 be the longest prefix of S1 where members of S3 are before E in S, then S3 + E does not have a ·path· in CT.{particle}
      3.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).

Sorry to be such a whiner. Perhaps others don't have any difficulty reading the specification.

/Roger

From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
Sent: Friday, May 29, 2009 12:11 PM
To: Costello, Roger L.
Cc: 'xmlschema-dev@w3.org'
Subject: RE: [XML Schema 1.1] Many questions about openContent

Roger Costello writes:

> So the <any> element within an <openContent> is always
> (effectively) minOccurs="1" and maxOccurs="1". Correct?

At the risk of jumping ahead of experts who know how this really works, I think the answer you want is "no, that's not correct." So far, we've been talking about how the XML markup maps to components, and there the minOccurs/maxOccurs is indeed ignored. I strongly suspect that what you want to ask is: what does an interleave openContent validate? For that, see clause 3 of the following [1]:

    Validation Rule: Element Sequence Locally Valid (Complex Content)

    For a sequence S (possibly empty) of element information items to be locally ·valid· with respect to a Content Type CT, the appropriate case among the following must be true:

    1 If CT.{open content} is ·absent·, then S is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
    2 If CT.{open content}.{mode} = suffix, then S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
      2.1 S = S1 + S2
      2.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
      2.3 If S2 is not empty, let E be the first element in S2, then S1 + E does not have a ·path· in CT.{particle}
      2.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).
    3 otherwise (CT.{open content}.{mode} = interleave) S can be represented as two subsequences S1 and S2 (either can be empty) such that all of the following are true:
      3.1 S is a member of S1 × S2 (where × is the interleave operator, see All-groups (§3.8.4.1.3))
      3.2 S1 is ·valid· with respect to CT.{particle}, as defined in Element Sequence Locally Valid (Particle) (§3.9.4.2).
      3.3 For every element E in S2, let S3 be the longest prefix of S1 where members of S3 are before E in S, then S3 + E does not have a ·path· in CT.{particle}
      3.4 Every element in S2 is ·valid· with respect to the wildcard CT.{open content}.{wildcard}, as defined in Item Valid (Wildcard) (§3.10.4.1).

Actually, to learn to read this, the suffix case in clause 2 is probably easier. In plain English it says: if you have suffix open content, then to be valid, your input must start with content (possibly empty) that matches the explicit content (that's S1), and the rest (S2) must be such that >>every element<< in S2 is valid with respect to the open content wildcard. That "every element" tells you that more than one element is accepted by the wildcard.

Now, turning to the interleave case, the spirit is the same. It's saying that, sprinkled through the content being validated, there must be a sequence of elements (S1) that, taken together, validate against the explicit content model particle. It then must be the case that >>each of the elements you skipped<< (i.e. every element in S2) is valid with respect to the wildcard.

So, while there's nothing about occurrence counts in the component model, all the open contents act as if they were (0, unbounded), not (1,1).

FYI: early in the design work, we tried to do open content by adding explicit wildcards to the content models, and it got very messy. So, while open content looks like a wildcard and leverages a lot of the markup and some mappings from traditional <any>, it's got its own magic validation mechanism. DFA weenies would call these "spinner states", i.e. states in the automaton that spin skipping content that the DFA would otherwise not accept.

I'm fairly sure I got this right, and I hope it helps.
Noah

[1] http://www.w3.org/TR/xmlschema11-1/#cvc-complex-content
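Noah's plain-English reading of clause 2 (suffix) and clause 3 (interleave) can be sketched in a few lines of Python. This is a simplified illustration, not code from the spec or from any schema processor. It assumes the explicit content model is a plain sequence of required elements, each occurring exactly once, as in the Book example, and that the open content wildcard accepts every element name (namespace constraints and processContents are ignored). The names `BOOK`, `any_element`, `valid_suffix` and `valid_interleave` are hypothetical.

```python
from typing import Callable, Sequence

# Explicit content model of the Book example: a plain <sequence> of required
# elements, each occurring exactly once (an assumption of this sketch).
BOOK = ["Title", "Author", "Date", "ISBN", "Publisher"]


def any_element(name: str) -> bool:
    """Stand-in for the open content wildcard: accepts every element name.

    Assumption: namespace constraints and processContents are ignored.
    """
    return True


def valid_suffix(children: Sequence[str], model: Sequence[str],
                 wildcard: Callable[[str], bool] = any_element) -> bool:
    """Clause 2: S = S1 + S2, S1 matches the model, every element of S2
    matches the wildcard."""
    s1, s2 = children[:len(model)], children[len(model):]
    # S2 may hold zero, one, or many elements: no occurrence count applies.
    return list(s1) == list(model) and all(wildcard(e) for e in s2)


def valid_interleave(children: Sequence[str], model: Sequence[str],
                     wildcard: Callable[[str], bool] = any_element) -> bool:
    """Clause 3: the model's elements (S1) appear in order, sprinkled through
    the children; every skipped element (S2) matches the wildcard."""
    matched = 0  # how many elements of the explicit model are consumed so far
    for name in children:
        if matched < len(model) and name == model[matched]:
            # Clause 3.3: an element the explicit model can accept at this
            # point belongs to S1; it may not be treated as open content.
            matched += 1
        elif not wildcard(name):
            return False
    # S1 must be valid against the whole model, so every element is required.
    return matched == len(model)


if __name__ == "__main__":
    print(valid_suffix(BOOK + ["Review", "Price"], BOOK))
    print(valid_suffix(["Note"] + BOOK, BOOK))
    print(valid_interleave(["Note"] + BOOK, BOOK))
```

Under these assumptions, the five Book elements followed by any number of extra elements pass in both modes, while an extra element ahead of <Title> passes only in interleave mode. Neither function looks at minOccurs or maxOccurs, which matches the point that open content behaves as (0, unbounded).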
Received on Friday, 29 May 2009 18:32:51 UTC