RE: Ignore Order while validating XSD from Ralf Lammel on 2006-02-20 (xmlschema-dev@w3.org from February 2006)

From: Ralf Lammel <Ralf.Lammel@microsoft.com>
Date: Sun, 19 Feb 2006 16:56:45 -0800
To: "Mukul Gandhi" <gandhi.mukul@gmail.com>, "Ramkumar Menon" <ramkumar.menon@gmail.com>
Cc: <xmlschema-dev@w3.org>
Message-ID: <1152E22EE8996742A7E36BBBA7768FEE08B12883@RED-MSG-50.redmond.corp.microsoft.com>

Mukul wrote:

> If you have 10 or more elements,
> then definitely the approach I suggested would be cumbersome.

SET FUN=ON

"cumbersome" is a proper understatement. The number of permutations over a finite set S is |S|!. We would get that many sequence compositors if we used the naïve, UPA-violating approach, plus 1 choice compositor. Just to recall figures:

10! + 1 = 3,628,801

I doubt that there is that much business-critical XSD in this world, as of writing. The biggest real-world schema that I have seen has less than 20,000 compositors -- a factor 181 difference.

But then again, naïve enumeration of permutations is not valid XSD anyhow. One would hope that left factorization recovers scalability a bit, but it is the other way around. Let s_i denote the number of sequence compositors for the left-factorization approach; let c_i denote the number of choice compositors for the left-factorization approach, where i denotes the number of element particles to be varied. Here are some observations to be made by looking at the examples we have seen in earlier emails:

- s_1 = 1, c_1 = 0
- s_2 = 2, c_2 = 1
- s_3 = 6, c_3 = 4

Here are the progressions:

s_1 = 1
s_{i+1} = (i+1) * s_i

c_1 = 0
c_{i+1} = (i+1) * c_i + 1

Hence, just the s_i part is the factorial alone.
And c_i's growth isn't much less that s_i's.

I think it is an advanced homework question to figure out an XSD representation scheme for permutation phrases (without using xs:all) that uses less "compositors in code" than the plain left-factorization approach. One could try to further factor out commonalities into group definitions, or what? Whatever you find, the representation scheme will be *cumbersome* for 4, masochistic for 5, academic from there on, and certainly require a good hard disk for 10. Someone might even propose to write a generator that implements the scheme. Then, two permutation phrases over 7 particles would generate more or less as much code as the size of the biggest real-world schema that I have seen so far.

SET FUN=OFF

We should also note that manual elimination of permutation phrases gets slightly more complicated once minOccurs=0 constraints were involved. My very personal impression is that an XSD 1.0 follow-up should either get rid of xs:all or define it in some generality. Remember xs:all isn't used a lot in real-world schemas. In the schema study of [1] we found that all the 63 schemas used 40 xs:all's (0.07% of all compositors that we encountered).

Ralf

[1] Analysis of XML schema usage, http://homepages.cwi.nl/~ralf/xml05/

> -----Original Message-----
> From: xmlschema-dev-request@w3.org [mailto:xmlschema-dev-request@w3.org]
> On Behalf Of Mukul Gandhi
> Sent: Sunday, February 19, 2006 9:35 AM
> To: Ramkumar Menon
> Cc: xmlschema-dev@w3.org
> Subject: Re: Ignore Order while validating XSD
> 
> 
> I am not sure how difficult it is to support this scenario within XSD.
> The XSD validator vendors are the right people to comment on this.
> 
> The XML Schema spec says "The choice group element allows only one of
> its children to appear in an instance". So I think the XSD validator
> should support the example I posted. If you have 10 or more elements,
> then definitely the approach I suggested would be cumbersome.
> 
> Presently the XML Schema spec says following about the all group "All
> the elements in the group may appear once or not at all, and they may
> appear in any order". I think its a good idea to remove the
> restriction of "once or not at all", and allow any number of
> instances.
> 
> Regards,
> Mukul
> 
> On 2/19/06, Ramkumar Menon <ramkumar.menon@gmail.com> wrote:
> > Thanks, Mukul.
> > Is there any special reason why such a simple scenario is so complex to
> > support witihn XSD ? The suggestion below definitely seems good, but a
> > schema with 10 or more elements [which is the case with my scenario]
> looks
> > very complex - visually at least - and loses readability.
> > rgds,
> > Menon

Received on Monday, 20 February 2006 00:57:35 UTC