W3C home > Mailing lists > Public > xmlschema-dev@w3.org > October 2012

Re: Order indicators in the .xsd file

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 18 Oct 2012 11:17:48 -0600
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "Michael Kay" <mike@saxonica.com>, <xmlschema-dev@w3.org>
Message-Id: <48C59A5A-162C-42B4-BE64-86E8A8DBB5DE@blackmesatech.com>
To: "Pete Cordell" <petexmldev@codalogic.com>

On Oct 18, 2012, at 10:45 AM, Pete Cordell wrote:

> Original Message From: "C. M. Sperberg-McQueen"
>> On Oct 18, 2012, at 9:05 AM, Pete Cordell wrote:
>>> Original Message From: "C. M. Sperberg-McQueen"
>>>> - While enumerating 24! or 120! disjuncts in a large regular expression
>>>> or content model whose top-level operator is choice may
>>>> exceed current engineering capacity, it is not beyond the wit of
>>>> human beings or the memory capacity of current machines to
>>>> construct a finite state automaton for the same language; if there
>>>> are 23 child elements, of which 3 have the range {1, unbounded},
>>>> 4 more have the range {1,1} and 16 have the range {0,1}, the
>>>> FSA will need 2 ** 23 states.
>>> Do you not also need your FSM to cater for sequences like:
>>>  ababababa ... abac
>>> etc?  Or does XSD 1.1 require all the 'a's to be together and all the 'b's to be together and so on.
>>> The former case looks to require an infinite number of states to me, and the latter 23! (25x10^21) states.\
>> I think you're thinking about the number of permutations in the order
>> of children, and not about the number of states in a finite-state automaton.
> I thought we were discussing the practicality of representing the problem in question as a collection of XSD 1.0 compound particles, ultimately in a .xsd file that we could present to an XML schema validator.

Sorry if I didn't signal my shift of topic more clearly.

Since content models define regular languages, any XSD content 
model can be represented, with more or less convenience, as a finite 
state automaton; for situations like the one raised by Fred Li, a 
description in FSA terms is much more compact than a 
description in regular-expression terms.

For better or worse, XSD does not require schema components to be
defined in schema-document form; they can be defined in any 
notation the schema processor chooses to accept.  The only proviso
is that at an appropriate level of abstraction they provide the information
described in the specification (which in this case amounts to:
information on how children are bound to element declarations,
and information on whether the parent is locally valid with respect
to its sequence of children).

So -- given an implementation that supports user-supplied
complex types represented as executable code, which may or
may not exist in practice today, but can exist in a conforming
implementation as soon as any implementor chooses to offer it -- 
it would be possible to implement the complex type we are
discussing as an FSA instead of by writing an exceedingly long
and uninformative content model for it using xsd:choice and 

>>>> ...
>>> A better engineering option is to just use 23 counters (possibly even as little as 23 bits
>>> in fact) and implement it in a similar fashion to Ken's suggestion to do a xs:choice
>>> and then an xs:assert checking the occurrence counts were correct.
>> Exactly:  the state space of the FSA I was referring to requires
>> 23 bits (plus one bit for an error state, if we want to make it a
>> complete automaton), ...
>> The accept states are those bit patterns which have ones for
>> the seven required elements.
> While that's a sensible implementation, it doesn't look like an FSA to me. But maybe that's the joy of FSAs - there are so many to choose from!

If you don't see that the counter implementation you propose can
be regarded as an implementation of the FSA I described, I don't 
think I know how to help you.

* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
Received on Thursday, 18 October 2012 17:18:15 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 23:16:02 UTC