W3C home > Mailing lists > Public > xmlschema-dev@w3.org > October 2012

Re: Order indicators in the .xsd file

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 18 Oct 2012 10:03:03 -0600
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "Michael Kay" <mike@saxonica.com>, <xmlschema-dev@w3.org>
Message-Id: <FF560EA1-96CB-440D-9A56-335AD1BBECE7@blackmesatech.com>
To: "Pete Cordell" <petexmldev@codalogic.com>

On Oct 18, 2012, at 9:05 AM, Pete Cordell wrote:

> Original Message From: "C. M. Sperberg-McQueen"
> 
>> - While enumerating 24! or 120! disjuncts in a large regular expression
>> or content model whose top-level operator is choice may
>> exceed current engineering capacity, it is not beyond the wit of
>> human beings or the memory capacity of current machines to
>> construct a finite state automaton for the same language; if there
>> are 23 child elements, of which 3 have the range {1, unbounded},
>> 4 more have the range {1,1} and 16 have the range {0,1}, the
>> FSA will need 2 ** 23 states.
> 
> Do you not also need your FSM to cater for sequences like:
> 
>   ababababa ... abac
> 
> etc?  Or does XSD 1.1 require all the 'a's to be together and all the 'b's to be together and so on.
> 
> The former case looks to require an infinite number of states to me, and the latter 23! (25x10^21) states.\

I think you're thinking about the number of permutations in the order
of children, and not about the number of states in a finite-state automaton.

> 
>> (This is, of course, the same
>> state machine an XSD 1.1 machine will need to manage the
>> all-group one might write for this content model.)
> 
> A better engineering option is to just use 23 counters (possibly even as little as 23 bits in fact) and implement it in a similar fashion to Ken's suggestion to do a xs:choice and then an xs:assert checking the occurrence counts were correct.

Exactly:  the state space of the FSA I was referring to requires
23 bits (plus one bit for an error state, if we want to make it a 
complete automaton), with the rule that we start with 23 zeroes,
and increment bit i when we encounter an instance of element i.
The legal transitions are:

- for any element i, change bit i from zero to one
- for  a repeatable element i, change bit i from one to one
- for a non-repeatable element i, if bit i is already one and
another instance of the element is encountered, signal an
error (or: set the error bit and continue).

The accept states are those bit patterns which have ones for
the seven required elements.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Thursday, 18 October 2012 16:03:53 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 23:16:02 UTC