- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 18 Oct 2012 10:03:03 -0600
- To: "Pete Cordell" <petexmldev@codalogic.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "Michael Kay" <mike@saxonica.com>, <xmlschema-dev@w3.org>
On Oct 18, 2012, at 9:05 AM, Pete Cordell wrote: > Original Message From: "C. M. Sperberg-McQueen" > >> - While enumerating 24! or 120! disjuncts in a large regular expression >> or content model whose top-level operator is choice may >> exceed current engineering capacity, it is not beyond the wit of >> human beings or the memory capacity of current machines to >> construct a finite state automaton for the same language; if there >> are 23 child elements, of which 3 have the range {1, unbounded}, >> 4 more have the range {1,1} and 16 have the range {0,1}, the >> FSA will need 2 ** 23 states. > > Do you not also need your FSM to cater for sequences like: > > ababababa ... abac > > etc? Or does XSD 1.1 require all the 'a's to be together and all the 'b's to be together and so on. > > The former case looks to require an infinite number of states to me, and the latter 23! (25x10^21) states.\ I think you're thinking about the number of permutations in the order of children, and not about the number of states in a finite-state automaton. > >> (This is, of course, the same >> state machine an XSD 1.1 machine will need to manage the >> all-group one might write for this content model.) > > A better engineering option is to just use 23 counters (possibly even as little as 23 bits in fact) and implement it in a similar fashion to Ken's suggestion to do a xs:choice and then an xs:assert checking the occurrence counts were correct. Exactly: the state space of the FSA I was referring to requires 23 bits (plus one bit for an error state, if we want to make it a complete automaton), with the rule that we start with 23 zeroes, and increment bit i when we encounter an instance of element i. The legal transitions are: - for any element i, change bit i from zero to one - for a repeatable element i, change bit i from one to one - for a non-repeatable element i, if bit i is already one and another instance of the element is encountered, signal an error (or: set the error bit and continue). The accept states are those bit patterns which have ones for the seven required elements. -- **************************************************************** * C. M. Sperberg-McQueen, Black Mesa Technologies LLC * http://www.blackmesatech.com * http://cmsmcq.com/mib * http://balisage.net ****************************************************************
Received on Thursday, 18 October 2012 16:03:53 UTC