- From: Pete Cordell <petexmldev@codalogic.com>
- Date: Thu, 18 Oct 2012 17:45:14 +0100
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: "Michael Kay" <mike@saxonica.com>, <xmlschema-dev@w3.org>
Original Message From: "C. M. Sperberg-McQueen" > On Oct 18, 2012, at 9:05 AM, Pete Cordell wrote: >> Original Message From: "C. M. Sperberg-McQueen" >> >>> - While enumerating 24! or 120! disjuncts in a large regular expression >>> or content model whose top-level operator is choice may >>> exceed current engineering capacity, it is not beyond the wit of >>> human beings or the memory capacity of current machines to >>> construct a finite state automaton for the same language; if there >>> are 23 child elements, of which 3 have the range {1, unbounded}, >>> 4 more have the range {1,1} and 16 have the range {0,1}, the >>> FSA will need 2 ** 23 states. >> >> Do you not also need your FSM to cater for sequences like: >> >> ababababa ... abac >> >> etc? Or does XSD 1.1 require all the 'a's to be together and all the >> 'b's to be together and so on. >> >> The former case looks to require an infinite number of states to me, and >> the latter 23! (25x10^21) states.\ > > I think you're thinking about the number of permutations in the order > of children, and not about the number of states in a finite-state > automaton. I thought we were discussing the practicality of representing the problem in question as a collection of XSD 1.0 compound particles, ultimately in a .xsd file that we could present to an XML schema validator. >>> (This is, of course, the same >>> state machine an XSD 1.1 machine will need to manage the >>> all-group one might write for this content model.) >> >> A better engineering option is to just use 23 counters (possibly even as >> little as 23 bits >> in fact) and implement it in a similar fashion to Ken's suggestion to do >> a xs:choice >> and then an xs:assert checking the occurrence counts were correct. > > Exactly: the state space of the FSA I was referring to requires > 23 bits (plus one bit for an error state, if we want to make it a > complete automaton), with the rule that we start with 23 zeroes, > and increment bit i when we encounter an instance of element i. > The legal transitions are: > > - for any element i, change bit i from zero to one > - for a repeatable element i, change bit i from one to one > - for a non-repeatable element i, if bit i is already one and > another instance of the element is encountered, signal an > error (or: set the error bit and continue). > > The accept states are those bit patterns which have ones for > the seven required elements. While that's a sensible implementation, it doesn't look like an FSA to me. But maybe that's the joy of FSAs - there are so many to choose from! Pete Cordell Codalogic Ltd Twitter: http://twitter.com/petecordell Interface XML to C++ the easy way using C++ XML data binding to convert XSD schemas to C++ classes. Visit http://codalogic.com/lmx/ or http://www.xml2cpp.com for more info
Received on Thursday, 18 October 2012 16:45:42 UTC