W3C home > Mailing lists > Public > xmlschema-dev@w3.org > October 2012

Re: Order indicators in the .xsd file

From: Pete Cordell <petexmldev@codalogic.com>
Date: Thu, 18 Oct 2012 17:45:14 +0100
Message-ID: <A46B6D031CF344A4AC448A4AB4789021@codalogic>
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: "Michael Kay" <mike@saxonica.com>, <xmlschema-dev@w3.org>
Original Message From: "C. M. Sperberg-McQueen"
> On Oct 18, 2012, at 9:05 AM, Pete Cordell wrote:
>> Original Message From: "C. M. Sperberg-McQueen"
>>
>>> - While enumerating 24! or 120! disjuncts in a large regular expression
>>> or content model whose top-level operator is choice may
>>> exceed current engineering capacity, it is not beyond the wit of
>>> human beings or the memory capacity of current machines to
>>> construct a finite state automaton for the same language; if there
>>> are 23 child elements, of which 3 have the range {1, unbounded},
>>> 4 more have the range {1,1} and 16 have the range {0,1}, the
>>> FSA will need 2 ** 23 states.
>>
>> Do you not also need your FSM to cater for sequences like:
>>
>>   ababababa ... abac
>>
>> etc?  Or does XSD 1.1 require all the 'a's to be together and all the 
>> 'b's to be together and so on.
>>
>> The former case looks to require an infinite number of states to me, and 
>> the latter 23! (25x10^21) states.\
>
> I think you're thinking about the number of permutations in the order
> of children, and not about the number of states in a finite-state 
> automaton.

I thought we were discussing the practicality of representing the problem in 
question as a collection of XSD 1.0 compound particles, ultimately in a .xsd 
file that we could present to an XML schema validator.

>>> (This is, of course, the same
>>> state machine an XSD 1.1 machine will need to manage the
>>> all-group one might write for this content model.)
>>
>> A better engineering option is to just use 23 counters (possibly even as 
>> little as 23 bits
>> in fact) and implement it in a similar fashion to Ken's suggestion to do 
>> a xs:choice
>> and then an xs:assert checking the occurrence counts were correct.
>
> Exactly:  the state space of the FSA I was referring to requires
> 23 bits (plus one bit for an error state, if we want to make it a
> complete automaton), with the rule that we start with 23 zeroes,
> and increment bit i when we encounter an instance of element i.
> The legal transitions are:
>
> - for any element i, change bit i from zero to one
> - for  a repeatable element i, change bit i from one to one
> - for a non-repeatable element i, if bit i is already one and
> another instance of the element is encountered, signal an
> error (or: set the error bit and continue).
>
> The accept states are those bit patterns which have ones for
> the seven required elements.

While that's a sensible implementation, it doesn't look like an FSA to me. 
But maybe that's the joy of FSAs - there are so many to choose from!

Pete Cordell
Codalogic Ltd
Twitter: http://twitter.com/petecordell
Interface XML to C++ the easy way using C++ XML
data binding to convert XSD schemas to C++ classes.
Visit http://codalogic.com/lmx/ or http://www.xml2cpp.com
for more info
Received on Thursday, 18 October 2012 16:45:42 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 23:16:02 UTC