RE: infinite loop from C. M. Sperberg-McQueen on 2003-02-07 (xmlschema-dev@w3.org from February 2003)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Fri, 07 Feb 2003 14:42:16 -0700
To: <allan.jones@hyfinity.com>
Cc: "xml schema" <xmlschema-dev@w3.org>
Message-Id: <5.1.0.14.1.20030207141250.03640778@localhost>
[Pardon the very late reply; I hope this may still be of interest
to someone.]

At 2003-01-10 09:13, Allan Jones wrote:

 > Hi Jeni,

 > thanks for your suggestion, but the problem with that technique is
 > that you can declare sub-elements with the same name, which would
 > effectively mean that you'd declare something like:

 > <test>
 > 	<element>
 > 		<element>
 > 			<terminal_element/>
 > 		</element>
 > 	</element>
 > </test>

 > as an infinite loop when the schema would merely specify it as a
 > nested element.

Well, no.  Jeni suggested noting element declarations, not element
generic identifiers; if the element instances denoted by /test/element
and test/element/element are instances of the same element
declaration, then the schema does indeed show a cycle in the schema;
if the instance is valid, then the content model of 'element' can
clearly be either an 'element' element or a 'terminal_element'
element, so the cycle is not pernicious.

 > Using the complex type definitions (as they're usually the reason
 > that an infinite loop occurs) leads to similar problems, because
 > it's possible that the same complex type could be used in different
 > parts of the document so the fact that you've 'hit' it twice doesn't
 > mean that it's an infinite loop.

If an instance of a type can contain another instance of the same
type, then arbitrarily deep nesting is possible.  If the instance MUST
contain a second instance of the same type, then infinitely deep
nesting is required and the content model is unsatisfiable.  (Content
models of this kind have been called Elephant Child content models for
this reason; the Elephant Child, you will recall, suffered from
insatiability.)

I think by 'infinite loop' you mean not just a set of declarations
which allow arbitrarily deep nesting, but a set which are insatiable
because they require infinitely deep nesting.

As Jeni pointed out, to find Elephant Children you need to keep track
of whether each descendant is required or not.  This is not actually
very hard for sequences (though I agree it's fiddly); it's somewhat
harder for choices.  I'm not quite sure how best to go about detecting
vicious cycles in schema documents using XSLT, but your original
question just said 'programmatically', so I'll sketch a strategy one
could use.

The key characteristic of an Elephant Child type T is that it is
defined in such a way that the content of every instance of T must
contain another instance of T.  I'd find such types by attempting to
prove that a type is not an Elephant Child.  If I can't prove it, then
it is an Elephant Child.  The proof strategy involves the predicate
ec_free(X), which applies to types, sequences, choices, all-groups,
and element particles.

Applied to types:

ec_free(T) is:
     - if T is a simple type, true.
     - if T is a complex type, true if the topmost group of T is
       ec_free.

Applied to groups, ec_free(G):

     - if G has minOccurs = 0, then true
     - else if G is a sequence (a, b, ... z), then true iff
       a, b, ... z are all ec_free.
     - else if G is a choice (a | b | ... | z), then true iff
       at least one of a, b, ... z is ec_free.
     - else if G is an all-group (a & b & ... & z) then true iff
       a, b, ... z are all ec_free.

Applied to element particles, ec_free(E):
     - if E has minOccurs = 0, then true
     - else if E has complex type T, and we are in the process of
       trying to prove whether T is ec_free, then false.
     - else if E has complex type T, and we are not already
       trying to prove whether T is ec_free, then true iff
       ec_free(T)

Does this help?

-C. M. Sperberg-McQueen
Received on Friday, 7 February 2003 17:58:48 UTC