- From: cedric thienot <cedric.thienot@lip6.fr>
- Date: Thu, 22 Jun 2000 16:59:17 +0200
- To: www-xml-schema-comments@w3.org
- Message-Id: <4.3.1.0.20000622153633.00b14d50@poleia.lip6.fr>
Dear all, I didn't manage to find exactly what is the position of the xml-schema group about the deterministic aspect of a schema. Indeed there was a paragraph about determinism in the XML spec (http://www.w3.org/TR/1998/REC-xml-19980210#determinism) We would like to know your position on 1) the non deterministic models that you can find in a DTD - ((a, b) | (a, c)) - (a*, a) 2) recursive models (a contains b and b contains a). 2) and new ones such as : - a2 is an equivClass of a1 and the content model is : ((a1, b) | (a2, b)) Thank you for your response, Cédric Thiénot. The XML Part about determinism : << Deterministic Content Models (Non-Normative) For compatibility, it is required that content models in element type declarations be deterministic. SGML requires deterministic content models (it calls them "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors. For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the parser cannot know which b in the model is being matched without looking ahead to see which element follows the b. In this case, the two references to b can be collapsed into a single reference, making the model read (b, (c | d)). An initial b now clearly matches only a single name in the content model. The parser doesn't need to look ahead to see what follows; either c or d would be accepted. More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of Aho, Sethi, and Ullman [Aho/Ullman]. In many such algorithms, a follow set is constructed for each position in the regular expression (i.e., each leaf node in the syntax tree for the regular expression); if any position has a follow set in which more than one following position is labeled with the same element type name, then the content model is in error and may be reported as an error. Algorithms exist which allow many but not all non-deterministic content models to be reduced automatically to equivalent deterministic models; see Brüggemann-Klein 1991 [Brüggemann-Klein]. >>
Received on Thursday, 22 June 2000 11:03:49 UTC