- From: cedric thienot <cedric.thienot@lip6.fr>
- Date: Thu, 22 Jun 2000 16:59:17 +0200
- To: www-xml-schema-comments@w3.org
- Message-Id: <4.3.1.0.20000622153633.00b14d50@poleia.lip6.fr>
Dear all,
I didn't manage to find exactly what is the position of the
xml-schema group about the deterministic aspect of
a schema. Indeed there was a paragraph about determinism
in the XML spec (http://www.w3.org/TR/1998/REC-xml-19980210#determinism)
We would like to know your position on
1) the non deterministic models that you can find in a DTD
- ((a, b) | (a, c))
- (a*, a)
2) recursive models (a contains b and b contains a).
2) and new ones such as :
- a2 is an equivClass of a1
and the content model is :
((a1, b) | (a2, b))
Thank you for your response,
Cédric Thiénot.
The XML Part about determinism :
<< Deterministic Content Models (Non-Normative)
For compatibility, it is required that content models in element
type declarations be deterministic. SGML requires deterministic
content models (it calls them "unambiguous"); XML processors
built using SGML systems may flag non-deterministic content
models as errors.
For example, the content model ((b, c) | (b, d)) is
non-deterministic,
because given an initial b the parser cannot know which b in the
model is
being matched without looking ahead to see which element follows
the b.
In this case, the two references to b can be collapsed into a single
reference, making the model read (b, (c | d)). An initial b now
clearly matches
only a single name in the content model. The parser doesn't need to
look ahead to see what follows; either c or d would be accepted.
More formally: a finite state automaton may be constructed from
the content
model using the standard algorithms, e.g. algorithm 3.5 in section
3.9 of Aho, Sethi, and Ullman [Aho/Ullman]. In many such algorithms,
a follow set is constructed for each position in the regular
expression (i.e.,
each leaf node in the syntax tree for the regular expression); if
any position
has a follow set in which more than one following position is labeled
with the same element type name, then the content model is in
error and
may be reported as an error.
Algorithms exist which allow many but not all non-deterministic
content
models to be reduced automatically to equivalent deterministic
models;
see Brüggemann-Klein 1991 [Brüggemann-Klein].
>>
Received on Thursday, 22 June 2000 11:03:49 UTC