W3C home > Mailing lists > Public > www-xml-schema-comments@w3.org > April to June 2000

What about no deterministic aspect?

From: cedric thienot <cedric.thienot@lip6.fr>
Date: Thu, 22 Jun 2000 16:59:17 +0200
Message-Id: <4.3.1.0.20000622153633.00b14d50@poleia.lip6.fr>
To: www-xml-schema-comments@w3.org
Dear all,

I didn't manage to find exactly what is the position of the
xml-schema group about the deterministic aspect of
a schema. Indeed there was a paragraph about determinism
in the XML spec (http://www.w3.org/TR/1998/REC-xml-19980210#determinism)


We would like to know your position on
         1) the non deterministic models that you can find in a DTD
                 - ((a, b) | (a, c))
                 - (a*, a)

         2) recursive models (a contains b and b contains a).

         2) and new ones such as :
                 - a2 is an equivClass of a1
                   and the content model is :
                         ((a1, b) | (a2, b))


Thank you for your response,

Cédric Thiénot.


The XML Part about determinism :

<<      Deterministic Content Models (Non-Normative)
         For compatibility, it is required that content models in element
         type declarations be deterministic. SGML requires deterministic
         content models (it calls them "unambiguous"); XML processors
         built using SGML systems may flag non-deterministic content
         models as errors.
         For example, the content model ((b, c) | (b, d)) is 
non-deterministic,
         because given an initial b the parser cannot know which b in the 
model is
         being matched without looking ahead to see which element follows 
the b.
         In this case, the two references to b can be collapsed into a single
         reference, making the model read (b, (c | d)). An initial b now 
clearly matches
         only a single name in the content model. The parser doesn't need to
         look ahead to see what follows; either c or d would be accepted.
         More formally: a finite state automaton may be constructed from 
the content
         model using the standard algorithms, e.g. algorithm 3.5 in section
         3.9 of Aho, Sethi, and Ullman [Aho/Ullman]. In many such algorithms,
         a follow set is constructed for each position in the regular 
expression (i.e.,
         each leaf node in the syntax tree for the regular expression); if 
any position
         has a follow set in which more than one following position is labeled
         with the same element type name, then the content model is in 
error and
         may be reported as an error.
         Algorithms exist which allow many but not all non-deterministic 
content
         models to be reduced automatically to equivalent deterministic 
models;
         see Brüggemann-Klein 1991 [Brüggemann-Klein].
 >>

                 
Received on Thursday, 22 June 2000 11:03:49 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 23:08:47 UTC