Re: should all XML parsers reject non-deterministic content models? from Daniel Veillard on 2001-01-14 (xml-editor@w3.org from January to March 2001)

From: Daniel Veillard <Daniel.Veillard@imag.fr>
Date: Sun, 14 Jan 2001 10:04:58 +0100
To: "TAKAHASHI Hideo(BSD-13G)" <hideo-t@bisd.hitachi.co.jp>, xml-editor@w3.org
Cc: xml-dev@lists.xml.org
Message-ID: <20010114100458.B26487@imag.fr>

On Sun, Jan 14, 2001 at 04:42:55PM +0900, TAKAHASHI Hideo(BSD-13G) wrote:
> Hello.
> 
> I understand that the XML 1.0 spec prohibits non-deterministic (or,
> ambiguous) content models (for compatibility, to be precise).

  Note also that this is stated in a non-normative appendix.

> Is all XML 1.0-compliant XML processing software required to reject
> DTDs with such content models?

  Since it is stated only non-normatively, I don't think this is the
case in theory.
  In practice this can be a problem. I recently ran into a DTD
developed at the IETF which was clearly non-deterministic. This
also means that it introduces new classes of XML parsers among the
validating ones:
   - those that detect and report non-deterministic content models
   - those that do, or do not, validate correctly against
     non-deterministic content models
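To illustrate the first class of parser, here is a minimal sketch in Python of a determinism check. It assumes a hypothetical AST for content models (`Elem`, `Seq`, `Choice`, `Rep` and the helpers `analyze`/`conflicts` are illustrative names, not any real parser's API) and ignores mixed content, EMPTY and ANY. It computes the Glushkov (position) first and follow sets and reports every state in which two occurrences of the same element name could match the next child, which is the situation the XML 1.0 appendix calls non-deterministic.

```python
from dataclasses import dataclass

# Hypothetical content-model AST (illustrative names, not a real parser API).
@dataclass(frozen=True)
class Elem:
    name: str

@dataclass(frozen=True)
class Seq:
    items: tuple

@dataclass(frozen=True)
class Choice:
    items: tuple

@dataclass(frozen=True)
class Rep:
    item: object
    op: str  # "*", "+" or "?"

def analyze(model):
    """Glushkov (position) construction: number every element occurrence
    and compute the nullable flag and the first, last and follow sets."""
    symbols, follow = [], {}

    def walk(node):
        if isinstance(node, Elem):
            pos = len(symbols)
            symbols.append(node.name)
            follow[pos] = set()
            return False, {pos}, {pos}
        if isinstance(node, Seq):
            nullable, first, last = True, set(), set()
            for item in node.items:
                item_null, item_first, item_last = walk(item)
                for p in last:               # positions that can end the prefix
                    follow[p] |= item_first  # may be followed by this item's first
                if nullable:
                    first |= item_first
                last = (last | item_last) if item_null else set(item_last)
                nullable = nullable and item_null
            return nullable, first, last
        if isinstance(node, Choice):
            parts = [walk(item) for item in node.items]
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(la for _, _, la in parts)))
        if isinstance(node, Rep):
            item_null, item_first, item_last = walk(node.item)
            if node.op in ("*", "+"):        # loop back from last to first
                for p in item_last:
                    follow[p] |= item_first
            return item_null or node.op != "+", item_first, item_last
        raise TypeError(f"unknown content-model node: {node!r}")

    nullable, first, last = walk(model)
    return symbols, nullable, first, last, follow

def conflicts(model):
    """List (state, element) pairs where one element name matches two
    different positions, i.e. where one-token lookahead cannot decide."""
    symbols, _, first, _, follow = analyze(model)
    states = [("start", first)]
    states += [(f"after {symbols[p]} (position {p})", follow[p]) for p in sorted(follow)]
    found = []
    for label, candidates in states:
        names = [symbols[p] for p in candidates]
        for name in sorted(set(names)):
            if names.count(name) > 1:
                found.append((label, name))
    return found
```

With the example from Appendix E of XML 1.0, `((b, c) | (b, d))`, the check reports `[("start", "b")]`; the rewritten `(b, (c | d))` reports nothing.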

> Ambiguous content models don't cause any problems when you construct a
> DFA via an NFA.  I have heard that there is a way to construct DFAs
> directly from regexps without making an NFA, but that method can't
> handle non-deterministic regular expressions.  If you choose that method
> to construct your DFA, you will surely benefit from the rule in XML 1.0.
> But if you don't, detecting non-deterministic content models
> becomes an extra job.
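The NFA route described in the quoted paragraph can be sketched by simulating the position automaton and tracking the set of states it could be in after each child element, which is the subset construction performed lazily. This is a minimal sketch: the AST classes (`Elem`, `Seq`, `Choice`, `Rep`) and the helpers `analyze`/`accepts` are hypothetical names, not a real parser API. Because it follows every matching occurrence at once, the ambiguous model `((b, c) | (b, d))` causes no trouble.

```python
from dataclasses import dataclass

# Hypothetical content-model AST (illustrative names, not a real parser API).
@dataclass(frozen=True)
class Elem:
    name: str

@dataclass(frozen=True)
class Seq:
    items: tuple

@dataclass(frozen=True)
class Choice:
    items: tuple

@dataclass(frozen=True)
class Rep:
    item: object
    op: str  # "*", "+" or "?"

def analyze(model):
    """Glushkov (position) construction: nullable, first, last, follow."""
    symbols, follow = [], {}

    def walk(node):
        if isinstance(node, Elem):
            pos = len(symbols)
            symbols.append(node.name)
            follow[pos] = set()
            return False, {pos}, {pos}
        if isinstance(node, Seq):
            nullable, first, last = True, set(), set()
            for item in node.items:
                item_null, item_first, item_last = walk(item)
                for p in last:
                    follow[p] |= item_first
                if nullable:
                    first |= item_first
                last = (last | item_last) if item_null else set(item_last)
                nullable = nullable and item_null
            return nullable, first, last
        if isinstance(node, Choice):
            parts = [walk(item) for item in node.items]
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(la for _, _, la in parts)))
        if isinstance(node, Rep):
            item_null, item_first, item_last = walk(node.item)
            if node.op in ("*", "+"):
                for p in item_last:
                    follow[p] |= item_first
            return item_null or node.op != "+", item_first, item_last
        raise TypeError(f"unknown content-model node: {node!r}")

    nullable, first, last = walk(model)
    return symbols, nullable, first, last, follow

def accepts(model, children):
    """Simulate the (possibly ambiguous) position NFA on a list of child
    element names, keeping the set of all reachable states."""
    symbols, nullable, first, last, follow = analyze(model)
    current = {None}  # None is the start state; ints are positions
    for name in children:
        nxt = set()
        for state in current:
            candidates = first if state is None else follow[state]
            nxt |= {p for p in candidates if symbols[p] == name}
        if not nxt:
            return False
        current = nxt
    # Start state is final iff the model is nullable; a position iff it is in last.
    return any(nullable if s is None else s in last for s in current)
```

After a leading `b` the simulation holds both `b` positions, and the next child picks the surviving branch, so `b d` is accepted without any lookahead.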

  I tried to read the Brüggemann-Klein thesis listed in the references
and found it a bit frightening, though very informative. The beginning
of Part I, on document grammars, for example makes clear that the SGML
view of content-model unambiguity is really one-token-lookahead
determinism.
  In practice this is a very good rule because it greatly simplifies
the validation of a content model. The problem is that grammars
need to be rewritten to conform to it (at least the thesis proves
it's always possible).
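Once a model is known to be deterministic, validation needs no backtracking and no state sets: each child element selects exactly one next state, so a transition table can be built directly from the first/follow sets. A minimal sketch, assuming a hypothetical AST and hypothetical `analyze`/`build_dfa`/`validate` helpers (not any parser's API); `build_dfa` refuses models that violate the one-token-lookahead rule.

```python
from dataclasses import dataclass

# Hypothetical content-model AST (illustrative names, not a real parser API).
@dataclass(frozen=True)
class Elem:
    name: str

@dataclass(frozen=True)
class Seq:
    items: tuple

@dataclass(frozen=True)
class Choice:
    items: tuple

@dataclass(frozen=True)
class Rep:
    item: object
    op: str  # "*", "+" or "?"

def analyze(model):
    """Glushkov (position) construction: nullable, first, last, follow."""
    symbols, follow = [], {}

    def walk(node):
        if isinstance(node, Elem):
            pos = len(symbols)
            symbols.append(node.name)
            follow[pos] = set()
            return False, {pos}, {pos}
        if isinstance(node, Seq):
            nullable, first, last = True, set(), set()
            for item in node.items:
                item_null, item_first, item_last = walk(item)
                for p in last:
                    follow[p] |= item_first
                if nullable:
                    first |= item_first
                last = (last | item_last) if item_null else set(item_last)
                nullable = nullable and item_null
            return nullable, first, last
        if isinstance(node, Choice):
            parts = [walk(item) for item in node.items]
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(la for _, _, la in parts)))
        if isinstance(node, Rep):
            item_null, item_first, item_last = walk(node.item)
            if node.op in ("*", "+"):
                for p in item_last:
                    follow[p] |= item_first
            return item_null or node.op != "+", item_first, item_last
        raise TypeError(f"unknown content-model node: {node!r}")

    nullable, first, last = walk(model)
    return symbols, nullable, first, last, follow

def build_dfa(model):
    """Build {state: {element name: next state}} directly from the position
    sets; raise ValueError if one name leads to two positions."""
    symbols, nullable, first, last, follow = analyze(model)
    table = {}
    for state, candidates in [(None, first)] + [(p, follow[p]) for p in follow]:
        row = {}
        for pos in candidates:
            name = symbols[pos]
            if name in row:
                raise ValueError(f"non-deterministic content model at {name!r}")
            row[name] = pos
        table[state] = row
    accepting = set(last) | ({None} if nullable else set())
    return table, accepting

def validate(children, table, accepting):
    """One table lookup per child element: one-token lookahead, no backtracking."""
    state = None  # start state
    for name in children:
        state = table[state].get(name)
        if state is None:
            return False
    return state in accepting
```

Here the Appendix E rewrite `(b, (c | d))` accepts `b c` and `b d` and rejects `b` alone, while `((b, c) | (b, d))` is rejected up front with a `ValueError`.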

> I can see that parsers that allow non-deterministic content models may
> be harmful to the user.  The user won't notice that his DTD may be
> rejected by other parsers.
> 
> So there seems to be good reason for the XML 1.0 spec to prohibit
> parsers that accept non-deterministic content models.  In that case the
> spec not only gives a particular DFA construction algorithm a chance
> to be used, but effectively recommends the use of that algorithm.

  As usual, such suggestions should also be sent to the spec comment
list, so I'm forwarding this to xml-editor@w3.org.

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
daniel@veillard.com  | libxml Gnome XML toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Received on Sunday, 14 January 2001 04:05:07 UTC