- From: Daniel Veillard <Daniel.Veillard@imag.fr>
- Date: Sun, 14 Jan 2001 10:04:58 +0100
- To: "TAKAHASHI Hideo(BSD-13G)" <hideo-t@bisd.hitachi.co.jp>, xml-editor@w3.org
- Cc: xml-dev@lists.xml.org
On Sun, Jan 14, 2001 at 04:42:55PM +0900, TAKAHASHI Hideo(BSD-13G) wrote:
> Hello.
>
> I understand that the XML 1.0 spec prohibits non-deterministic (or,
> ambiguous) content models (for compatibility, to be precise).

Note also that this is stated in a non-normative appendix.

> Are all xml 1.0 compliant xml processing software required to reject
> DTDs with such content models?

Since it is stated only non-normatively, I don't think this is the case
in theory. In practice this can be a problem. I recently faced a problem
with a DTD developed at the IETF which was clearly non-deterministic.

This also means that this introduces new classes of XML parsers among the
validating ones:
  - those that detect and report non-deterministic content models
  - those that validate (correctly or not) using non-deterministic
    content models

> Ambiguous content models doesn't cause any problems when you construct a
> DFA via an NFA. I have heard that there is a way to construct DFAs
> directly from regexps without making an NFA, but that method can't
> handle non-deterministic regular expressions. If you choose that method
> to construct your DFA, you will surely benefit from the rule in XML 1.0.
> But if you choose not, detecting non-deterministic content models
> become an extra job.

I tried to read the Brüggemann-Klein thesis listed in the references and
found it a bit frightening, though very informative. The beginning of
Part I on Document Grammar, for example, makes clear that the SGML view
of unambiguity of the content model is really a 1-token-lookahead
determinism. In practice this is a very good rule because it simplifies
the validation of a content model a lot. The problem is that grammars
need to be rewritten to conform to it (the thesis proves it's always
possible at least).

> I can see that parsers that allow non-deterministic content models may
> be harmful to the user. The user won't notice that his DTD may be
> rejected by other parsers.
>
> So there seems to be good reason for the XML 1.0 spec to prohibit
> parsers that accept non-deterministic content models. In that case the
> spec not only gives chance for a particular DFA constructing algorithm
> to be used, but effectively recommends the usage of the algorithm.

As usual, such suggestions should also be provided to the spec comment
list, so I'm forwarding it to xml-editor@w3.org,

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
daniel@veillard.com  | libxml Gnome XML toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
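
To make the 1-token-lookahead rule discussed in the message concrete, here is a minimal sketch in Python of one way a parser could detect a non-deterministic content model. It is not taken from libxml or from the thesis; it uses the usual position-based first/follow construction, and every name in it (sym, seq, alt, star, opt, plus, is_deterministic) is a hypothetical helper made up for the illustration. The idea: number each element-name occurrence in the model, compute which occurrences can start the content and which can follow each occurrence, and report a problem as soon as two different occurrences of the same name compete at the same point.

```python
from itertools import count

# Hypothetical constructors for a content model tree (illustration only).
def sym(name): return ("sym", name)      # an element name
def seq(*parts): return ("seq", parts)   # (a, b, ...)
def alt(*parts): return ("alt", parts)   # (a | b | ...)
def star(e): return ("star", e)          # e*
def opt(e): return ("opt", e)            # e?
def plus(e): return ("plus", e)          # e+

def is_deterministic(model):
    """True if one token of lookahead always selects a single occurrence."""
    names = {}    # position -> element name at that position
    follow = {}   # position -> positions that may come right after it
    ids = count()

    def walk(node):
        # Returns (nullable, first positions, last positions) for node.
        kind, arg = node
        if kind == "sym":
            p = next(ids)
            names[p] = arg
            follow[p] = set()
            return False, {p}, {p}
        if kind == "alt":
            nullable, first, last = False, set(), set()
            for part in arg:
                n, f, l = walk(part)
                nullable = nullable or n
                first |= f
                last |= l
            return nullable, first, last
        if kind == "seq":
            nullable, first, last = True, set(), set()
            for part in arg:
                n, f, l = walk(part)
                for p in last:            # what ended the prefix can be
                    follow[p] |= f        # followed by what starts this part
                if nullable:              # prefix may be empty
                    first |= f
                last = (l | last) if n else l
                nullable = nullable and n
            return nullable, first, last
        # star, opt, plus
        n, f, l = walk(arg)
        if kind in ("star", "plus"):      # repetition loops back to the start
            for p in l:
                follow[p] |= f
        return (n or kind != "plus"), f, l

    def clash(positions):
        # Two distinct positions carrying the same name means ambiguity.
        seen = set()
        for p in positions:
            if names[p] in seen:
                return True
            seen.add(names[p])
        return False

    _, first, _ = walk(model)
    return not clash(first) and not any(clash(s) for s in follow.values())
```

With this sketch, the model ((b, c) | (b, d)) is reported as non-deterministic, because after reading a b the parser cannot tell which of the two b occurrences it matched, while the rewritten form (b, (c | d)) passes:

```python
is_deterministic(alt(seq(sym("b"), sym("c")), seq(sym("b"), sym("d"))))  # False
is_deterministic(seq(sym("b"), alt(sym("c"), sym("d"))))                 # True
```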
Received on Sunday, 14 January 2001 04:05:07 UTC