- From: Daniel Veillard <Daniel.Veillard@imag.fr>
- Date: Sun, 14 Jan 2001 12:39:02 +0100
- To: James Clark <jjc@jclark.com>
- Cc: Daniel.Veillard@imag.fr, "TAKAHASHI Hideo(BSD-13G)" <hideo-t@bisd.hitachi.co.jp>, xml-editor@w3.org, xml-dev@lists.xml.org
On Sun, Jan 14, 2001 at 05:42:16PM +0700, James Clark wrote: > Daniel Veillard wrote: > > > On Sun, Jan 14, 2001 at 04:42:55PM +0900, TAKAHASHI Hideo(BSD-13G) wrote: > > > Hello. > > > > > > I understand that the XML 1.0 spec prohibits non-deterministic (or, > > > ambiguous) content models (for compatibility, to be precise). > > > > Note also that this is stated in a non-normative appendix. > > It is also stated normatively in the body of the spec > (http://www.w3.org/TR/REC-xml#sec-element-content): "For compatibility, > it is an error if an element in the document can match more than one > occurrence of an element type in the content model." Oops, right, I missed this sentence, sorry. > > In practice this is a very good rule because it allows to simplify > > the validation of a content model a lot. > > Complicating things for the user to make things simpler for the parser > writer seems in general a bad trade-off to me. Also this decreases > interoperability (as you've observed). The only justification is > compatibility with SGML. I'm more convinced by the agument that some content models may not be expressable in an 1-unambiguous languages. But I'm still a bit worried that either way interoperability will be a concern. In this case compatibility with SGML generates risk of interoperability problems between XML tools, "results are undefined" doesn't sound good (http://www.w3.org/TR/REC-xml#dt-error). > In fact, there is a very simple algorithm available that handles > non-determinism just fine (it doesn't require you to construct a NFA and > then do the subset construction). See > http://www.flightlab.com/~joe/sgml/validate.html (TREX uses a variation > on this). Thanks for the pointer ! > The thesis proves the opposite: that there are some regular expressions > that do not denote 1-unambiguous languages (see p52). She gives the > example of > > (a|b)*,a,(a|b) Hum, right, this makes sense and is also stated in Appendix E. thanks a lot, Daniel -- Daniel Veillard | Red Hat Network http://redhat.com/products/network/ daniel@veillard.com | libxml Gnome XML toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
Received on Sunday, 14 January 2001 06:39:06 UTC