Re: Ambiguous content models -- allowed or disallowed by XSDL?

From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
Date: 14 Jun 2002 09:34:54 +0100
To: Tom Moog <tmoog@sarvega.com>
Cc: xmlschema-dev@w3.org
Message-ID: <f5bit4m1cnl.fsf@cogsci.ed.ac.uk>

Tom Moog <tmoog@sarvega.com> writes:

> Having a parser which recognizes a sequence is valid is not
> enough to indicate how it should be processed.   One also
> needs to know the parse tree in many circumstances.   There
> may be semantic actions associated with particular elements
> and sequences.

True, but SGML and XML have traditionally _never_ exposed that
information.  Style guides for these languages advise against
anonymous groups in content models for precisely this reason -- they
indicate the existence of structure of potential semantic significance 
which the application _will not have access to_.

> For example, 1+2*3 could be either (1+2)*3 or 1+(2*3) depending
> on the way it is parsed.

Yup, and that's why your schema should delimit terms with markup.
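
A minimal sketch of what delimiting with markup buys you, in Python. The
element names (num, plus, times) and the evaluate helper are hypothetical,
made up for illustration rather than taken from any real schema. Each
document fixes its own grouping, so the application never has to recover
a parse tree from a flat sequence:

```python
import xml.etree.ElementTree as ET

def evaluate(node):
    """Evaluate an expression whose grouping is spelled out by element nesting."""
    if node.tag == "num":
        return int(node.text)
    # Every operator element is assumed to hold exactly two operands.
    left, right = (evaluate(child) for child in node)
    if node.tag == "plus":
        return left + right
    if node.tag == "times":
        return left * right
    raise ValueError("unexpected element: " + node.tag)

# (1+2)*3: the sum is wrapped in its own element, so it groups first.
SUM_FIRST = "<times><plus><num>1</num><num>2</num></plus><num>3</num></times>"
# 1+(2*3): the product is wrapped instead.
PRODUCT_FIRST = "<plus><num>1</num><times><num>2</num><num>3</num></times></plus>"

print(evaluate(ET.fromstring(SUM_FIRST)))      # 9
print(evaluate(ET.fromstring(PRODUCT_FIRST)))  # 7
```

The same three numbers and two operators yield different results only
because the markup differs, which is the structure an anonymous group
would have hidden from the application.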

> By building a DFA to recognize the content, the answer of
> whether it is valid or not is, in a sense, not known until the
> very end when it reaches an accept/reject state.  This
> means that no semantic actions could be invoked until
> the entire sequence is accepted.


> Unique parse would make it an LL(1) grammar?

Close, IIRC.  There's a modest literature out there about SGML from
the formal language theory perspective.
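
As a rough illustration of the kind of check that literature describes
(one-symbol-lookahead determinism of a content model), here is a sketch
in Python. It assumes a toy tuple encoding of content models; the helper
names (sym, seq, alt, opt, star, analyse, is_deterministic) are all
hypothetical, and it ignores wildcards, numeric occurrence ranges and
substitution groups. The idea: number every element occurrence, compute
which occurrences can come first and which can follow which, and reject
the model if two different occurrences of the same name ever compete.

```python
from itertools import count

# Toy constructors for content models (hypothetical encoding).
def sym(name): return ("sym", name)
def seq(*parts): return ("seq",) + parts
def alt(*parts): return ("alt",) + parts
def opt(part): return ("opt", part)
def star(part): return ("star", part)

def analyse(node, ids, labels, follow):
    """Return (nullable, first, last) for node; fill labels and follow in place."""
    kind = node[0]
    if kind == "sym":
        pos = next(ids)              # each occurrence gets its own position
        labels[pos] = node[1]
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind in ("opt", "star"):
        _, first, last = analyse(node[1], ids, labels, follow)
        if kind == "star":           # repetition: the end loops back to the start
            for pos in last:
                follow[pos] |= first
        return True, first, last
    parts = [analyse(child, ids, labels, follow) for child in node[1:]]
    if kind == "alt":
        return (any(n for n, _, _ in parts),
                set().union(*(f for _, f, _ in parts)),
                set().union(*(l for _, _, l in parts)))
    # kind == "seq": fold the parts left to right
    nullable, first, last = True, set(), set()
    for n, f, l in parts:
        for pos in last:
            follow[pos] |= f
        if nullable:                 # everything so far could be empty
            first |= f
        last = (last | l) if n else set(l)
        nullable = nullable and n
    return nullable, first, last

def is_deterministic(model):
    """True if no two occurrences of the same name compete at any point."""
    labels, follow = {}, {}
    _, first, _ = analyse(model, count(), labels, follow)
    for candidates in [first, *follow.values()]:
        names = [labels[pos] for pos in candidates]
        if len(names) != len(set(names)):
            return False
    return True

a, b, c = sym("a"), sym("b"), sym("c")
print(is_deterministic(alt(seq(a, b), seq(a, c))))  # False: which 'a' is it?
print(is_deterministic(seq(a, alt(b, c))))          # True: same language, one 'a'
```

The two models at the end accept exactly the same sequences, but only
the second lets a processor decide, on seeing each element, which
particle of the model it corresponds to.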

  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Friday, 14 June 2002 04:34:58 UTC
