- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: 14 Jun 2002 09:34:54 +0100
- To: Tom Moog <tmoog@sarvega.com>
- Cc: xmlschema-dev@w3.org
Tom Moog <tmoog@sarvega.com> writes:
> Having a parser which recognizes a sequence is valid is not
> enough to indicate how it should be processed. One also
> needs to know the parse tree in many circumstances. There
> may be semantic actions associated with particular elements
> and sequences.
True, but SGML and XML have traditionally _never_ exposed that
information. Style guides for these languages advise against
anonymous groups in content models for precisely this reason -- they
indicate the existence of structure of potential semantic significance
which the application _will not have access to_.
> For example, 1+2*3 could be either (1+2)*3 or 1+(2*3) depending
> on the way it is parsed.
Yup, and that's why your schema should delimit terms with markup.
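As a rough illustration of delimiting terms with markup, the sketch below encodes the two readings of 1+2*3 as explicit element structure, so the application never has to guess the grouping. The element names `sum`, `product` and `num` are hypothetical, chosen only for this example and not taken from any real schema.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <sum>, <product> and <num> are illustrative names.
GROUPED_LEFT = "<product><sum><num>1</num><num>2</num></sum><num>3</num></product>"
GROUPED_RIGHT = "<sum><num>1</num><product><num>2</num><num>3</num></product></sum>"


def evaluate(elem):
    """Evaluate an expression whose grouping is given by the markup itself."""
    if elem.tag == "num":
        return int(elem.text)
    values = [evaluate(child) for child in elem]
    if elem.tag == "sum":
        return sum(values)
    if elem.tag == "product":
        return math.prod(values)
    raise ValueError(f"unexpected element: {elem.tag}")


def evaluate_xml(text):
    """Parse the document and evaluate its root element."""
    return evaluate(ET.fromstring(text))


print(evaluate_xml(GROUPED_LEFT))   # (1+2)*3
print(evaluate_xml(GROUPED_RIGHT))  # 1+(2*3)
```

Because each term is its own element, the structure is visible in the document tree rather than hidden inside an anonymous group of the content model.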
> By building a DFA to recognize the content, the answer of
> whether it is valid or not is, in a sense, not known until the
> very end when it reaches an accept/reject state. This
> means that no semantic actions could be invoked until
> the entire sequence is accepted.
Yup.
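A minimal sketch of that point, assuming a hypothetical content model (a, b*, c) and a hand-written transition table: the recognizer can pass through an accepting state partway through the children and still reject the sequence, so the verdict is only known once the last child has been seen.

```python
# Hand-built DFA for the hypothetical content model (a, b*, c).
# State names are illustrative only.
TRANSITIONS = {
    ("start", "a"): "after_a",
    ("after_a", "b"): "after_a",
    ("after_a", "c"): "done",
}
ACCEPTING = {"done"}


def run(children):
    """Return (valid, trace), where trace lists the state after each child."""
    state = "start"
    trace = []
    for name in children:
        state = TRANSITIONS.get((state, name))
        if state is None:
            # No transition for this child: the sequence is rejected here.
            return False, trace
        trace.append(state)
    # Validity is decided only after the final child has been consumed.
    return state in ACCEPTING, trace


print(run(["a", "b", "b", "c"]))
print(run(["a", "b"]))
print(run(["a", "c", "b"]))
```

In the last call the automaton reaches `done` after the second child and then fails on the third, which is why an action fired on first reaching an accepting state could turn out to be premature.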
> Unique parse would make it an LL(1) grammar?
Close, IIRC. There's a modest literature out there about SGML from
the formal language theory perspective.
ht
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
W3C Fellow 1999--2002, part-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Friday, 14 June 2002 04:34:58 UTC