- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: 14 Jun 2002 09:34:54 +0100
- To: Tom Moog <tmoog@sarvega.com>
- Cc: xmlschema-dev@w3.org
Tom Moog <tmoog@sarvega.com> writes:

> Having a parser which recognizes a sequence is valid is not
> enough to indicate how it should be processed. One also
> needs to know the parse tree in many circumstances. There
> may be semantic actions associated with particular elements
> and sequences.

True, but SGML and XML have traditionally _never_ exposed that
information. Style guides for these languages advise against
anonymous groups in content models for precisely this reason --
they indicate the existence of structure of potential semantic
significance which the application _will not have access to_.

> For example, 1+2*3 could be either (1+2)*3 or 1+(2*3) depending
> on the way it is parsed.

Yup, and that's why your schema should delimit terms with markup.

> By building a DFA to recognize the content, the answer of
> whether it is valid or not is, in a sense, not known until the
> very end when it reaches an accept/reject state. This
> means that no semantic actions could be invoked until
> the entire sequence is accepted.

Yup.

> Unique parse would make it an LL(1) grammar?

Close, IIRC. There's a modest literature out there about SGML
from the formal language theory perspective.

ht
-- 
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
W3C Fellow 1999--2002, part-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Friday, 14 June 2002 04:34:58 UTC
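The exchange above says a DFA-based validator gives only an accept/reject answer, and that it hides the structure implied by anonymous groups. A minimal Python sketch makes this concrete. It assumes a hypothetical content model `(a, (b, c)*, d)` that has been hand-compiled into a transition table. The element names and state numbering are illustrative only and do not come from any particular schema processor.

```python
# Hypothetical content model (a, (b, c)*, d), hand-compiled into a DFA.
# States: 0 = start, 1 = after "a" or after a complete "b, c" pair,
#         2 = after "b" (inside the anonymous (b, c) group),
#         3 = after "d" (the only accepting state).
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 1,
    (1, "d"): 3,
}
ACCEPTING = {3}


def validate(children):
    """Return True if the child element names match the content model.

    The DFA only moves between states. It never records where the
    anonymous (b, c) group started or ended, so its sole output is a
    single yes/no answer. Rejection can happen early, on a missing
    transition, but acceptance is known only after the last child.
    """
    state = 0
    for name in children:
        state = TRANSITIONS.get((state, name))
        if state is None:
            return False  # no transition for this child: reject
        # State 1 is reached both after "a" and after each "b, c" pair,
        # so the current state alone cannot say which of the two occurred.
    return state in ACCEPTING


print(validate(["a", "b", "c", "b", "c", "d"]))  # True
print(validate(["a", "b", "d"]))                  # False
```

An application that needs to act on each `(b, c)` pair would get no help from this recognizer. That is why the message advises giving such a group its own wrapping element in the markup rather than leaving it anonymous in the content model.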