- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: 13 Jun 2002 09:44:07 +0100
- To: Ian Stokes-Rees <ijs@decisionsoft.com>
- Cc: xmlschema-dev@w3.org
I'd be happy to look at specific cases where there's debate in your group and/or disagreement among validators -- I'm not prepared to go on a fishing expedition through all your schema docs. I'm sure others on the list will help as well. As for ambiguity, it's crucial to distinguish between unique attribution and ambiguity. Neither SGML DTDs, nor XML DTDs, nor W3C XML Schema, have ever ruled out ambiguous content models as such. They've all required some version of unique attribution. Ambiguity means more than one possible analysis of the input is possible wrt the content model. Unique attribution requires online (i.e. without lookahead) unequivocal determination of which particle to try. So for UPA to be violated, there _must_ be two particles which could allow some element, i.e. two element decls with the same name, or one element decl and an overlapping wildcard, or two overlapping wildcards. A content model without any of those _can't_ violate UPA. So although UPA rules out many many ambiguous models, it doesn't rule them all out. First of all, all the ones allowed by SGML and XML are allowed, i.e. (a+)*, (a*)+, etc. Second of all, the ones created by the advent of numeric exponents are allowed, i.e. <sequence minOccurs='2' maxOccurs='3'> <element name='a' minOccurs='2' maxOccurs='3'/> </sequence> which gives two analyses for a sequence of six <a>s. Summary: Ambiguity and unique attribution are different -- the *ML family have never ruled out the former, always required the latter. I hope it's clear _why_ this is reasonable: unique attribution matters -- without it, parsers have to handle arbitrary parallelism and/or backtracking, and content model subsumption checking would be much harder (exponentially harder). Ambiguity doesn't matter -- it's easy to build a parser that accepts all and only valid content, without wasting effort, and since there's no way to exploit which analysis was used, the fact that both are not available doesn't matter. Hope this helps, ht -- Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh W3C Fellow 1999--2002, part-time member of W3C Team 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk URL: http://www.ltg.ed.ac.uk/~ht/ [mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Thursday, 13 June 2002 04:44:11 UTC