- From: Mark Feblowitz <mfeblowitz@frictionless.com>
- Date: Tue, 8 Jan 2002 08:49:58 -0500
- To: "'ht@cogsci.ed.ac.uk'" <ht@cogsci.ed.ac.uk>
- Cc: "Xmlschema-Dev (E-mail)" <xmlschema-dev@w3.org>
Excellent. I was hoping as much. That answers the execution performance question. Do these constructs also have efficient internal representations in terms of their memory consumption? Is the code for processing a substitution group compact? Is it factored or replicated? Thanks, Mark ---------------------------------------------------------------------------- ---- Mark Feblowitz [t] 617.715.7231 Frictionless Commerce Incorporated [f] 617.495.0188 XML Architect [e] mfeblowitz@frictionless.com 400 Technology Square, 9th Floor Cambridge, MA 02139 www.frictionless.com -----Original Message----- From: ht@cogsci.ed.ac.uk [mailto:ht@cogsci.ed.ac.uk] Sent: Tuesday, January 08, 2002 4:03 AM To: Mark Feblowitz Cc: Xmlschema-Dev (E-mail) Subject: Re: Substitution groups and Parsing Efficiency Mark Feblowitz <mfeblowitz@frictionless.com> writes: > While (ab)using a particular substitution group, I noticed in the XSV error > report that the fsm for the substitution group was huge. > > The substitution group is indeed large (many possible substitutions), and is > referenced in many places, with a cardinality of 0..unbounded. This means > that any number of the substitutions could appear any number of times, and > in any order. This, indeed, would lead to a large fsm - in fact to an "nfsm" > (with a non-finite set of potential states). > > My question is, do schema validators typically have efficient algorithms for > handling such cases? How about cases that are finite but large? The REC mandates that a substitution group head be treated as a disjunction over its members. It follows that a 0..unbounded s-g head is no different than a 0..unbounded xs:choice of all its members. The fsm fragment for such a choice is straight-forward, and certainly never infinite. The UPA guarantees that there are no unbounded multiplicative effects of having several such fragments in the overall fsm for a content model: at worst every edge following such a fragment has to be duplicated once. For example the regexp corresponding to the determinised fsm for (a|b)*(c|d)*e is (a|b)*(e|((c|d)(c|d)*e)). ht -- Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh W3C Fellow 1999--2001, part-time member of W3C Team 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk URL: http://www.ltg.ed.ac.uk/~ht/
Received on Tuesday, 8 January 2002 08:50:37 UTC