W3C home > Mailing lists > Public > xmlschema-dev@w3.org > January 2002

RE: Substitution groups and Parsing Efficiency

From: Mark Feblowitz <mfeblowitz@frictionless.com>
Date: Tue, 8 Jan 2002 08:49:58 -0500
Message-ID: <4DBDB4044ABED31183C000508BA0E97F024D5696@fcpostal.frictionless.com>
To: "'ht@cogsci.ed.ac.uk'" <ht@cogsci.ed.ac.uk>
Cc: "Xmlschema-Dev (E-mail)" <xmlschema-dev@w3.org>
Excellent. I was hoping as much. 

That answers the execution performance question. Do these constructs also
have efficient internal representations in terms of their memory
consumption? Is the code for processing a substitution group compact? Is it
factored or replicated?



Mark Feblowitz                                   [t] 617.715.7231
Frictionless Commerce Incorporated     [f] 617.495.0188 
XML Architect                                     [e]
400 Technology Square, 9th Floor 
Cambridge, MA 02139 

 -----Original Message-----
From: 	ht@cogsci.ed.ac.uk [mailto:ht@cogsci.ed.ac.uk] 
Sent:	Tuesday, January 08, 2002 4:03 AM
To:	Mark Feblowitz
Cc:	Xmlschema-Dev (E-mail)
Subject:	Re: Substitution groups and Parsing Efficiency

Mark Feblowitz <mfeblowitz@frictionless.com> writes:

> While (ab)using a particular substitution group, I noticed in the XSV
> report that the fsm for the substitution group was huge.
> The substitution group is indeed large (many possible substitutions), and
> referenced in many places, with a cardinality of 0..unbounded. This means
> that any number of the substitutions could appear any number of times, and
> in any order. This, indeed, would lead to a large fsm - in fact to an
> (with a non-finite set of potential states).
> My question is, do schema validators typically have efficient algorithms
> handling such cases? How about cases that are finite but large?

The REC mandates that a substitution group head be treated as a
disjunction over its members.  It follows that a 0..unbounded s-g head
is no different than a 0..unbounded xs:choice of all its members.  The
fsm fragment for such a choice is straight-forward, and certainly
never infinite.  The UPA guarantees that there are no unbounded
multiplicative effects of having several such fragments in the overall
fsm for a content model: at worst every edge following such a fragment
has to be duplicated once.  For example the regexp corresponding to
the determinised fsm for (a|b)*(c|d)*e is (a|b)*(e|((c|d)(c|d)*e)).

  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2001, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
Received on Tuesday, 8 January 2002 08:50:37 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 14:55:54 UTC