- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Mon, 11 Oct 2004 14:56:55 +0100
- To: daniel@veillard.com
- Cc: Jeff Rafter <lists@jeffrafter.com>, xmlschema-dev@w3.org, xml-dev@lists.xml.org
Daniel Veillard <daniel@veillard.com> writes: > On Sun, Oct 10, 2004 at 10:57:32PM +0100, Henry S. Thompson wrote: >> Daniel Veillard <daniel@veillard.com> writes: >> >> > libxml2 regexps used counters since day 1 for min/maxoccurs >> > implementation. The explosion didn't look a supportable >> > alternative to me as it opens the door to trivial DoS attacks or >> > forces to break the schemas validation which is also a big >> > problem if you consider schemas as a contract between two >> > communicating parties. >> >> I would be very interested, as I'm sure would others, in a description >> of your algorithm. > > Well nothing fancy really. You need to add state to the regexp, in > that case the state is the number of time you went through the > transition labelled by the element name:namespace pair. Due to the > single particle rule, it means you have no need to be able to > rollback the state to enter a different transition labelled by the > same name:namespace pair. Which means the effective state required > is purely linear based on the number of counted transition you have > in your regexp: one integer counter is sufficient per such > transition in the regexp runtime structure. It's actually more > automata with state that I'm using than pure regexps. And the > generated constructs for something like x{a,b} involves 3 states > (IIRC) a couple of epsilon transitions and the counted regexp, but > it's not really rocket sience either, very similar to what I was > taught in the classroom. Right, that works fine for exponents on individual elements, but I don't see how it works for groups. Here's a real example from a published schema document [1]: <xsd:sequence minOccurs="0" maxOccurs="1000"> <xsd:element ref="ReferenceIdentification" minOccurs="0"/> <xsd:element ref="Message" minOccurs="0" maxOccurs="1000"/> </xsd:sequence> Or consider a (constructed) case that's tricky in a different way: <xsd:sequence minOccurs="2" maxOccurs="2"> <xsd:element ref="a" minOccurs="1" maxOccurs="2"/> <xsd:element ref="b" minOccurs="0"/> </xsd:sequence> which allows _inter alia_ <a/><a/> <a/><a/><a/> <a/><a/><a/><a/> ht [1] http://www.idealliance.org/spacexml/files/spacexmlv100.zip -- Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh Half-time member of W3C Team 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk URL: http://www.ltg.ed.ac.uk/~ht/ [mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Monday, 11 October 2004 13:57:28 UTC