W3C home > Mailing lists > Public > xmlschema-dev@w3.org > October 2004

Re: [xml-dev] New release (2.8) of XSV

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Mon, 11 Oct 2004 14:56:55 +0100
To: daniel@veillard.com
Cc: Jeff Rafter <lists@jeffrafter.com>, xmlschema-dev@w3.org, xml-dev@lists.xml.org
Message-ID: <f5b3c0ljr2w.fsf@erasmus.inf.ed.ac.uk>

Daniel Veillard <daniel@veillard.com> writes:

> On Sun, Oct 10, 2004 at 10:57:32PM +0100, Henry S. Thompson wrote:
>> Daniel Veillard <daniel@veillard.com> writes:
>> 
>> > libxml2 regexps used counters since day 1 for min/maxoccurs
>> > implementation.  The explosion didn't look a supportable
>> > alternative to me as it opens the door to trivial DoS attacks or
>> > forces to break the schemas validation which is also a big
>> > problem if you consider schemas as a contract between two
>> > communicating parties.
>> 
>> I would be very interested, as I'm sure would others, in a description
>> of your algorithm.
>
> Well nothing fancy really. You need to add state to the regexp, in
> that case the state is the number of time you went through the
> transition labelled by the element name:namespace pair. Due to the
> single particle rule, it means you have no need to be able to
> rollback the state to enter a different transition labelled by the
> same name:namespace pair.  Which means the effective state required
> is purely linear based on the number of counted transition you have
> in your regexp: one integer counter is sufficient per such
> transition in the regexp runtime structure. It's actually more
> automata with state that I'm using than pure regexps.  And the
> generated constructs for something like x{a,b} involves 3 states
> (IIRC) a couple of epsilon transitions and the counted regexp, but
> it's not really rocket sience either, very similar to what I was
> taught in the classroom.

Right, that works fine for exponents on individual elements, but I
don't see how it works for groups.

Here's a real example from a published schema document [1]:

<xsd:sequence minOccurs="0" maxOccurs="1000">
    <xsd:element ref="ReferenceIdentification" minOccurs="0"/>
    <xsd:element ref="Message" minOccurs="0" maxOccurs="1000"/>
</xsd:sequence>

Or consider a (constructed) case that's tricky in a different way:

<xsd:sequence minOccurs="2" maxOccurs="2">
    <xsd:element ref="a" minOccurs="1" maxOccurs="2"/>
    <xsd:element ref="b" minOccurs="0"/>
</xsd:sequence>

which allows _inter alia_

 <a/><a/>
 <a/><a/><a/>
 <a/><a/><a/><a/>

ht
 
[1] http://www.idealliance.org/spacexml/files/spacexmlv100.zip
-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Monday, 11 October 2004 13:57:28 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 5 February 2014 07:15:11 UTC