- From: Stefan Wachter <stefan.wachter@gmx.de>
- Date: Sat, 10 Jan 2015 22:32:09 +0100
- To: xmlschema-dev@w3.org
Hi all,
I am trying to implement a validating XML parser in Scala, based on the
XSD recommendation of April 5, 2012. Studying the specification, I found
that for a complex content model to be a valid restriction of a base
content model, every sequence accepted by the restricted content model
must also be accepted by the base content model.
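To make the requirement concrete, here is a minimal sketch of a language inclusion check on two deterministic automata, written in Python for brevity. It walks the product of both automata and looks for a sequence that the restricted model accepts and the base model rejects. The dictionary layout and the names `is_subset`, `base` and `restricted` are assumptions made for illustration; occurrence bounds are scaled down to 3 and wildcards are not handled.

```python
from collections import deque

def is_subset(restricted, base, alphabet):
    """True if every sequence accepted by `restricted` is accepted by `base`.

    Each automaton is a dict with 'start', 'accept' (set of states) and
    'delta' ({(state, symbol): next_state}); a missing transition rejects.
    """
    start = (restricted["start"], base["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        r, b = queue.popleft()
        # b is None once the base automaton has rejected the prefix.
        if r in restricted["accept"] and (b is None or b not in base["accept"]):
            return False  # counterexample: accepted by restricted only
        for sym in alphabet:
            r2 = restricted["delta"].get((r, sym))
            if r2 is None:
                continue  # restricted model rejects here, nothing to check
            b2 = base["delta"].get((b, sym)) if b is not None else None
            if (r2, b2) not in seen:
                seen.add((r2, b2))
                queue.append((r2, b2))
    return True

alphabet = ["e1", "e2"]

# Base model: choice of e1* and e2*.
base = {
    "start": "s",
    "accept": {"s", "a", "b"},
    "delta": {("s", "e1"): "a", ("a", "e1"): "a",
              ("s", "e2"): "b", ("b", "e2"): "b"},
}

# Restricted model: choice of e1{1,3} and e2{1,3}, unfolded into chains.
delta, accept = {}, set()
for name in alphabet:
    prev = "start"
    for i in range(1, 4):
        delta[(prev, name)] = (name, i)
        accept.add((name, i))
        prev = (name, i)
restricted = {"start": "start", "accept": accept, "delta": delta}

print(is_subset(restricted, base, alphabet))  # True
print(is_subset(base, restricted, alphabet))  # False: base accepts the empty sequence
```

The search visits at most one pair per reachable combination of states, so its cost grows with the size of the unfolded automata.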
This requirement turned out to be rather hard to implement. I am not a
language theory specialist, so I spent some time googling around
and found interesting research papers on the language inclusion
problem. Yet most of them were not directly applicable because of
peculiarities of XML Schema (occurrence constraints, wildcards). Finally I
found a link to an article by Thompson & Tobin
(http://www.ltg.ed.ac.uk/~ht/XML_Europe_2003.html) that describes, in
sufficient detail, algorithms intended specifically for XML Schema
content model validation.
After I had implemented these algorithms, it turned out that they are
very inefficient for certain schemas. In particular, the "XML Schema Test
Collection" contains the following schema
(msData/particles/particlesIe003.xsd):
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://xsdtesting" xmlns:x="http://xsdtesting"
            elementFormDefault="qualified">
  <xsd:complexType name="base">
    <xsd:choice>
      <xsd:element name="e1" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="e2" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:choice>
  </xsd:complexType>
  <xsd:complexType name="testing">
    <xsd:complexContent>
      <xsd:restriction base="x:base">
        <xsd:choice>
          <xsd:element name="e1" minOccurs="1" maxOccurs="9999999"/>
          <xsd:element name="e2" minOccurs="1" maxOccurs="9999999"/>
        </xsd:choice>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="doc" type="x:testing"/>
</xsd:schema>
It is impractical to validate this schema using the algorithms presented
in that article. The constructed state machines have millions
of states and take hours to analyze.
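As a rough illustration of the blow-up, the sketch below (Python, for brevity) counts the states of a naive automaton that unfolds every bounded maxOccurs into a chain, and contrasts it with a comparison of occurrence ranges that never builds an automaton. Both functions and the `(name, minOccurs, maxOccurs)` tuple layout are assumptions for illustration, not the construction from the article. The range check is only a sufficient test for a flat choice of distinctly named elements without wildcards or nesting, similar in spirit to the occurrence range comparison in the XSD 1.0 restriction rules.

```python
def unfolded_states(branches):
    """State count of a naive automaton for a flat choice of branches.

    Each branch is (name, min_occurs, max_occurs); max_occurs None means
    unbounded. Assumption: one shared start state, plus one chain state
    per counted occurrence of each branch.
    """
    total = 1  # shared start state
    for _name, lo, hi in branches:
        # Bounded: hi chain states. Unbounded: lo chain states, at least
        # one, with a self-loop on the last state.
        total += hi if hi is not None else max(lo, 1)
    return total

def ranges_ok(restricted, base):
    """Sufficient check: each restricted branch fits a same-named base branch."""
    base_by_name = {name: (lo, hi) for name, lo, hi in base}
    for name, lo, hi in restricted:
        if name not in base_by_name:
            return False
        base_lo, base_hi = base_by_name[name]
        if lo < base_lo:
            return False  # restricted allows fewer occurrences than base
        if base_hi is not None and (hi is None or hi > base_hi):
            return False  # restricted allows more occurrences than base
    return True

# The two choice groups from particlesIe003.xsd.
base_type = [("e1", 0, None), ("e2", 0, None)]
testing_type = [("e1", 1, 9999999), ("e2", 1, 9999999)]

print(unfolded_states(base_type))     # 3
print(unfolded_states(testing_type))  # 19999999
print(ranges_ok(testing_type, base_type))  # True
```

Under these assumptions the range comparison takes a handful of steps, while the unfolded automaton for the restricted type alone has about twenty million states.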
Can anyone advise me on how content model validation can be
implemented for such cases? Should I fall back to the rules
specified in XSD 1.0?
Thanks for your attention & regards!
Stefan
Received on Saturday, 10 January 2015 21:32:39 UTC