- From: Stefan Wachter <stefan.wachter@gmx.de>
- Date: Sat, 10 Jan 2015 22:32:09 +0100
- To: xmlschema-dev@w3.org
Hi all, I try to implement a validating XML parser based on the XSD recommendation of April 5, 2012 using Scala. Studying the specification I found that the condition for a complex content model to be a valid restriction of a base content model requires that all sequences accepted by the restricted content model are also accepted by the base content model. This requirement turned out to be rather hard to implement. I am not a language theory specialist. Therefore I spent some time googling around and found interesting research papers about the language inclusion problem. Yet, most of them where not directly applicable because of specialities of XML schema (occurence constraints, wildcards). Finally I found a link to an article by Thompson & Tobin (http://www.ltg.ed.ac.uk/~ht/XML_Europe_2003.html) where some algorithms are described in sufficient detail that are meant exactly for the means of XML schema content model validation. After I had implemented these algorithms it turned out, that they were very inefficient for certain schemas. In particular the "XML Schema Test Collection" contains the following schema (msData/particles/particlesIe003.xsd): <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://xsdtesting" xmlns:x="http://xsdtesting" elementFormDefault="qualified"> <xsd:complexType name="base"> <xsd:choice> <xsd:element name="e1" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="e2" minOccurs="0" maxOccurs="unbounded"/> </xsd:choice> </xsd:complexType> <xsd:complexType name="testing"> <xsd:complexContent> <xsd:restriction base="x:base"> <xsd:choice> <xsd:element name="e1" minOccurs="1" maxOccurs="9999999"/> <xsd:element name="e2" minOccurs="1" maxOccurs="9999999"/> </xsd:choice> </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:element name="doc" type="x:testing"/> </xsd:schema> It is impractical to validate this schema using the algorithms presented in the mentioned article. The constructed state machines have millions of states and take hours to be analyzed. Can anyone give me some help, how content model validation can be implemented for such cases? Should I fall back to the specification given in XSD 1.0? Thanks for your attention & regards! Stefan
Received on Saturday, 10 January 2015 21:32:39 UTC