
RE: validator disparity

From: Michael Kay <mike@saxonica.com>
Date: Thu, 14 Jun 2007 16:53:00 +0100
To: "'Paul Warren'" <pdw@decisionsoft.com>, <bud@syndafeed.com>
Cc: <xmlschema-dev@w3.org>, <tools@decisionsoft.com>
Message-ID: <014e01c7ae9c$194ce730$6401a8c0@turtle>

There's a long history here, only a small part of which is captured at:

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1889

In fact the current spec makes three statements within a few lines of each
other, none of which agrees with the others, and there are no clues as to
which one takes precedence:

(1)
[17]   charRange   ::=   seRange | XmlCharIncDash

which says that "-" is always a valid character-range

(2)
The [, ], - and \ characters are not valid character ranges;

which says that "-" can't be a character range

(3)
The - character is a valid character range only at the beginning or end of a
positive character group.

which says it's sometimes valid and sometimes isn't (and says it in a very
odd way, because how do you know whether you're at the end of a positive
character group, especially where subtraction is involved?)

In my current implementation in Saxon I decided to allow "-" anywhere within
a character group, interpreting it as representing itself except in a
context where it can be interpreted as a range operator, as in [a-z], or a
subtraction operator, as in [\p{Lu}-[AEIOU]].
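
To make the three roles of "-" concrete, here is a small sketch of how each pattern would be read under the interpretation described above. The type names are hypothetical and do not appear in the schema under discussion, and a stricter processor may still reject the third pattern.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- "-" between two characters: range operator (a through z) -->
  <xsd:simpleType name="lowerAlpha_Example">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[a-z]*"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- "-" followed by "[": subtraction operator
       (upper-case letters other than A, E, I, O, U) -->
  <xsd:simpleType name="upperConsonant_Example">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[\p{Lu}-[AEIOU]]*"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- "-" directly after "[": neither of the above applies, so it is
       taken as a literal hyphen (assumption: lenient interpretation) -->
  <xsd:simpleType name="hyphenOrDigit_Example">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[-0-9]*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```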

Users would be well-advised to steer clear of this and escape the "-"
everywhere.
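
For the two patterns in the original report, escaping would look roughly like the sketch below. "\-" is a single-character escape in the schema regex grammar and is also the form the Xerces error message asks for, so it sidesteps the ambiguity. This has not been run through each validator mentioned in the thread.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="integerOrNull_Type">
    <xsd:restriction base="xsd:string">
      <!-- was "[-0-9]*" -->
      <xsd:pattern value="[\-0-9]*"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="telephoneNumber_Type">
    <xsd:restriction base="xsd:string">
      <!-- was "[-0-9+ ()]*" -->
      <xsd:pattern value="[\-0-9+ ()]*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

The escaped patterns accept the same strings as the originals; only the spelling of the literal hyphen changes.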

Michael Kay
http://www.saxonica.com/

 

> -----Original Message-----
> From: xmlschema-dev-request@w3.org 
> [mailto:xmlschema-dev-request@w3.org] On Behalf Of Paul Warren
> Sent: 14 June 2007 15:43
> To: bud@syndafeed.com
> Cc: xmlschema-dev@w3.org; tools@decisionsoft.com
> Subject: Re: validator disparity
> 
> 
> Hi Bud,
> 
> I should point out that our schema validation service is 
> simply a web frontend to the Xerces-J Schema Validator (this 
> used to be made clear on the web page, but it seems that 
> notice has gone AWOL - I'll get that fixed).
> 
> A quick look at the schema spec suggests that Xerces is wrong on this
> point:
> 
> "The - character is a valid character range only at the 
> beginning or end of a positive character group."
> 
> regards,
> 
> Paul
> 
> 
> On 14 Jun 2007, at 15:26, Bud Hovell wrote:
> 
> > Hi, folks ...
> >
> > I've run across a bit of a puzzle and thought I'd at least report it
> > for examination by others technically qualified.
> >
> > In the course of validating a test output file found at
> > http://www.amexpat.com/primeloc.xml, I discovered both it and the
> > schema validate without complaint on the W3C validator at
> > http://www.w3.org/2001/03/webdata/xsv, but the schema does not
> > validate at http://tools.decisionsoft.com/schemaValidate/, which
> > offers the following complaint:
> > ================== OUTPUT =====================
> > XML Schema Validator
> >
> > Well Formed: VALID
> > Schema Validation: INVALID
> >
> > The following errors were found:
> >
> > TYPE        LOC      MESSAGE
> > Validation  128, 38  InvalidRegex: Pattern value '[-0-9]*' is not a
> >                      valid regular expression. The reported error
> >                      was: ''-' is an invalid character range. Write
> >                      '\-'.'.
> > Validation  134, 38  InvalidRegex: Pattern value '[-0-9+ ()]*' is
> >                      not a valid regular expression. The reported
> >                      error was: ''-' is an invalid character range.
> >                      Write '\-'.'.
> >
> > ================ END OUTPUT ===============
> >
> > ... evidently because the non-range-denoting "-" character is shown in
> > the first position of the pattern match in brackets rather than last.
> > I'm not acquainted with the specific rules for schema validation, but
> > seem to recall that most regex matching rules DO require a literal
> > naked dash to be mentioned last.  In this case, the parser evidently
> > wants to see it backslashed so it is understood to denote a literal
> > rather than a range.
> >
> > This is the text of the two relevant blocks in the 2007-05-21 schema
> > file (attached in full) which I received from the provider and have
> > input for testing at DecisionSoft:
> >
> >         <xsd:simpleType name="integerOrNull_Type">
> >                 <xsd:restriction base="xsd:string">
> >                         <xsd:pattern value="[-0-9]*"/>
> >                 </xsd:restriction>
> >         </xsd:simpleType>
> >         <!-- Telephone can contain numbers, spaces, brackets, +'s and -'s /-->
> >         <xsd:simpleType name="telephoneNumber_Type">
> >                 <xsd:restriction base="xsd:string">
> >                         <xsd:pattern value="[-0-9+ ()]*"/>
> >                 </xsd:restriction>
> >         </xsd:simpleType>
> >
> > ... which shows no evidence of a backslash to protect the literal 
> > dash.
> >
> > These two parsers offer conflicting results given identical input.
> > While I'm agnostic as to which may be judged correct, they should at
> > least agree even if both are in error. :)
> >
> > I'm jointly addressing this to the W3C team and the folks over at 
> > DecisionSoft in hope this disparity may be resolved.
> >
> > Best regards,
> > -- Bud Hovell bud@syndafeed.com http://www.syndafeed.com
> >
> > <?xml version="1.0" encoding="utf-8"?>
> > <!-- edited with XMLSpy v2006 rel. 3 sp1 (http://www.altova.com) by Andy Dawkins (Primelocation) -->
> > <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
> > 	<xsd:annotation>
> > 		<xsd:documentation xml:lang="en">
> >             PrimeLocation.com FastcropX1 data schema - Last Update 2007-05-21
> >         </xsd:documentation>
> > 	</xsd:annotation>
> > 	<xsd:element name="root" type="root_Type"/>
> > 	<xsd:complexType name="root_Type">
> > 		<xsd:sequence>
> > 			<xsd:element name="agentGroup" type="agentGroup_Type" minOccurs="0" maxOccurs="unbounded"/>
> > 		</xsd:sequence>
> > 	</xsd:complexType>
> > 	<xsd:complexType name="agentGroup_Type">
> > 		<xsd:sequence>
> > 			<xsd:element name="mode" type="agentGroupMode_Type" default="FULL"/>
> > 			<xsd:element name="exportDate" type="xsd:dateTime" minOccurs="0"/>
> > 			<!-- Not mandatory but useful for debugging -->
> > 			<xsd:element name="agentBranch" type="agentBranch_Type" minOccurs="0" maxOccurs="unbounded"/>
> > 		</xsd:sequence>
> > 		<xsd:attribute name="code" type="xsd:string" use="required"/>
> > 	</xsd:complexType>
> > 	<xsd:complexType name="agentBranch_Type">
> > 		<xsd:sequence>
> > 			<xsd:element name="property" type="property_Type" minOccurs="0" maxOccurs="unbounded"/>
> > 		</xsd:sequence>
> > 		<xsd:attribute name="code" type="xsd:string" use="required"/>
> > 	</xsd:complexType>
> > 	<xsd:complexType name="property_Type">
> > 		<xsd:choice>
> > 			<xsd:sequence>
> > 				<!-- Property Address Details /-->
> > 				<xsd:element name="fullPostCode" type="xsd:string"/>
> > 				<xsd:element name="countryCode" type="countryCode_Type" default="GB" minOccurs="0"/>
> > 				<xsd:element name="name" type="xsd:string"/>
> > 				<xsd:element name="address" type="xsd:string"/>
> > 				<xsd:element name="regionCode" type="xsd:string" minOccurs="0"/>
> > 				<!-- Property Description /-->
> > 				<xsd:element name="summary" type="xsd:string" minOccurs="0"/>
> > 				<xsd:element name="details" type="xsd:string" minOccurs="0"/>
> > 				<!-- Property Price Information /-->
> > 				<xsd:element name="pricePrefix" type="pricePrefix_Type"/>
> > 				<xsd:element name="price" type="integerRange_Type"/>
> > 				<xsd:element name="priceCurrency" type="priceCurrency_Type" default="GBP" minOccurs="0"/>
> > 				<!-- Property sale specifics /-->
> > 				<xsd:element name="sellingState" type="sellingState_Type"/>
> > 				<xsd:element name="propertyType" type="propertyType_Type"/>
> > 				<xsd:element name="newHome" type="xsd:string" minOccurs="0"/>
> > 				<xsd:element name="saleOrRent" type="saleOrRent_Type"/>
> > 				<xsd:element name="sharedCommission" type="xsd:string" minOccurs="0"/>
> > 				<!-- Rental Information /-->
> > 				<xsd:element name="groundRent" type="xsd:decimal" minOccurs="0"/>
> > 				<!-- Value in GBP per annum /-->
> > 				<xsd:element name="serviceCharge" type="xsd:decimal" minOccurs="0"/>
> > 				<!-- Value in GBP per annum /-->
> > 				<xsd:element name="furnished" type="xsd:boolean" minOccurs="0"/>
> > 				<xsd:element name="rentalLength" type="xsd:int" minOccurs="0"/>
> > 				<!-- Tenure Information /-->
> > 				<xsd:element name="tenure" type="tenure_Type" default="" minOccurs="0"/>
> > 				<xsd:element name="leaseholdYearsRemaining" type="integerOrNull_Type" minOccurs="0"/>
> > 				<!-- Property Room Information /-->
> > 				<xsd:element name="bedrooms" type="integerRange_Type"/>
> > 				<xsd:element name="bathrooms" type="integerRange_Type"/>
> > 				<xsd:element name="receptionRooms" type="integerRange_Type"/>
> > 				<!-- Property Images, Supported types: JPG, PNG, GIF /-->
> > 				<xsd:element name="mainImage" type="asset_Type" minOccurs="0"/>
> > 				<!-- The file name of the image /-->
> > 				<xsd:element name="additionalImage1" type="asset_Type" minOccurs="0"/>
> > 				<xsd:element name="additionalImage2" type="asset_Type" minOccurs="0"/>
> > 				<xsd:element name="additionalImage3" type="asset_Type" minOccurs="0"/>
> > 				<xsd:element name="additionalImage4" type="asset_Type" minOccurs="0"/>
> > 				<!-- Floorplans, Up to four images ( JPG, PNG, GIF ) OR a single PDF /-->
> > 				<xsd:element name="floorPlan1" type="asset_Type" minOccurs="0"/>
> > 				<!-- The file name of the image /-->
> > 				<xsd:element name="floorPlan2" type="asset_Type" minOccurs="0"/>
> > 				<xsd:element name="floorPlan3" type="asset_Type" minOccurs="0"/>
> > 				<xsd:element name="floorPlan4" type="asset_Type" minOccurs="0"/>
> > 				<!-- Brochure, A single PDF /-->
> > 				<xsd:element name="brochure" type="asset_Type" minOccurs="0"/>
> > 				<!-- The file name of the pdf /-->
> > 				<!-- Virtual Tour -->
> > 				<xsd:element name="vTourURL" type="xsd:string" minOccurs="0"/>
> > 				<!-- URL to a virtual Tour -->
> > 				<!-- Virtual Tour -->
> > 				<xsd:element name="vTour2URL" type="xsd:string" minOccurs="0"/>
> > 				<!-- URL to a virtual Tour -->
> > 				<!-- HIP Document -->
> > 				<xsd:element name="HIPDocument" type="asset_Type" minOccurs="0"/>
> > 				<!-- Filename or URL to an HIP Document -->
> > 				<!-- EPC Document -->
> > 				<xsd:element name="EPCDocument" type="asset_Type" minOccurs="0"/>
> > 				<!-- Filename or URL to an EPC Document -->
> > 				<!-- Energy Efficiency Ratings -->
> > 				<xsd:element name="EERImage" type="asset_Type" minOccurs="0"/>
> > 				<xsd:element name="EERCurrent" type="xsd:integer" minOccurs="0"/>
> > 				<xsd:element name="EERPotential" type="xsd:integer" minOccurs="0"/>
> > 				<!-- Environment Impact Ratings -->
> > 				<xsd:element name="EIRImage" type="asset_Type" minOccurs="0"/>
> > 				<xsd:element name="EIRCurrent" type="xsd:integer" minOccurs="0"/>
> > 				<xsd:element name="EIRPotential" type="xsd:integer" minOccurs="0"/>
> > 				<!-- Optional Contact Information. If provided will be used instead of contact information of the agent branch -->
> > 				<xsd:element name="contactName" type="xsd:string" minOccurs="0"/>
> > 				<xsd:element name="contactTelephone" type="telephoneNumber_Type" minOccurs="0"/>
> > 				<xsd:element name="contactEmail" type="xsd:string" minOccurs="0"/>
> > 				<!-- Additional Record Information /-->
> > 				<xsd:element name="createdDate" type="xsd:dateTime" minOccurs="0"/>
> > 				<xsd:element name="modifiedDate" type="xsd:dateTime" minOccurs="0"/>
> > 				<xsd:element name="additionalKeywords" type="xsd:string" minOccurs="0"/>
> > 				<xsd:element name="notes" type="xsd:string" minOccurs="0"/>
> > 			</xsd:sequence>
> > 			<xsd:sequence>
> > 				<xsd:element name="delete" type="xsd:string" default="1" minOccurs="0"/>
> > 			</xsd:sequence>
> > 		</xsd:choice>
> > 		<xsd:attribute name="propertyID" type="xsd:string" use="required"/>
> > 	</xsd:complexType>
> > 	<xsd:complexType name="asset_Type">
> > 		<xsd:simpleContent>
> > 			<xsd:extension base="xsd:string">
> > 				<xsd:attribute name="modifiedDate" type="xsd:dateTime" use="optional"/>
> > 			</xsd:extension>
> > 		</xsd:simpleContent>
> > 	</xsd:complexType>
> > 	<!-- countryCode is always 2 alpha characters /-->
> > 	<xsd:simpleType name="countryCode_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:pattern value="[A-Za-z]{2}"/>
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- priceCurrency is always 3 alpha characters /-->
> > 	<xsd:simpleType name="priceCurrency_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:pattern value="[A-Za-z]{3}"/>
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- price,bedrooms,bathrooms, etc
> >          can be a string representation of an integer
> >          or an integer range of two integers separated by ' TO ' or ' - ' /-->
> > 	<xsd:simpleType name="integerRange_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:pattern value="([0-9]* ?(TO|-) ?[0-9]*|[0-9]*)"/>
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<xsd:simpleType name="integerOrNull_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:pattern value="[-0-9]*"/>
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- Telephone can contain numbers, spaces, brackets, +'s and -'s /-->
> > 	<xsd:simpleType name="telephoneNumber_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:pattern value="[-0-9+ ()]*"/>
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- agentGroupMode has a set list of possible values /-->
> > 	<xsd:simpleType name="agentGroupMode_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:enumeration value="FULL"/>
> > 			<xsd:enumeration value="INCR"/>
> > 			<!-- Full /-->
> > 			<!-- Incremental /-->
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- pricePrefix has a set list of possible values /-->
> > 	<xsd:simpleType name="pricePrefix_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:enumeration value="F"/>
> > 			<xsd:enumeration value="I"/>
> > 			<xsd:enumeration value="O"/>
> > 			<xsd:enumeration value="A"/>
> > 			<xsd:enumeration value="S"/>
> > 			<xsd:enumeration value="R"/>
> > 			<xsd:enumeration value="B"/>
> > 			<xsd:enumeration value="G"/>
> > 			<xsd:enumeration value="P"/>
> > 			<xsd:enumeration value="W"/>
> > 			<xsd:enumeration value="M"/>
> > 			<xsd:enumeration value="N"/>
> > 			<!-- Asking price of /-->
> > 			<!-- Offers in the region of /-->
> > 			<!-- Offers in excess of /-->
> > 			<!-- Auction guild price of /-->
> > 			<!-- Subject to contract /-->
> > 			<!-- Price range of /-->
> > 			<!-- Prices from /-->
> > 			<!-- Guide price /-->
> > 			<!-- Price on Application /-->
> > 			<!-- Weekly rental of /-->
> > 			<!-- Monthly rental of /-->
> > 			<!-- Annual rental of /-->
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- sellingState has a set list of possible values /-->
> > 	<xsd:simpleType name="sellingState_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:enumeration value="V"/>
> > 			<xsd:enumeration value="U"/>
> > 			<xsd:enumeration value="H"/>
> > 			<xsd:enumeration value="N"/>
> > 			<xsd:enumeration value="S"/>
> > 			<xsd:enumeration value="L"/>
> > 			<!-- Viewing /-->
> > 			<!-- Under offer /-->
> > 			<!-- Hidden /-->
> > 			<!-- New Instruction /-->
> > 			<!-- Sold /-->
> > 			<!-- Let /-->
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- propertyType has a set list of possible values /-->
> > 	<xsd:simpleType name="propertyType_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:enumeration value="H"/>
> > 			<xsd:enumeration value="F"/>
> > 			<xsd:enumeration value="A"/>
> > 			<!-- House /-->
> > 			<!-- Flat /-->
> > 			<!-- Agricultural  /-->
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- saleOrRent has a set list of possible values /-->
> > 	<xsd:simpleType name="saleOrRent_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:enumeration value="S"/>
> > 			<xsd:enumeration value="R"/>
> > 			<!-- Sale /-->
> > 			<!-- Rent /-->
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > 	<!-- tenure has a set list of possible values /-->
> > 	<xsd:simpleType name="tenure_Type">
> > 		<xsd:restriction base="xsd:string">
> > 			<xsd:enumeration value="F"/>
> > 			<xsd:enumeration value="S"/>
> > 			<xsd:enumeration value="L"/>
> > 			<xsd:enumeration value="X"/>
> > 			<xsd:enumeration value=""/>
> > 			<!-- Freehold /-->
> > 			<!-- Share of freehold /-->
> > 			<!-- Leasehold /-->
> > 			<!-- Not Specified /-->
> > 			<!-- Not Specified /-->
> > 		</xsd:restriction>
> > 	</xsd:simpleType>
> > </xsd:schema>
> 
> --
> CTO, DecisionSoft Limited
> +44 1865 203192 / +44 7968 408138
> 
> 
> 
Received on Thursday, 14 June 2007 15:53:32 GMT
