- From: Michael Kay <mike@saxonica.com>
- Date: Thu, 14 Jun 2007 16:53:00 +0100
- To: "'Paul Warren'" <pdw@decisionsoft.com>, <bud@syndafeed.com>
- Cc: <xmlschema-dev@w3.org>, <tools@decisionsoft.com>
There's a long history here, only a small part of which is captured at:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1889
In fact the current spec makes three statements within a few lines of each
other, none of which agrees with the others, and there are no clues as to
which one takes precedence:
(1)
[17] charRange ::= seRange | XmlCharIncDash
which says that "-" is always a valid character range
(2)
The [, ], - and \ characters are not valid character ranges;
which says that "-" can't be a character range
(3)
The - character is a valid character range only at the beginning or end of a
positive character group.
which says it's sometimes valid and sometimes isn't (and says it in a very
odd way, because how do you know whether you're at the end of a positive
character group, especially where subtraction is involved?)
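To make the conflict concrete, the fragments below show how each of the three statements would judge a dash in different positions. This is a reading of the quoted spec text, not a ruling on which statement governs. `[-0-9]*` is the pattern from the schema in this thread; `[0-9-]*` and `[0-9\-]*` are hypothetical variants added only for comparison. Each facet would sit in its own `xsd:restriction`, with the `xsd` prefix bound as in that schema.

```xml
<!-- Each pattern facet below would sit inside its own xsd:restriction. -->

<!-- Dash first (the pattern from the schema in this thread).
     (1) valid: "-" matches XmlCharIncDash
     (2) invalid: "-" is never a valid character range
     (3) valid: "-" is at the beginning of a positive character group -->
<xsd:pattern value="[-0-9]*"/>

<!-- Dash last (hypothetical variant).
     (1) valid; (2) invalid; (3) valid: "-" is at the end of the group -->
<xsd:pattern value="[0-9-]*"/>

<!-- Dash escaped (hypothetical variant): "\-" is a single-character
     escape rather than a bare "-", so none of the three statements
     restricts where it may appear -->
<xsd:pattern value="[0-9\-]*"/>
```

The Xerces error in the message quoted below, which rejects `[-0-9]*`, is consistent with reading (2), while Paul's reply applies reading (3).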
In my current implementation in Saxon I decided to allow "-" anywhere within
a character range, interpreting it as representing itself except in a
context where it can be interpreted as a range operator [a-z] or a
subtraction operator [\p{Lu}-[AEIOU]].
Users would be well-advised to steer clear of this and escape the "-"
everywhere.
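Applied to the two facets that triggered the Xerces error in the thread, that advice would look roughly like this, which is also the form Xerces's own message asks for ("Write '\-'"). This is a sketch that keeps the type names from the quoted schema and assumes the `xsd` prefix is bound to `http://www.w3.org/2001/XMLSchema` as it is there; only the pattern values change.

```xml
<xsd:simpleType name="integerOrNull_Type">
  <xsd:restriction base="xsd:string">
    <!-- "\-" is a single-character escape, so the dash is literal in any position -->
    <xsd:pattern value="[\-0-9]*"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="telephoneNumber_Type">
  <xsd:restriction base="xsd:string">
    <!-- digits, "+", space, "(", ")" and a literal "-" -->
    <xsd:pattern value="[\-0-9+ ()]*"/>
  </xsd:restriction>
</xsd:simpleType>
```

Because the escape sidesteps the question of where a bare "-" may appear, the same patterns should be read identically whichever of the three statements a processor follows.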
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: xmlschema-dev-request@w3.org
> [mailto:xmlschema-dev-request@w3.org] On Behalf Of Paul Warren
> Sent: 14 June 2007 15:43
> To: bud@syndafeed.com
> Cc: xmlschema-dev@w3.org; tools@decisionsoft.com
> Subject: Re: validator disparity
>
>
> Hi Bud,
>
> I should point out that our schema validation service is
> simply a web frontend to the Xerces-J Schema Validator (this
> used to be made clear on the web page, but it seems that
> notice has gone AWOL - I'll get that fixed).
>
> A quick look at the schema spec suggests that Xerces is wrong on this point:
>
> "The - character is a valid character range only at the
> beginning or end of a positive character group."
>
> regards,
>
> Paul
>
>
> On 14 Jun 2007, at 15:26, Bud Hovell wrote:
>
> > Hi, folks ...
> >
> > I've run across a bit of a puzzle and thought I'd at least report it
> > for examination by others technically qualified.
> >
> > In the course of validating a test output file found at
> > http://www.amexpat.com/primeloc.xml, I discovered that both it and the
> > schema validate without complaint on the W3C validator at
> > http://www.w3.org/2001/03/webdata/xsv, but the schema does not validate at
> > http://tools.decisionsoft.com/schemaValidate/, which offers the
> > following complaint:
> > ================== OUTPUT =====================
> > XML Schema Validator
> >
> > Well Formed: VALID
> > Schema Validation: INVALID
> >
> > The following errors were found:
> > TYPE        LOC      MESSAGE
> > Validation  128, 38  InvalidRegex: Pattern value '[-0-9]*' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.'.
> > Validation  134, 38  InvalidRegex: Pattern value '[-0-9+ ()]*' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.'.
> >
> > ================ END OUTPUT ===============
> >
> > ... evidently because the non-range-denoting "-" character is shown in
> > the first position of the pattern match in brackets rather than last.
> > I'm not acquainted with the specific rules for schema validation, but
> > seem to recall that most regex matching rules DO require a literal
> > naked dash to be mentioned last. In this case, the parser evidently
> > wants to see it backslashed so it is understood to denote a literal
> > rather than a range.
> >
> > This is the text of the two relevant blocks in the 2007-05-21 schema
> > file (attached in full) which I received from the provider and have
> > input for testing at DecisionSoft:
> >
> > <xsd:simpleType name="integerOrNull_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:pattern value="[-0-9]*"/>
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- Telephone can contain numbers, spaces, brackets, +'s and -'s /-->
> > <xsd:simpleType name="telephoneNumber_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:pattern value="[-0-9+ ()]*"/>
> > </xsd:restriction>
> > </xsd:simpleType>
> >
> > ... which shows no evidence of a backslash to protect the literal
> > dash.
> >
> > These two parsers offer conflicting results given identical input.
> > While I'm agnostic as to which may be judged correct, they should at
> > least agree even if both are in error. :)
> >
> > I'm jointly addressing this to the W3C team and the folks over at
> > DecisionSoft in hope this disparity may be resolved.
> >
> > Best regards,
> > --
> > Bud Hovell
> > bud@syndafeed.com
> > http://www.syndafeed.com
> >
> > <?xml version="1.0" encoding="utf-8"?>
> > <!-- edited with XMLSpy v2006 rel. 3 sp1 (http://www.altova.com) by Andy Dawkins (Primelocation) -->
> > <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
> > <xsd:annotation>
> > <xsd:documentation xml:lang="en">
> > PrimeLocation.com FastcropX1 data schema - Last Update
> > 2007-05-21
> > </xsd:documentation>
> > </xsd:annotation>
> > <xsd:element name="root" type="root_Type"/>
> > <xsd:complexType name="root_Type">
> > <xsd:sequence>
> > <xsd:element name="agentGroup" type="agentGroup_Type" minOccurs="0" maxOccurs="unbounded"/>
> > </xsd:sequence>
> > </xsd:complexType>
> > <xsd:complexType name="agentGroup_Type">
> > <xsd:sequence>
> > <xsd:element name="mode" type="agentGroupMode_Type" default="FULL"/>
> > <xsd:element name="exportDate" type="xsd:dateTime" minOccurs="0"/>
> > <!-- Not mandatory but useful for debugging -->
> > <xsd:element name="agentBranch" type="agentBranch_Type" minOccurs="0" maxOccurs="unbounded"/>
> > </xsd:sequence>
> > <xsd:attribute name="code" type="xsd:string" use="required"/>
> > </xsd:complexType>
> > <xsd:complexType name="agentBranch_Type">
> > <xsd:sequence>
> > <xsd:element name="property" type="property_Type" minOccurs="0" maxOccurs="unbounded"/>
> > </xsd:sequence>
> > <xsd:attribute name="code" type="xsd:string" use="required"/>
> > </xsd:complexType>
> > <xsd:complexType name="property_Type">
> > <xsd:choice>
> > <xsd:sequence>
> > <!-- Property Address Details /-->
> > <xsd:element name="fullPostCode" type="xsd:string"/>
> > <xsd:element name="countryCode" type="countryCode_Type" default="GB" minOccurs="0"/>
> > <xsd:element name="name" type="xsd:string"/>
> > <xsd:element name="address" type="xsd:string"/>
> > <xsd:element name="regionCode" type="xsd:string" minOccurs="0"/>
> > <!-- Property Description /-->
> > <xsd:element name="summary" type="xsd:string" minOccurs="0"/>
> > <xsd:element name="details" type="xsd:string" minOccurs="0"/>
> > <!-- Property Price Information /-->
> > <xsd:element name="pricePrefix" type="pricePrefix_Type"/>
> > <xsd:element name="price" type="integerRange_Type"/>
> > <xsd:element name="priceCurrency" type="priceCurrency_Type" default="GBP" minOccurs="0"/>
> > <!-- Property sale specifics /-->
> > <xsd:element name="sellingState" type="sellingState_Type"/>
> > <xsd:element name="propertyType" type="propertyType_Type"/>
> > <xsd:element name="newHome" type="xsd:string" minOccurs="0"/>
> > <xsd:element name="saleOrRent" type="saleOrRent_Type"/>
> > <xsd:element name="sharedCommission" type="xsd:string" minOccurs="0"/>
> > <!-- Rental Information /-->
> > <xsd:element name="groundRent" type="xsd:decimal" minOccurs="0"/>
> > <!-- Value in GBP per annum /-->
> > <xsd:element name="serviceCharge" type="xsd:decimal" minOccurs="0"/>
> > <!-- Value in GBP per annum /-->
> > <xsd:element name="furnished" type="xsd:boolean" minOccurs="0"/>
> > <xsd:element name="rentalLength" type="xsd:int" minOccurs="0"/>
> > <!-- Tenure Information /-->
> > <xsd:element name="tenure" type="tenure_Type" default="" minOccurs="0"/>
> > <xsd:element name="leaseholdYearsRemaining" type="integerOrNull_Type" minOccurs="0"/>
> > <!-- Property Room Information /-->
> > <xsd:element name="bedrooms" type="integerRange_Type"/>
> > <xsd:element name="bathrooms" type="integerRange_Type"/>
> > <xsd:element name="receptionRooms" type="integerRange_Type"/>
> > <!-- Property Images, Supported types: JPG, PNG, GIF /-->
> > <xsd:element name="mainImage" type="asset_Type" minOccurs="0"/>
> > <!-- The file name of the image /-->
> > <xsd:element name="additionalImage1" type="asset_Type" minOccurs="0"/>
> > <xsd:element name="additionalImage2" type="asset_Type" minOccurs="0"/>
> > <xsd:element name="additionalImage3" type="asset_Type" minOccurs="0"/>
> > <xsd:element name="additionalImage4" type="asset_Type" minOccurs="0"/>
> > <!-- Floorplans, Up to four images ( JPG, PNG, GIF ) OR a single PDF /-->
> > <xsd:element name="floorPlan1" type="asset_Type" minOccurs="0"/>
> > <!-- The file name of the image /-->
> > <xsd:element name="floorPlan2" type="asset_Type" minOccurs="0"/>
> > <xsd:element name="floorPlan3" type="asset_Type" minOccurs="0"/>
> > <xsd:element name="floorPlan4" type="asset_Type" minOccurs="0"/>
> > <!-- Brochure, A single PDF /-->
> > <xsd:element name="brochure" type="asset_Type" minOccurs="0"/>
> > <!-- The file name of the pdf /-->
> > <!-- Virtual Tour -->
> > <xsd:element name="vTourURL" type="xsd:string" minOccurs="0"/>
> > <!-- URL to a virtual Tour -->
> > <!-- Virtual Tour -->
> > <xsd:element name="vTour2URL" type="xsd:string" minOccurs="0"/>
> > <!-- URL to a virtual Tour -->
> > <!-- HIP Document -->
> > <xsd:element name="HIPDocument" type="asset_Type" minOccurs="0"/>
> > <!-- Filename or URL to an HIP Document -->
> > <!-- EPC Document -->
> > <xsd:element name="EPCDocument" type="asset_Type" minOccurs="0"/>
> > <!-- Filename or URL to an EPC Document -->
> > <!-- Energy Efficiency Ratings -->
> > <xsd:element name="EERImage" type="asset_Type" minOccurs="0"/>
> > <xsd:element name="EERCurrent" type="xsd:integer" minOccurs="0"/>
> > <xsd:element name="EERPotential" type="xsd:integer" minOccurs="0"/>
> > <!-- Environment Impact Ratings -->
> > <xsd:element name="EIRImage" type="asset_Type" minOccurs="0"/>
> > <xsd:element name="EIRCurrent" type="xsd:integer" minOccurs="0"/>
> > <xsd:element name="EIRPotential" type="xsd:integer" minOccurs="0"/>
> > <!-- Optional Contact Information. If provided will be used instead of contact information of the agent branch -->
> > <xsd:element name="contactName" type="xsd:string" minOccurs="0"/>
> > <xsd:element name="contactTelephone" type="telephoneNumber_Type" minOccurs="0"/>
> > <xsd:element name="contactEmail" type="xsd:string" minOccurs="0"/>
> > <!-- Additional Record Information /-->
> > <xsd:element name="createdDate" type="xsd:dateTime" minOccurs="0"/>
> > <xsd:element name="modifiedDate" type="xsd:dateTime" minOccurs="0"/>
> > <xsd:element name="additionalKeywords" type="xsd:string" minOccurs="0"/>
> > <xsd:element name="notes" type="xsd:string" minOccurs="0"/>
> > </xsd:sequence>
> > <xsd:sequence>
> > <xsd:element name="delete" type="xsd:string" default="1" minOccurs="0"/>
> > </xsd:sequence>
> > </xsd:choice>
> > <xsd:attribute name="propertyID" type="xsd:string" use="required"/>
> > </xsd:complexType>
> > <xsd:complexType name="asset_Type">
> > <xsd:simpleContent>
> > <xsd:extension base="xsd:string">
> > <xsd:attribute name="modifiedDate" type="xsd:dateTime" use="optional"/>
> > </xsd:extension>
> > </xsd:simpleContent>
> > </xsd:complexType>
> > <!-- countryCode is always 2 alpha characters /-->
> > <xsd:simpleType name="countryCode_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:pattern value="[A-Za-z]{2}"/>
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- priceCurrency is always 3 alpha characters /-->
> > <xsd:simpleType name="priceCurrency_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:pattern value="[A-Za-z]{3}"/>
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- price, bedrooms, bathrooms, etc.
> > can be a string representation of an integer
> > or an integer range of two integers separated by ' TO ' or ' - ' /-->
> > <xsd:simpleType name="integerRange_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:pattern value="([0-9]* ?(TO|-) ?[0-9]*|[0-9]*)"/>
> > </xsd:restriction>
> > </xsd:simpleType>
> > <xsd:simpleType name="integerOrNull_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:pattern value="[-0-9]*"/>
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- Telephone can contain numbers, spaces, brackets, +'s and -'s /-->
> > <xsd:simpleType name="telephoneNumber_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:pattern value="[-0-9+ ()]*"/>
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- agentGroupMode has a set list of possible values /-->
> > <xsd:simpleType name="agentGroupMode_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:enumeration value="FULL"/>
> > <xsd:enumeration value="INCR"/>
> > <!-- Full /-->
> > <!-- Incremental /-->
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- pricePrefix has a set list of possible values /-->
> > <xsd:simpleType name="pricePrefix_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:enumeration value="F"/>
> > <xsd:enumeration value="I"/>
> > <xsd:enumeration value="O"/>
> > <xsd:enumeration value="A"/>
> > <xsd:enumeration value="S"/>
> > <xsd:enumeration value="R"/>
> > <xsd:enumeration value="B"/>
> > <xsd:enumeration value="G"/>
> > <xsd:enumeration value="P"/>
> > <xsd:enumeration value="W"/>
> > <xsd:enumeration value="M"/>
> > <xsd:enumeration value="N"/>
> > <!-- Asking price of /-->
> > <!-- Offers in the region of /-->
> > <!-- Offers in excess of /-->
> > <!-- Auction guide price of /-->
> > <!-- Subject to contract /-->
> > <!-- Price range of /-->
> > <!-- Prices from /-->
> > <!-- Guide price /-->
> > <!-- Price on Application /-->
> > <!-- Weekly rental of /-->
> > <!-- Monthly rental of /-->
> > <!-- Annual rental of /-->
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- sellingState has a set list of possible values /-->
> > <xsd:simpleType name="sellingState_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:enumeration value="V"/>
> > <xsd:enumeration value="U"/>
> > <xsd:enumeration value="H"/>
> > <xsd:enumeration value="N"/>
> > <xsd:enumeration value="S"/>
> > <xsd:enumeration value="L"/>
> > <!-- Viewing /-->
> > <!-- Under offer /-->
> > <!-- Hidden /-->
> > <!-- New Instruction /-->
> > <!-- Sold /-->
> > <!-- Let /-->
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- propertyType has a set list of possible values /-->
> > <xsd:simpleType name="propertyType_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:enumeration value="H"/>
> > <xsd:enumeration value="F"/>
> > <xsd:enumeration value="A"/>
> > <!-- House /-->
> > <!-- Flat /-->
> > <!-- Agricultural /-->
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- saleOrRent has a set list of possible values /-->
> > <xsd:simpleType name="saleOrRent_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:enumeration value="S"/>
> > <xsd:enumeration value="R"/>
> > <!-- Sale /-->
> > <!-- Rent /-->
> > </xsd:restriction>
> > </xsd:simpleType>
> > <!-- tenure has a set list of possible values /-->
> > <xsd:simpleType name="tenure_Type">
> > <xsd:restriction base="xsd:string">
> > <xsd:enumeration value="F"/>
> > <xsd:enumeration value="S"/>
> > <xsd:enumeration value="L"/>
> > <xsd:enumeration value="X"/>
> > <xsd:enumeration value=""/>
> > <!-- Freehold /-->
> > <!-- Share of freehold /-->
> > <!-- Leasehold /-->
> > <!-- Not Specified /-->
> > <!-- Not Specified /-->
> > </xsd:restriction>
> > </xsd:simpleType>
> > </xsd:schema>
>
> --
> CTO, DecisionSoft Limited
> +44 1865 203192 / +44 7968 408138
>
>
>
Received on Thursday, 14 June 2007 15:53:32 UTC