
RE: regex help

From: Watkins, Bill <bill.watkins@boeing.com>
Date: Thu, 4 Jan 2007 17:00:37 -0600
Message-ID: <8228AE6F0FF69D4DB1D4A50C72A7CE6804E9E1C8@XCH-SE-2V2.se.nos.boeing.com>
To: <xmlschema-dev@w3.org>
Cc: "Tsao, Scott" <scott.tsao@boeing.com>

Thanks to Michael Kay and Xan Gregg for proposing solutions to the
problem of not being able to use the non-greedy regex quantifiers in the
pattern facet when trying to validate something like a list of delimited
strings (in this case, the delimiter is a double-quote).  Unfortunately,
I'm still unable to validate an element that contains a list of quoted
strings with the suggested patterns.  Below are my results using the
various suggestions.

I'm using the XML-Buddy 2.0.9 plug-in to Eclipse 3.2.1 for validation.
Interestingly, some of the pattern regexes will pass the strings so long
as they don't contain embedded whitespace.  I've included test results
for both forms of the first string ("your dog" and "your_dog").

The "free" version of XML-Buddy only seems to provide mouse-over
validation error pop-ups.  Note that the validation engine sometimes
returned a different error message in the pop-up on successive tries.
In these cases, one of two errors was typically returned in the
mouse-over message: either one pointing at the pattern facet for the
element type, or one pointing at the element.  I don't know why this
happens, though it might be something like a hash table query resulting
in different retrieval orders on the mouse-over.

Sample test file (with embedded whitespace, "your dog"):


<?xml version="1.0"?>
<TestSmall xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="TestSmallQuotedListSchema.xml">
	
	<QuotedStringList>
		"your dog.my_dog" "my_cat$your_cat"
		"their_rat"
	</QuotedStringList>
	
</TestSmall>

Sample test file (with underscore, "your_dog"):


<?xml version="1.0"?>
<TestSmall xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="TestSmallQuotedListSchema.xml">
	
	<QuotedStringList>
		"your_dog.my_dog" "my_cat$your_cat"
		"their_rat"
	</QuotedStringList>
	
</TestSmall>


Original schema candidate (i.e., mine):



<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">

	<xs:simpleType name="QUOTEDNAME_TYPE">
		<xs:restriction base="xs:string">
			<xs:pattern value='\s*"\S([^"]|\s)+\S"\s*'/>
		</xs:restriction>
	</xs:simpleType>
	
	<xs:simpleType name="QUOTEDNAMELIST_TYPE">
		<xs:list itemType="QUOTEDNAME_TYPE"/>			
	</xs:simpleType>
	
	<xs:element name="QuotedStringList" type="QUOTEDNAMELIST_TYPE"/>
	
	<xs:element name="TestSmall">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="QuotedStringList" minOccurs="1" maxOccurs="1"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>

</xs:schema>



Validation Result w/o underscore ("your dog"):
"Error - cvc-pattern-valid: Value '"your' is not facet-valid with respect
to pattern '\s*"\S([^"]|\s)+\S"\s*' for type 'QUOTEDNAME_TYPE'."
OR
"Error - cvc-type.3.1.3: The value '"your dog.my_dog" "my_cat$your_cat"
"their_rat"' of element 'QuotedStringList' is not valid."


Validation Result w/ underscore ("your_dog"):
OK
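
A minimal sketch of what may be going on in the two results above, assuming the validator follows the XML Schema rule that an xs:list value is split on whitespace and each item is then checked against the item type's pattern. Python's `re` is not the XSD regex dialect, but these patterns use syntax both share, and `re.fullmatch` stands in for the implicit anchoring of the pattern facet. The names `ITEM_PATTERN`, `validate_list`, `with_space` and `with_underscore` are illustrative only.

```python
import re

# Item pattern copied from QUOTEDNAME_TYPE in the schema above.
ITEM_PATTERN = r'\s*"\S([^"]|\s)+\S"\s*'


def validate_list(content: str) -> list[tuple[str, bool]]:
    """Split the content on whitespace, as an xs:list does, then
    test every resulting item against the item pattern."""
    return [
        (item, re.fullmatch(ITEM_PATTERN, item) is not None)
        for item in content.split()
    ]


# Element content of the two test files (newlines and tabs included).
with_space = '\n\t\t"your dog.my_dog" "my_cat$your_cat"\n\t\t"their_rat"\n\t'
with_underscore = with_space.replace("your dog", "your_dog")

for label, content in (("space", with_space), ("underscore", with_underscore)):
    for item, ok in validate_list(content):
        print(label, repr(item), "valid" if ok else "NOT valid")
```

Under that assumption the space inside "your dog" produces the items `"your` and `dog.my_dog"`, neither of which is a complete quoted string, which would be consistent with the `Value '"your'` message. The underscore version splits into three complete quoted items.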

First suggested candidate from Kay:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">

	<xs:simpleType name="quotedString">
	  <xs:restriction base="xs:string">
		<xs:pattern value='".*"'/>
	  </xs:restriction>
	</xs:simpleType>
	
	<xs:simpleType name="listOfQuotedStrings">
	  <xs:list itemType="quotedString"/>
	</xs:simpleType>
	
	<xs:element name="QuotedStringList" type="listOfQuotedStrings"/>
	
	<xs:element name="TestSmall">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="QuotedStringList" minOccurs="1" maxOccurs="1"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>

</xs:schema>



Validation Result w/o underscore ("your dog"):
"Error - cvc-pattern-valid: Value '"your' is not facet-valid with
respect to pattern '".*"' for type 'quotedString'."
(Note: it is not entirely clear on my monitor in this case whether the
characters in the message that look like '" are actually a single
double-quote with an artifact that appears to be a third stroke.)
OR
"Error - cvc-type.3.1.3: The value '"your dog.my_dog" "my_cat$your_cat"
"their_rat"' of element 'QuotedStringList' is not valid."

Validation Result w/ underscore ("your_dog"):
OK


Alternate suggested candidate from Kay:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">

	<xs:simpleType name="listOfQuotedStrings">
	  <xs:restriction base="xs:string">
		<xs:pattern value='(("[^"]*")\s+)*'/>
	  </xs:restriction>
	</xs:simpleType>
	
	<xs:element name="QuotedStringList" type="listOfQuotedStrings"/>
	
	<xs:element name="TestSmall">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="QuotedStringList" minOccurs="1" maxOccurs="1"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>

</xs:schema>


Validation Result w/o underscore ("your dog"):
"Error - cvc-pattern-valid: Value '"your dog.my_dog" "my_cat$your_cat"
"their_rat"' is not facet-valid with respect to pattern
'(("[^"]*")\s+)*' for type 'listOfQuotedStrings'."
OR
"Error - cvc-type.3.1.3: The value '"your dog.my_dog" "my_cat$your_cat"
"their_rat"' of element 'QuotedStringList' is not valid."

Validation Result w/ underscore ("your_dog"):
"Error - cvc-pattern-valid: Value '"your_dog.my_dog" "my_cat$your_cat"
"their_rat"' is not facet-valid with respect to pattern
'(("[^"]*")\s+)*' for type 'listOfQuotedStrings'."
OR
"Error - cvc-type.3.1.3: The value '"your_dog.my_dog" "my_cat$your_cat"
"their_rat"' of element 'QuotedStringList' is not valid."
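
One possible reading of why both variants fail here, sketched under two assumptions: that a restriction of xs:string preserves whitespace, so the pattern is applied to the whole element content including the newline and tabs around the quoted strings, and that Python's `re.fullmatch` is an acceptable stand-in for the implicitly anchored pattern facet. `KAY_ALTERNATE`, `raw` and `facet_valid` are illustrative names, not anything from the schema.

```python
import re

# Pattern copied from the listOfQuotedStrings restriction above.
KAY_ALTERNATE = r'(("[^"]*")\s+)*'

# Element content as written in the test file: a newline and tabs
# precede the first quoted string and follow the last one.
raw = '\n\t\t"your_dog.my_dog" "my_cat$your_cat"\n\t\t"their_rat"\n\t'


def facet_valid(pattern: str, value: str) -> bool:
    """Approximate a pattern-facet check: the whole value must match."""
    return re.fullmatch(pattern, value) is not None


# The pattern has no provision for whitespace before the first quote.
print(facet_valid(KAY_ALTERNATE, raw))           # False
# With the leading whitespace removed, every quoted string is
# followed by whitespace, which is what the pattern requires.
print(facet_valid(KAY_ALTERNATE, raw.lstrip()))  # True
```

If that reading is right, the failure would not depend on the space in "your dog" at all, only on the leading whitespace in the element content.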


Suggested revision of Kay's alternate candidate by Xan Gregg:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">

	<xs:simpleType name="listOfQuotedStrings">
	  <xs:restriction base="xs:string">
		<xs:pattern value='("[^"]*"(\s+"[^"]*")*)?'/>
	  </xs:restriction>
	</xs:simpleType>
	
	<xs:element name="QuotedStringList" type="listOfQuotedStrings"/>
	
	<xs:element name="TestSmall">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="QuotedStringList" minOccurs="1" maxOccurs="1"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>

</xs:schema>


Validation Result w/o underscore ("your dog"):
"Error - cvc-type.3.1.3: The value '"your dog.my_dog" "my_cat$your_cat"
"their_rat"' of element 'QuotedStringList' is not valid."
OR
"Error - cvc-pattern-valid: Value '"your dog.my_dog" "my_cat$your_cat"
"their_rat"' is not facet-valid with respect to pattern
'("[^"]*"(\s+"[^"]*")*)?' for type 'listOfQuotedStrings'."

Validation Result w/ underscore ("your_dog"):
"Error - cvc-pattern-valid: Value '"your_dog.my_dog" "my_cat$your_cat"
"their_rat"' is not facet-valid with respect to pattern
'("[^"]*"(\s+"[^"]*")*)?' for type 'listOfQuotedStrings'."
OR
"Error - cvc-type.3.1.3: The value '"your_dog.my_dog" "my_cat$your_cat"
"their_rat"' of element 'QuotedStringList' is not valid."
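
A rough check of this pattern outside the validator, assuming the element content reaches the pattern facet with its surrounding newline and tabs intact (xs:string preserves whitespace) and using Python's `re.fullmatch` to approximate the implicit anchoring. `GREGG`, `PADDED`, `raw` and `facet_valid` are illustrative names. `PADDED` is a hypothetical variant with `\s*` added at both ends; it was not proposed in the thread and has not been tried in XML-Buddy.

```python
import re

# Pattern copied from the listOfQuotedStrings restriction above.
GREGG = r'("[^"]*"(\s+"[^"]*")*)?'

# Hypothetical variant (an assumption, not from the thread): the same
# pattern with optional whitespace allowed before and after the list.
PADDED = r'\s*("[^"]*"(\s+"[^"]*")*)?\s*'

# Element content as written in the "your dog" test file.
raw = '\n\t\t"your dog.my_dog" "my_cat$your_cat"\n\t\t"their_rat"\n\t'


def facet_valid(pattern: str, value: str) -> bool:
    """Approximate a pattern-facet check: the whole value must match."""
    return re.fullmatch(pattern, value) is not None


print(facet_valid(GREGG, raw))          # False: surrounding whitespace
print(facet_valid(GREGG, raw.strip()))  # True once it is stripped
print(facet_valid(PADDED, raw))         # True for the padded variant
```

In this sketch the space inside "your dog" is harmless, because `[^"]*` accepts whitespace; only the whitespace around the list gets in the way.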



If anyone has any suggestions, I'd appreciate them.

Thanks,
Bill

Bill Watkins
Software Engineer
Boeing Satellite Operations & Ground Systems - Houston

"Any opinions expressed are my own and do not reflect those of Boeing."
"For every complex problem, there is a solution that is simple, neat,
and wrong."  HL Mencken
Received on Friday, 5 January 2007 00:39:16 GMT
