Re: regex help

From: Xan Gregg <xan.gregg@jmp.com>
Date: Wed, 3 Jan 2007 15:45:39 -0500
Message-Id: <F34E0C65-891D-4C78-A642-37CDC7DD6918@jmp.com>
Cc: "'Tsao, Scott'" <scott.tsao@boeing.com>, <xmlschema-dev@w3.org>
To: "Michael Kay" <mike@saxonica.com>

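I think Michael's list option won't work because the quoted string
items can contain whitespace: a list type splits its value on
whitespace before checking each item, so "de f" would be treated as
two items.  The all-in-one pattern requires whitespace after every
item, including the last, so I offer the following derivative, which
doesn't: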

   <xs:pattern value='("[^"]*"(\s+"[^"]*")*)?'/>

The final '?' is to allow the empty list (no items). Remove it if  
that is not desired.
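
As a rough sanity check, the three patterns in this thread can be tried
outside a schema processor. The sketch below uses Python's re module as a
stand-in, which is an assumption: XSD patterns are implicitly anchored, so
fullmatch() is used to imitate that, Python's \s covers more characters
than XSD's, and str.split() only approximates how a list type tokenizes
its value. The constant and function names are made up for illustration.

```python
import re

# Pattern proposed in this message (no trailing whitespace required).
NO_TRAILING_WS = re.compile(r'("[^"]*"(\s+"[^"]*")*)?')
# All-in-one pattern from the quoted reply (whitespace after every item).
TRAILING_WS = re.compile(r'(("[^"]*")\s+)*')
# Item pattern from the quoted reply's list-type variant.
LIST_ITEM = re.compile(r'".*"')


def valid(pattern, text):
    """Imitate XSD's implicit anchoring by requiring a full match."""
    return pattern.fullmatch(text) is not None


# The example from the original question, including its line break.
sample = '"abc" "de f" "123_456"\n"foo bar" "etc."'

print(valid(NO_TRAILING_WS, sample))     # accepted as written
print(valid(TRAILING_WS, sample))        # rejected: last item has no whitespace after it
print(valid(TRAILING_WS, sample + " "))  # accepted once trailing whitespace is added
print(valid(NO_TRAILING_WS, ""))         # empty list accepted because of the final '?'

# Approximate list-type tokenization: split on whitespace, then check each item.
items = sample.split()
print(items[:3])                         # "de f" has been split into two tokens
print(all(valid(LIST_ITEM, item) for item in items))
```

A real validator remains the authority here; this only shows how the
three patterns behave on the sample input.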

xan

On Jan 3, 2007, at 2:57 PM, Michael Kay wrote:

>
> Looks to me something like
>
> <xs:simpleType name="quotedString">
>   <xs:restriction base="xs:string">
>     <xs:pattern value='".*"'/>
>   </xs:restriction>
> </xs:simpleType>
>
> <xs:simpleType name="listOfQuotedStrings">
>   <xs:list itemType="quotedString"/>
> </xs:simpleType>
>
> or if you don't want to use a list type,
>
> <xs:simpleType name="listOfQuotedStrings">
>   <xs:restriction base="xs:string">
>     <xs:pattern value='(("[^"]*")\s+)*'/>
>   </xs:restriction>
> </xs:simpleType>
>
> ...
>
>> -----Original Message-----
>> From: xmlschema-dev-request@w3.org
>>
>> ...
>>
>> I'm trying to design a W3C XML Schema type description for an
>> element containing an arbitrary number of quoted strings
>> separated by arbitrary whitespace.  The contents of the
>> quoted items are themselves limited to alphanumerics,
>> whitespace, and common punctuation characters, excluding
>> embedded quote characters.  (The double quote here is chosen
>> as an arbitrary delimiter and has no special significance.)
>>
>> Example:
>> "abc" "de f" "123_456"
>> "foo bar" "etc."
>>
>> I'm not aware of a "built-in" XML Schema type that can
>> support this representation directly.  It also appears that
>> the W3C XML Schema "pattern"
>> facet (allowing the specification of a regular expression for a type
>> format) does not support the "non-greedy" quantifier syntax,
>> e.g., "*?", "+?" that is common in many regular expression engines.
>>
>> Can anyone suggest a regex to define this format without the
>> non-greedy quantifiers, or perhaps an XML Schema
>> representation that can handle this format directly?
>>
>
>
Received on Wednesday, 3 January 2007 20:48:00 GMT
