
Re: regex help

From: Pete Cordell <petexmldev@tech-know-ware.com>
Date: Thu, 4 Jan 2007 09:35:03 -0000
Message-ID: <004c01c72fe4$4aca7660$1c00a8c0@Codalogic>
To: "Tsao, Scott" <scott.tsao@boeing.com>, <xmlschema-dev@w3.org>

Just a note on the greediness mentioned here...

My understanding is that the greediness of a regular expression is only an 
issue when you are capturing the values of sub-patterns within the target 
data.  (e.g. in Perl 'ML123' =~ /ML(\d+)/; captures 123 into $1.)  When only 
doing matching, eventually, if possible, the pattern will be matched 
irrespective of whether greedy or non-greedy matching is used.

Greediness just affects whether the regular expression engine attempts to 
grab lots of content for a sub-expression in its first attempt and then 
backtrack, or attempts to capture the minimal amount in its first attempt 
and then forward track (not sure if that's a proper term!).
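
To illustrate the point, here is a small sketch in Python (chosen only for
convenience; its re module stands in for the Perl and XSD engines). The helper
names and the simplified [^"] character class are assumptions made for this
example and are not part of the original question. It shows that greedy and
non-greedy quantifiers can capture different text, yet give the same yes/no
answer when the whole string has to match, which is how the XSD pattern facet
is applied.

```python
import re


def captured(pattern, text):
    """Return what group 1 captures when matching at the start of text."""
    m = re.match(pattern, text)
    return m.group(1) if m else None


def whole_string_matches(pattern, text):
    """Return True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Capturing: greediness changes what ends up in the group.
print(captured(r'ML(\d+)', 'ML123'))   # greedy: 123
print(captured(r'ML(\d+?)', 'ML123'))  # non-greedy: 1

# Whole-string matching: greedy and non-greedy variants of a pattern for
# whitespace-separated quoted strings (illustrative only; [^"] is a
# simplification of "alphanumerics, whitespace and common punctuation").
GREEDY = r'\s*("[^"]*"\s*)*'
LAZY = r'\s*?("[^"]*?"\s*?)*?'

SAMPLES = [
    '"abc" "de f" "123_456"',  # valid
    '"foo bar" "etc."',        # valid
    '"abc',                    # invalid: unterminated quote
    'abc"def"',                # invalid: text outside quotes
]

for s in SAMPLES:
    # Both variants must agree on whether the whole string matches.
    assert whole_string_matches(GREEDY, s) == whole_string_matches(LAZY, s)
```

Since the pattern facet only reports whether the entire value matches, the
greedy form with a negated character class should be enough for a format like
the one asked about below.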

If anyone's opinion differs, please let me know.

Pete.
--
=============================================
Pete Cordell
Tech-Know-Ware Ltd
for XML to C++ data binding visit
http://www.tech-know-ware.com/lmx
(or http://www.xml2cpp.com)
=============================================

Original Message From: "Tsao, Scott"

A colleague raised the question below regarding the use of the XSD pattern
facet.

Can someone help please?

Thanks,

Scott Tsao
Enterprise Architecture and Integration
The Boeing Company


-----Original Message-----

I'm trying to design a W3C XML Schema type description for an element
containing an arbitrary number of quoted strings separated by arbitrary
whitespace.  The contents of the quoted items are themselves limited to
alphanumerics, whitespace, and common punctuation characters, excluding
embedded quote characters.  (The double quote here is chosen as an
arbitrary delimiter and has no special significance.)

Example:
"abc" "de f" "123_456"
"foo bar" "etc."

I'm not aware of a "built-in" XML Schema type that can support this
representation directly.  It also appears that the W3C XML Schema "pattern"
facet (allowing the specification of a regular expression for a type
format) does not support the "non-greedy" quantifier syntax, e.g., "*?",
"+?" that is common in many regular expression engines.

Can anyone suggest a regex to define this format without the non-greedy
quantifiers, or perhaps an XML Schema representation that can handle
this format directly?
Received on Thursday, 4 January 2007 09:40:04 GMT
