Re: WG: RegEx [+\-] or [+-]?

At 10:57 AM +0000 2007-06-05, Rao,Dr.,R.,SNL IT Filialen,4110,DA wrote:

>The regex below restricts the UTC offset in the dateTime data type.
>
>We found that expression 1 causes no problems with the Microsoft
>parser (.NET 2.0).
>
>A third party that receives data from us uses Java and has problems
>with expression 1. They would like us to use expression 2.

>1: <xs:pattern 
>value="\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d([+-](0[0-9]|1[0-2]):00|[+-](03|09):30|[+](13|14):00|[+](04|05|06|10|11):30|[-]08:30)"/>
>
>2: <xs:pattern 
>value="\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d([+\-](0[0-9]|1[0-2]):00|[+\-](03|09):30|[+](13|14):00|[+](04|05|06|10|11):30|[\-]08:30)"/>

Well, I believe there is no reason to escape the '-' in this particular
case.
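
As a rough sanity check (not a conformance test), here is a minimal Python
sketch that runs both expressions over a few values. It assumes Python's re
module as a stand-in; re is not an XML Schema regex processor, so this only
illustrates that [+-] and [+\-] denote the same set of characters in a
typical engine. The sample values and the accepts helper are made up for
illustration.

```python
import re

# Date/time part shared by both expressions in the original message.
PREFIX = r"\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d"

# Expression 1: '-' left unescaped inside the character classes.
PATTERN_1 = PREFIX + (
    r"([+-](0[0-9]|1[0-2]):00|[+-](03|09):30|[+](13|14):00"
    r"|[+](04|05|06|10|11):30|[-]08:30)"
)

# Expression 2: '-' escaped as '\-'.
PATTERN_2 = PREFIX + (
    r"([+\-](0[0-9]|1[0-2]):00|[+\-](03|09):30|[+](13|14):00"
    r"|[+](04|05|06|10|11):30|[\-]08:30)"
)

# Hypothetical sample values, chosen only to exercise the branches.
SAMPLES = [
    "2007-06-05T10:57:00+00:00",
    "2007-06-05T10:57:00-08:30",
    "2007-06-05T10:57:00+05:30",
    "2007-06-05T10:57:00-05:30",
    "2007-06-05T10:57:00+14:00",
    "2007-06-05T10:57:00-13:00",
    "2007-06-05T10:57:00Z",
]


def accepts(pattern, value):
    # XSD patterns must match the whole value, hence fullmatch.
    return re.fullmatch(pattern, value) is not None


for value in SAMPLES:
    r1 = accepts(PATTERN_1, value)
    r2 = accepts(PATTERN_2, value)
    print(f"{value:28} expr1={r1!s:5} expr2={r2!s:5}")
```

If the two columns ever disagreed, the escaping would be changing the
meaning of the pattern rather than just its spelling.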

On the other hand, the plus sign needs to be escaped!

'+' is defined as a metacharacter, and once a metacharacter, always a
metacharacter.  Here are the relevant definitions from the spec:

>[Definition:]   A metacharacter is either ., \, ?, *, +, {, } (, ), |,
>[, or ]. These characters have special meanings in regular expressions,
>but can be escaped to form atoms that denote the sets of strings
>containing only themselves, i.e., an escaped ·metacharacter· behaves
>like a normal character.
>
>[Definition:]   A normal character is any XML character that is not a
>metacharacter.  In regular expressions, a normal character is an atom
>that denotes the singleton set of strings containing only itself.
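
Read literally, that definition suggests a simple rule: backslash-escape
every character in the metacharacter list whenever it is meant as itself.
The Python sketch below does only that. The names METACHARACTERS and
escape_metacharacters are hypothetical, the list is copied from the
definition quoted above, and the hand-added '\-' is just the form
expression 2 already uses. It is an illustration, not part of any schema
processor.

```python
import re

# Metacharacters exactly as listed in the definition quoted above.
METACHARACTERS = set(".\\?*+{}()|[]")


def escape_metacharacters(literal):
    """Backslash-escape each character the quoted definition lists."""
    return "".join("\\" + ch if ch in METACHARACTERS else ch for ch in literal)


# '-' is not in the quoted list, so the helper leaves it alone;
# it is escaped by hand here, as in expression 2.
sign_class = "[" + escape_metacharacters("+") + "\\-" + "]"

print(sign_class)
print(escape_metacharacters("-08:30"))
```

A sign class built this way escapes both characters, which should satisfy
a processor that insists on '\-' as well as one that follows the
metacharacter definition to the letter.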

The spec editors are aware of these RE problems and I hope they will be fixed
before the next public draft.  Meanwhile, I hope this helps.
-- 
Dave Peterson
SGMLWorks!

davep@iit.edu

Received on Tuesday, 12 June 2007 01:59:16 UTC