Bug? Use of curly braces in Regular Expressions

Hello, I am reporting what looks like a bug in the XML Schema spec, in the
text immediately prior to
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#nt-Char
(see the mails below).

Regards, Steve

Steve Hanson
WebSphere Business Integration Brokers,
IBM Hursley, England
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 22/08/2005 09:38 -----
                                                                           
From:    Sandy Gao/Toronto/IBM@IBMCA
Date:    16/08/2005 20:50
To:      Michael Glavassevich/Toronto/IBM@IBMCA
cc:      Alex Wood1/UK/IBM@IBMGB, Anna Carrigan/Ireland/IBM@IBMIE,
         Anthony O'Dowd/UK/IBM@IBMGB, David Cargill/Toronto/IBM@IBMCA,
         John Hibbert/UK/IBM@IBMGB, Sean Dunne/Ireland/Contr/IBM@IBMIE,
         Steve Hanson/UK/IBM@IBMGB
Subject: Re: Use of hyphen in Regular Expressions
         (Document link: Steve Hanson)



Agree with Michael and Steve's analysis. Seems like a bug in the spec.

The right channel to get this fixed is to send an email to the
www-xml-schema-comments@w3.org mailing list. Steve, since this
originated on your side, do you mind taking this action?

Thanks,
Sandy Gao
XML Parser Development, IBM Canada
(1-905) 413-3255
sandygao@ca.ibm.com



                                                                           
From:    Michael Glavassevich/Toronto/IBM
Date:    08/16/2005 02:06 PM
To:      Steve Hanson/UK/IBM@IBMGB
cc:      Alex Wood1/UK/IBM@IBMGB, Anna Carrigan/Ireland/IBM@IBMIE,
         Anthony O'Dowd/UK/IBM@IBMGB, David Cargill/Toronto/IBM@IBMCA,
         John Hibbert/UK/IBM@IBMGB, Sean Dunne/Ireland/Contr/IBM@IBMIE,
         Sandy Gao/Toronto/IBM@IBMCA
Subject: Re: Use of hyphen in Regular Expressions
         (Document link: Sandy Gao)



Hi Steve,

I think you've found an error in the schema spec.  If the set of normal
characters is defined as the set of characters which are not
metacharacters, production [10] should have been:

[10] Char      ::=      [^.\?*+{}()|#x5B#x5D]

instead of

[10] Char      ::=      [^.\?*+()|#x5B#x5D]
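
The difference between the two productions can be checked mechanically. The following is a minimal sketch in Python, not part of the original mails. It assumes Python's `re` character classes are a fair stand-in for the spec's notation, and it takes the metacharacter list from the prose definition as quoted further down this thread. The names `METACHARACTERS`, `PUBLISHED_CHAR`, `PROPOSED_CHAR` and `leaked` are made up for illustration.

```python
import re

# Metacharacters as listed in the prose definition quoted in this thread.
METACHARACTERS = set(".\\?*+{}()[]")

# Production [10] as published: [^.\?*+()|#x5B#x5D]
PUBLISHED_CHAR = re.compile(r"[^.\\?*+()|\[\]]")

# Production [10] as proposed above: [^.\?*+{}()|#x5B#x5D]
PROPOSED_CHAR = re.compile(r"[^.\\?*+{}()|\[\]]")


def leaked(char_class):
    """Return the metacharacters that the given Char class still accepts."""
    return sorted(c for c in METACHARACTERS if char_class.fullmatch(c))


print(leaked(PUBLISHED_CHAR))  # ['{', '}']
print(leaked(PROPOSED_CHAR))   # []
```

Under these assumptions, the published class accepts exactly the two braces as normal characters, and the proposed class accepts none of the listed metacharacters.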

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
Phone: 905-413-2565 T/L 969-2565
E-mail: mrglavas@ca.ibm.com


                                                                           
From:    Steve Hanson/UK/IBM@IBMGB
Date:    08/15/2005 08:18 AM
To:      David Cargill/Toronto/IBM@IBMCA
cc:      Alex Wood1/UK/IBM@IBMGB, Michael Glavassevich/Toronto/IBM@IBMCA,
         Sean Dunne/Ireland/Contr/IBM@IBMIE, John Hibbert,
         Anna Carrigan/Ireland/IBM, Anthony O'Dowd/UK/IBM
Subject: Re: Use of hyphen in Regular Expressions
         (Document link: Michael Glavassevich)



Hi Michael

Our testers have found one other issue with validation of regular
expressions. Comments?


2.) Use of an unescaped literal { or } outside of a character class [ ].
(6 of these in swift_field_defintitions)
   e.g.
   ({[^}]*})*
   when fixed becomes
   (\{[^}]*\})*

   Schema spec actually contradicts itself on whether { or } must be
   escaped.


   [Definition:]   A metacharacter is either ., \, ?, *, +, {, } (, ), [ or
   ]. These characters have special meanings in ·regular expression·s, but
   can be escaped to form ·atom·s that denote the sets of strings
   containing only themselves, i.e., an escaped ·metacharacter· behaves
   like a ·normal character·.


   [Definition:]   A normal character is any XML character that is not a
   metacharacter. In ·regular expression·s, a normal character is an atom
   that denotes the singleton set of strings containing only itself.
   Normal Character

   [10]   Char   ::=   [^.\?*+()|#x5B#x5D]



   So { and } are both metacharacters and normal characters ???
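
   For anyone wanting to try the escaped form of the pattern, here is a minimal sketch in Python. It is only an approximation: Python's `re` is not the XML Schema regular expression dialect, `fullmatch` is used to mimic the implicit anchoring of a schema pattern facet, and the sample strings and the helper name `is_brace_fields` are invented for illustration.

```python
import re

# Escaped form of the pattern from the example above.
FIXED = re.compile(r"(\{[^}]*\})*")


def is_brace_fields(text):
    """True if text is zero or more {...} groups and nothing else."""
    # fullmatch approximates a schema pattern, which must match the whole value.
    return FIXED.fullmatch(text) is not None


print(is_brace_fields("{1:F01}{2:I103}"))  # True
print(is_brace_fields("{unterminated"))    # False
print(is_brace_fields(""))                 # True
```

   Escaping the braces is accepted under either reading of the spec, since an escaped metacharacter behaves like a normal character, so the fixed form is the safe one to use regardless of how the contradiction is resolved.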







Regards, Steve

Steve Hanson
WebSphere Business Integration Brokers,
IBM Hursley, England
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848

Received on Tuesday, 30 August 2005 20:21:29 UTC