[Bug 1922] New: 'x' regex flag not entirely clear

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1922

           Summary: 'x' regex flag not entirely clear
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Platform: PC
        OS/Version: Windows 2000
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators
        AssignedTo: ashok.malhotra@oracle.com
        ReportedBy: holstege@mathling.com
         QAContact: public-qt-comments@w3.org


Section 7.6.1.1 of F&O says only this about the 'x' flag:
"x: If present, whitespace characters within the regular expression are ignored. 
By default, whitespace characters match themselves. This allows, for example, 
regular expressions to be broken up into lines for readability."

Our implementors ask for clarification of what 'ignored' means. Here are some
cases:

fn:matches("helloworld", "hello[ ]world", "x")
   Error? (because [] is not a valid character set?) Or true()?
fn:matches("hello world", "hello\ sworld", "x")
   True or false? That is, is '\ s' == '\s'?
And so forth for spaces in other odd places:
"(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)\1 0" 
     \1 followed by '0' or \10?
"\p{ Lu}" 
"\p{L u}" 
"[a- ]"
"[a- z]" 
"hello\ "
"[ ^a]"
"[^ ]"

We assume the appropriate semantics is to pre-strip all whitespace and then
parse the resulting regex; this is certainly simpler from an implementation
standpoint, but "ignored" isn't entirely clear and could mean that whitespace
is ignored in matching, not in parsing.
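
To make the pre-strip reading concrete, here is a minimal sketch in Python
(not XPath). The helper name pre_strip is hypothetical, and treating
"whitespace" as the four XML whitespace characters (space, tab, LF, CR) is our
assumption; F&O does not say which characters are meant. The sketch only shows
the string that would be handed to the regex parser, not how an XPath regex
engine would then accept or reject it.

```python
# Assumption: "whitespace" means the four XML whitespace characters.
XML_WHITESPACE = " \t\n\r"


def pre_strip(pattern: str) -> str:
    """Hypothetical helper: drop every whitespace character before parsing."""
    return "".join(ch for ch in pattern if ch not in XML_WHITESPACE)


# A few of the cases from this report, written as raw strings so that
# backslashes reach pre_strip unchanged.
cases = [
    r"hello[ ]world",
    r"hello\ sworld",
    r"(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)\1 0",
    r"\p{ Lu}",
    r"[a- z]",
    "hello\\ ",
    r"[ ^a]",
    r"[^ ]",
]

for case in cases:
    # Show what the regex parser would see under the pre-strip reading.
    print(repr(case), "->", repr(pre_strip(case)))
```

Under this reading "hello[ ]world" reaches the parser as "hello[]world",
'\ s' becomes '\s', and '\1 0' becomes '\10', which is exactly where the
questions above arise.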

Received on Wednesday, 31 August 2005 15:26:29 UTC