[Bug 4543] [F+O] Multiline mode in regular expressions

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4543

           Summary: [F+O] Multiline mode in regular expressions
           Product: XPath / XQuery / XSLT
           Version: Recommendation
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators
        AssignedTo: ashok.malhotra@oracle.com
        ReportedBy: mike@saxonica.com
         QAContact: public-qt-comments@w3.org


I previously raised this issue by email (member-only):

http://lists.w3.org/Archives/Member/w3c-xsl-query/2006Aug/0020.html
http://lists.w3.org/Archives/Member/w3c-xsl-query/2006Aug/0021.html

and the minutes of meeting 308 record the following:

http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2006Sep/0027.html

J5.1 Multiline mode in regular expressions (Michael Kay)
http://lists.w3.org/Archives/Member/w3c-xsl-query/2006Aug/0020.html
see also:
http://lists.w3.org/Archives/Member/w3c-xsl-query/2006Aug/0021.html 

*Michael will open a bugzilla report and resolve it with the suggested changes.

It appears however that this was not recorded as an official action and was
therefore not pursued. The problem remains in the spec. Specifically:

Consider the expression

matches($in, '^$', 'm')

Intuitively, this returns true if $in contains a zero-length line. I believe
that in both Java and .NET, no zero-length line is considered to exist
between a final newline character and the end of the string, so if $in is:

"abcd
defg
"

the expression will return false. But with our spec, I think there is a
zero-length line after the final newline, so this expression will return
true.
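
The Java behaviour described above can be checked with a minimal sketch,
assuming java.util.regex with Pattern.MULTILINE and the pattern '^$'. The
class and method names are illustrative only, and .NET is not covered here.

```java
import java.util.regex.Pattern;

// Illustrative helper (hypothetical name): tests for a zero-length line
// using java.util.regex in multiline mode.
final class MultilineAnchors {

    // True if "^$" finds a match in multiline mode, i.e. some position
    // is both the start and the end of a line.
    static boolean hasEmptyLine(String in) {
        return Pattern.compile("^$", Pattern.MULTILINE).matcher(in).find();
    }

    public static void main(String[] args) {
        // Trailing newline only: Java does not treat the position after
        // the final newline as the start of a new line.
        System.out.println(hasEmptyLine("abcd\ndefg\n"));   // false

        // A genuine empty line between two newlines.
        System.out.println(hasEmptyLine("abcd\n\ndefg\n")); // true
    }
}
```

The first call corresponds to the two-line input shown above; the second
adds an empty line in the middle for contrast.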

I think that in 7.6.1.1 Flags, under the description of flag "m", we should
change 

^ matches the start of any line (that is, the start of the entire string,
and the position immediately after a newline character)

to

^ matches the start of any line (that is, the start of the entire string,
and the position immediately after a newline character other than a newline
that appears as the last character in the string)

and change:

while $ matches the end of any line (that is, the end of the entire string,
and the position immediately before a newline character)

to

while $ matches the end of any line (that is, the position immediately
before a newline character, and the end of the entire string if there is no
newline character at the end of the string)
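
The current and proposed wordings can be read as predicates over string
positions. The following Java sketch contrasts the two readings; it models
only the ^ and $ anchors, not a regular expression engine, and every name in
it is hypothetical.

```java
// Hypothetical model of the "m" flag anchor rules as position predicates.
final class AnchorRules {

    // Current wording: start of string, or immediately after a newline.
    static boolean caretCurrent(String s, int pos) {
        return pos == 0 || s.charAt(pos - 1) == '\n';
    }

    // Proposed wording: as above, except after a newline that is the
    // last character in the string.
    static boolean caretProposed(String s, int pos) {
        return pos == 0 || (s.charAt(pos - 1) == '\n' && pos < s.length());
    }

    // Current wording: end of string, or immediately before a newline.
    static boolean dollarCurrent(String s, int pos) {
        return pos == s.length() || s.charAt(pos) == '\n';
    }

    // Proposed wording: immediately before a newline, or end of string
    // when the string does not end with a newline.
    static boolean dollarProposed(String s, int pos) {
        if (pos < s.length()) {
            return s.charAt(pos) == '\n';
        }
        return !s.endsWith("\n");
    }

    // True if some position satisfies both ^ and $, i.e. '^$' would match.
    static boolean hasEmptyLine(String s, boolean proposed) {
        for (int pos = 0; pos <= s.length(); pos++) {
            boolean caret = proposed ? caretProposed(s, pos) : caretCurrent(s, pos);
            boolean dollar = proposed ? dollarProposed(s, pos) : dollarCurrent(s, pos);
            if (caret && dollar) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String in = "abcd\ndefg\n";
        System.out.println(hasEmptyLine(in, false)); // true  (current wording)
        System.out.println(hasEmptyLine(in, true));  // false (proposed wording)
    }
}
```

Under this reading, the position after the final newline satisfies both
anchors with the current wording and neither with the proposed wording.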

Received on Sunday, 6 May 2007 19:59:58 UTC