[Bug 11125] Regex grammar for 1.1 renders some 1.0 regexes invalid

From: <bugzilla@jessica.w3.org>
Date: Fri, 22 Oct 2010 17:19:58 +0000
To: www-xml-schema-comments@w3.org
Message-Id: <E1P9LHa-00011A-1N@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11125

Michael Kay <mike@saxonica.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mike@saxonica.com

--- Comment #1 from Michael Kay <mike@saxonica.com> 2010-10-22 17:19:57 UTC ---
I asserted during the telcon that the rules for 1.0 second edition were
unclear, and Dave Petersen disputed this.

I thought there was an open bug on this, but I can't find it.

1.0 2e tries to solve the problem with the rules:

(A) The [, ], - and \ characters are not valid character ranges;
(B) The ^ character is only valid at the beginning of a ·positive character
group· if it is part of a ·negative character group·;
(C) The - character is a valid character range only at the beginning or end of
a ·positive character group·.

The problem is that rules (A) and (C) flatly contradict each other. If we
assume that (C) is meant to take priority, then [+-] and [-+] are both allowed,
but [a-z-+] is not. This is probably a reasonable way to define the rule. If
this is the rule that we want, then in 1.0 2e we should delete "-" from the
list of characters in rule (A), and in 1.1 we should change the paragraph that
follows production 81 from

<old>
If a charGroupPart starts with a singleChar and this is immediately followed by
a hyphen, and if the hyphen is part of the character group (that is, it is not
being treated as a subtraction operator because it is followed by '['), then
the hyphen must be followed by another singleChar, and the sequence
(singleChar, hyphen, singleChar) is treated as a charRange. It is an error if
either of the two singleChars in a charRange is a SingleCharNoEsc comprising an
unescaped hyphen.</old>

to

<new>
If a charGroupPart starts with a singleChar and this is immediately followed by
a hyphen, and if the hyphen is part of the character group (that is, it is not
being treated as a subtraction operator because it is followed by '['), then:
(a) the hyphen is treated as a singleChar if it is immediately followed by ']';
(b) in all other cases the hyphen must be followed by another singleChar, and
the sequence (singleChar, hyphen, singleChar) is treated as a charRange. It is
an error if either of the two singleChars in a charRange is a SingleCharNoEsc
comprising an unescaped hyphen.</new>
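
To make the intended outcome concrete, here is a rough sketch in Python of the
hyphen rule as read above, with (C) taking priority over (A). It is only an
illustration, not text from either spec: the function name hyphen_placement_ok
is made up, the input is assumed to be the content between '[' and ']' of a
positive character group, and escapes, negation with '^' and character class
subtraction are deliberately ignored.

```python
def hyphen_placement_ok(group: str) -> bool:
    """Check hyphen placement inside a positive character group.

    `group` is the text between '[' and ']'. Assumption: every character
    is a plain single character (no escapes, no '^', no subtraction).
    """
    n = len(group)
    i = 0
    while i < n:
        c = group[i]
        if c == "-":
            # A bare hyphen is a literal only at the beginning or end.
            if i == 0 or i == n - 1:
                i += 1
                continue
            return False
        if i + 1 < n and group[i + 1] == "-":
            if i + 1 == n - 1:
                # Case (a): hyphen immediately followed by ']' is a literal.
                i += 2
                continue
            # Case (b): the hyphen must be followed by another single
            # character, which may not itself be an unescaped hyphen.
            if group[i + 2] == "-":
                return False
            i += 3  # consumed (singleChar, hyphen, singleChar) as a range
            continue
        i += 1  # ordinary single character
    return True


if __name__ == "__main__":
    for g in ["+-", "-+", "a-z-+"]:
        print(f"[{g}]", "allowed" if hyphen_placement_ok(g) else "not allowed")
```

Under these assumptions [+-] and [-+] are accepted and [a-z-+] is rejected,
which matches the reading described above.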

Received on Friday, 22 October 2010 17:20:03 GMT