[Bug 29416] New: [QT3] re00054, a test with character class expression [^-z], should throw FORX0002

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29416

            Bug ID: 29416
           Summary: [QT3] re00054, a test with character class expression
                    [^-z], should throw FORX0002
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XQuery 3 & XPath 3 Test Suite
          Assignee: oneil@saxonica.com
          Reporter: abel.braaksma@xs4all.nl
        QA Contact: public-qt-comments@w3.org
  Target Milestone: ---

The whole expression in this test is "^(?:[^-z]+)$" (without quotes).

I am reporting this because either
a) the rules are unclear or ambiguous, and the test is correct under one
reading of the spec, or
b) the test is not correct.

In XSD 1.0, the production rules of [17] charRange apply. In the accompanying
text, the author states that the rules are ambiguous and then goes on to say
that, given the following, they are not:

1) The [, ], - and \ characters are not valid character ranges; 
A: this does not apply

2) The ^ character is only valid at the beginning of a ·positive character
group· if it is part of a ·negative character group·
A: this applies, and gets the meaning of negating the character group

3) The - character is a valid character range only at the beginning or end of a
·positive character group·. 
A: ambiguous in this case, as the production rules do not allow this here.

[14]: posCharGroup ::= ( charRange | charClassEsc )+ 
[17]: charRange    ::= seRange | XmlCharIncDash 
[18]: seRange      ::= charOrEsc '-' charOrEsc
[20]: charOrEsc    ::= XmlChar | SingleCharEsc
[21]: XmlChar      ::= [^\#x2D#x5B#x5D]
[22]: XmlCharIncDash ::= [^\#x5B#x5D]
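
To make the quoted productions concrete, here is a minimal Python sketch of
[18], [21] and [22]. It is a simplification, not a conforming implementation:
it ignores SingleCharEsc in [20], and the function names are made up for
illustration.

```python
# Characters excluded by XmlChar [21] and XmlCharIncDash [22], as quoted above.
XML_CHAR_EXCLUDED = {"\\", "-", "[", "]"}
XML_CHAR_INC_DASH_EXCLUDED = {"\\", "[", "]"}


def is_xml_char(ch):
    """[21] XmlChar: any single character except backslash, '-', '[' and ']'."""
    return len(ch) == 1 and ch not in XML_CHAR_EXCLUDED


def is_xml_char_inc_dash(ch):
    """[22] XmlCharIncDash: any single character except backslash, '[' and ']'."""
    return len(ch) == 1 and ch not in XML_CHAR_INC_DASH_EXCLUDED


def is_se_range(text):
    """[18] seRange, restricted to unescaped XmlChar endpoints (assumption)."""
    return (
        len(text) == 3
        and text[1] == "-"
        and is_xml_char(text[0])
        and is_xml_char(text[2])
    )


print(is_se_range("^-z"))         # '^' and 'z' are both XmlChar
print(is_xml_char("-"))           # '-' is excluded from XmlChar
print(is_xml_char_inc_dash("-"))  # but allowed by XmlCharIncDash
```

Taken on their own, these productions let "^-z" qualify as an seRange, which
is the reading followed below.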

Following these production rules, in part, we get:
4) it is a posCharGroup
5) it is a charRange
6) that range is "^" to "z"
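
The difference between the two readings of [^-z] can be sketched in Python.
This only models the two interpretations as plain predicates over single
characters; the function names and sample characters are illustrative and do
not come from the test suite.

```python
def reading_range(ch):
    """Steps 4-6: a positive group holding the range '^' (U+005E) to 'z' (U+007A)."""
    return ord("^") <= ord(ch) <= ord("z")


def reading_negated(ch):
    """Other reading: '^' negates a group that contains '-' and 'z'."""
    return ch not in {"-", "z"}


# Sample characters chosen for illustration only.
for ch in ["A", "a", "-", "z", "^"]:
    print(repr(ch), "range:", reading_range(ch), "negated:", reading_negated(ch))
```

The two readings disagree on "A" and on "z", so the outcome of a test on such
input depends on which reading the processor takes.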

Now back to rule (2) above. The "^" is only valid in this position if it is
also part of a negative character group. 

All in all, I think the expression should have been written differently,
depending on the intended meaning:

If the intention was "from ^ to z", then [\^-z]
If the intention was "not from ^ to z", then [^^-z] appears to be allowed
(though [^\^-z] makes more sense to me)
If the intention was "either ^, - or z", then [\^\-z]
If the intention was "not - or z", then [^\-z]
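
As a rough cross-check of the spellings above, the following sketch runs them
through Python's re module. Python's regex flavor is not the XSD or XPath
flavor, so this is only an analogy for what each unambiguous spelling would
select; the sample characters are my own.

```python
import re

# The four unambiguous spellings and the meaning each is supposed to carry.
SPELLINGS = {
    r"[\^-z]": "from ^ to z",
    r"[^^-z]": "not from ^ to z",
    r"[\^\-z]": "either ^, - or z",
    r"[^\-z]": "not - or z",
}

# Sample characters chosen for illustration only.
SAMPLES = ["A", "a", "^", "-", "z"]


def matched_samples(pattern):
    """Return the sample characters the bracket expression matches in Python's re."""
    return [ch for ch in SAMPLES if re.fullmatch(pattern, ch)]


for pattern, meaning in SPELLINGS.items():
    print(pattern, "|", meaning, "|", matched_samples(pattern))
```

Each escaped spelling selects a different subset of the samples, which is why
the unescaped form leaves room for more than one interpretation.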

I think that the expression as written does not fit the production rules or
description and should raise FORX0002.

Received on Tuesday, 2 February 2016 03:09:49 UTC