[Bug 20575] New: [QT3TS] test-case re00216 in test-set fn-matches.re

https://www.w3.org/Bugs/Public/show_bug.cgi?id=20575

            Bug ID: 20575
           Summary: [QT3TS] test-case re00216 in test-set fn-matches.re
    Classification: Unclassified
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XQuery 3 & XPath 3 Test Suite
          Assignee: oneil@saxonica.com
          Reporter: mike@saxonica.com
        QA Contact: public-qt-comments@w3.org

This test does

matches('qwerty','\p{IsaA0-a9}')

and expects an error on the grounds that the regular expression is invalid,
since "IsaA0-a9" is not a recognized block name.

The specification states that an error is raised "if the value of $pattern is
invalid according to the rules described in 5.6.1 Regular expression syntax",
and section 5.6.1 says "The regular expression syntax and semantics are
identical to those defined in [XML Schema Part 2: Datatypes Second Edition]
with the following [irrelevant] additions..." The reference is to XSD 1.0, but
we state "Implementations of this specification may support either XSD 1.0 or
XSD 1.1 or both."

The relevant syntax rule in both XSD 1.0 and XSD 1.1 is:

    IsBlock       ::=       'Is' [a-zA-Z0-9#x2D]+

Thus this regular expression matches the syntax.
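
As a quick sanity check of that claim, here is a minimal sketch in Python that
transcribes the IsBlock production into a regular expression and tests the
block name from the test case against it. The function name
is_syntactic_block_name is made up for illustration; it checks syntax only and
says nothing about whether the block is actually recognized.

```python
import re

# Transcription of the production:  IsBlock ::= 'Is' [a-zA-Z0-9#x2D]+
# (#x2D is the hyphen-minus character).
IS_BLOCK = re.compile(r"Is[a-zA-Z0-9\-]+")


def is_syntactic_block_name(name: str) -> bool:
    """Return True if name matches the IsBlock production (syntax only)."""
    return IS_BLOCK.fullmatch(name) is not None


print(is_syntactic_block_name("IsaA0-a9"))      # True: syntactically valid
print(is_syntactic_block_name("IsBasicLatin"))  # True
print(is_syntactic_block_name("Is"))            # False: needs one more character
print(is_syntactic_block_name("Isa_b"))         # False: '_' is not allowed
```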

In XSD 1.0, no semantics are given for a regular expression that uses an
unknown block name, but it is nowhere stated that this is an error.

The situation is clarified in XSD 1.1:

<quote>
If a string "IsX" matches the non-terminal IsBlock but X is not a recognized
block name, then the expressions "\p{IsX}" and "\P{IsX}" each denote the set of
all characters. Processors may ·at user option· treat both "\p{IsX}" and
"\P{IsX}" as denoting the empty set, instead of the set of all characters....

Processors should issue a warning if they encounter a regular expression using
a block name they do not recognize. Processors may ·at user option· treat
unrecognized block names as ·errors· in the schema.

Note: Treating unrecognized block names as errors increases the likelihood that
errors in spelling the block name will be detected and can be helpful in
checking the correctness of schema documents. However, it also decreases the
portability of schema documents among processors supporting different versions
of [Unicode Database]; it is for this reason that processors are allowed to
treat unrecognized block names as errors only when the user has explicitly
requested this behavior.
</quote>
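
To make the three permitted behaviours concrete, here is a rough sketch in
Python of what matches() would do for a pattern consisting solely of
"\p{IsX}" with X unrecognized. The names UnknownBlockPolicy and
matches_unknown_block are hypothetical, and this is a model of the quoted
rules rather than any processor's implementation.

```python
from enum import Enum


class UnknownBlockPolicy(Enum):
    ALL_CHARACTERS = "all"   # XSD 1.1 default for an unrecognized block
    EMPTY_SET = "empty"      # permitted at user option
    ERROR = "error"          # permitted at user option


def matches_unknown_block(text: str, policy: UnknownBlockPolicy) -> bool:
    """Model matches(text, '\\p{IsX}') where X is an unrecognized block name.

    XSD 1.1 gives \\P{IsX} the same denotation as \\p{IsX}, so the same
    result applies to the negated escape.
    """
    if policy is UnknownBlockPolicy.ERROR:
        raise ValueError("unrecognized block name in regular expression")
    if policy is UnknownBlockPolicy.EMPTY_SET:
        return False  # no character belongs to the empty set
    # matches() is unanchored, so any single character in the input suffices.
    return len(text) > 0


print(matches_unknown_block("qwerty", UnknownBlockPolicy.ALL_CHARACTERS))  # True
print(matches_unknown_block("qwerty", UnknownBlockPolicy.EMPTY_SET))       # False
```

Under the ERROR policy the same call raises, which corresponds to the result
the test currently expects.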

We clearly have the opportunity to say something different for XPath regular
expressions, but currently we do not do so. I think a clarification in the spec
would be appropriate. In the meantime, based on the XSD 1.1 rules which we
inherit, I propose to allow the alternative result "false".

Received on Saturday, 5 January 2013 22:55:38 UTC