
Re: FW: Regex syntax [+-]

From: Pete Cordell <petexmldev@tech-know-ware.com>
Date: Fri, 23 Sep 2005 08:53:54 +0100
Message-ID: <005e01c5c014$03823130$a300a8c0@RW>
To: "Michael Kay" <mike@saxonica.com>, "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: <xmlschema-dev@w3.org>

I just wondered whether there had been a resolution to the Regex question?

(I'm afraid I don't know how long these things need to get through the WG.
I do know that my ageing brain will forget it was mentioned unless it's
brought up again!  I'm keen to know if the answer is what I want to hear!)

----- Original Message -----
From: "Henry S. Thompson"
> "Michael Kay" writes:

>> A couple of weeks ago I raised this message on the list, and received no
>> reply.
>>
>> Does this mean:
>>
>> (a) that it will eventually be answered, and in the meantime I can enjoy
>> listening to piped Vivaldi, or
>
> :-)

> For my part it means I've been on holiday and then at a Schema WG f2f
> -- I'll try to get this to the WG's attention RSN.


-----------Original mail in case it helps-----------------

I'm busy trying to implement the anti-erratum that says [+-] in a regex is
now legal, and I'm therefore trying to understand exactly what the rules now
are.
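For anyone implementing the same change, here is a minimal Python sketch of how a parser might split the inside of a simple positive character group, such as the "+-" in [+-], into (start, end) pairs. It assumes no escapes, no ^ and no subtraction, and treats only a trailing "-" as a literal, since that is the [+-] case the anti-erratum covers. Whether a leading "-" is also allowed is deliberately left out, and endpoint validity is not checked. parse_simple_group is a hypothetical name, not part of any library.

```python
def parse_simple_group(body: str) -> list[tuple[str, str]]:
    """Split the inside of a simple positive character group into (start, end) pairs.

    ASSUMPTION: no escapes, no '^', no subtraction; a '-' is taken literally
    only when it is the last character of the group (the [+-] case).
    """
    items: list[tuple[str, str]] = []
    i = 0
    while i < len(body):
        c = body[i]
        # A '-' that is not part of a range and not at the end is rejected.
        if c == "-" and i != len(body) - 1:
            raise ValueError(f"'-' at position {i} is not a trailing literal")
        # c followed by '-' and one more character forms a range s-e.
        if i + 2 < len(body) and body[i + 1] == "-":
            items.append((c, body[i + 2]))
            i += 3
        else:
            # A single literal character is a one-character range.
            items.append((c, c))
            i += 1
    return items
```

With this reading, "+-" yields two single characters rather than an unfinished range, while "a-z" still yields one range.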

In particular, what characters are allowed to appear as s and e in a range
[s-e]?

The production rules say

[18]   seRange    ::=   charOrEsc '-' charOrEsc
[20]   charOrEsc    ::=   XmlChar | SingleCharEsc
[21]   XmlChar    ::=   [^\#x2D#x5B#x5D]

which imply that [, ], \, and - are disallowed in both positions.
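To make the grammar reading concrete, here is a minimal Python sketch of productions [18], [20] and [21] as syntax checks on endpoint tokens, where a token is either one literal character or a backslash plus one character. The set of characters that may follow \ in a single character escape comes from production [24], which is not quoted above, so treat that set as an assumption. The function names are hypothetical, and the extra requirement that an XmlChar also be a legal XML character is omitted.

```python
# Production [21]: XmlChar ::= [^\#x2D#x5B#x5D], i.e. anything except \ - [ ]
XMLCHAR_EXCLUDED = {"\\", "-", "[", "]"}
# ASSUMPTION: characters allowed after '\' in a SingleCharEsc (production [24],
# which is not quoted in this thread).
SINGLE_ESCAPABLE = set("nrt\\|.?*+(){}-[]^")


def is_xml_char(token: str) -> bool:
    """Production [21], syntax only: one character other than \\ - [ ]."""
    return len(token) == 1 and token not in XMLCHAR_EXCLUDED


def is_single_char_esc(token: str) -> bool:
    """A backslash followed by one escapable character, e.g. '\\['."""
    return len(token) == 2 and token[0] == "\\" and token[1] in SINGLE_ESCAPABLE


def is_char_or_esc(token: str) -> bool:
    """Production [20]: charOrEsc ::= XmlChar | SingleCharEsc."""
    return is_xml_char(token) or is_single_char_esc(token)


def grammar_allows_range(s: str, e: str) -> bool:
    """Production [18]: seRange ::= charOrEsc '-' charOrEsc (syntax only)."""
    return is_char_or_esc(s) and is_char_or_esc(e)
```

Under this reading a literal [ or ] is rejected at either end, but the escaped forms \[ and \] are accepted.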

But the text then elaborates this by saying that

s-e is a valid character range iff:

    * s is a ·single character escape·, or an XML character;
    * s is not \
    * If s is the first character in a ·character class expression·, then s is not ^
    * e is a ·single character escape·, or an XML character;
    * e is not \ or [; and
    * The code point of e is greater than or equal to the code point of s;
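The bulleted conditions can be sketched the same way. This hedged Python version takes the first reading of "XML character" (any character the XML 1.0 Char production allows) and compares code points using the character each escape denotes: \n, \r and \t stand for newline, carriage return and tab, and any other escape stands for the escaped character. The set of escapable characters is an assumption taken from production [24], which is not quoted here, and all names are hypothetical.

```python
# ASSUMPTION: characters allowed after '\' in a single character escape
# (production [24], not quoted in this thread).
SINGLE_ESCAPABLE = set("nrt\\|.?*+(){}-[]^")
# \n, \r and \t denote control characters; any other escape denotes the escaped character.
ESCAPE_VALUES = {"n": "\n", "r": "\r", "t": "\t"}


def is_xml_char(c: str) -> bool:
    """Reading 1 of "XML character": any character allowed by XML 1.0's Char production."""
    cp = ord(c)
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def is_single_char_esc(token: str) -> bool:
    """A backslash followed by one escapable character, e.g. '\\['."""
    return len(token) == 2 and token[0] == "\\" and token[1] in SINGLE_ESCAPABLE


def endpoint_ok(token: str) -> bool:
    """Bullets 1 and 4: a single character escape, or one XML character."""
    return is_single_char_esc(token) or (len(token) == 1 and is_xml_char(token))


def code_point(token: str) -> int:
    """The code point an endpoint denotes (the escaped meaning for escapes)."""
    if is_single_char_esc(token):
        return ord(ESCAPE_VALUES.get(token[1], token[1]))
    return ord(token)


def prose_allows_range(s: str, e: str, s_is_first: bool = False) -> bool:
    """All six bulleted conditions; s_is_first means s opens the character class expression."""
    return (endpoint_ok(s)
            and s != "\\"                        # bullet 2: s is not \
            and not (s_is_first and s == "^")    # bullet 3
            and endpoint_ok(e)
            and e not in ("\\", "[")             # bullet 5: e is not \ or [
            and code_point(e) >= code_point(s))  # bullet 6
```

Note that under this reading a literal [ is accepted as s, which production [21] would not allow.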

Question: in this English text, what does "XML character" mean? Does it mean
any character allowed in XML, or does it mean XmlChar as defined in
production 21? (If it means XmlChar, why are bullets 2 and 5 there?)

The grammar rules say that \ and [ are disallowed in both positions, but the
English rules say \ is disallowed for the start of the range while both \
and [ are disallowed for the end. Why the inconsistency? Why is "-" not
mentioned?

I'm left more confused than ever!

Michael Kay
http://www.saxonica.com/

------------------End of Original mail--------------

Thanks,

Pete.
--
=============================================
Pete Cordell
Tech-Know-Ware Ltd
                         for XML to C++ data binding visit
                         http://www.tech-know-ware.com/lmx
                         (or http://www.xml2cpp.com)
=============================================
Received on Friday, 23 September 2005 07:54:29 GMT
