- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 14 Sep 2005 19:18:02 +0000
- To: www-xml-schema-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=2216
Summary: R-224: Questions about metacharacters in regular expressions
Product: XML Schema
Version: 1.0
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: XSD Part 2: Datatypes
AssignedTo: cmsmcq@w3.org
ReportedBy: sandygao@ca.ibm.com
QAContact: www-xml-schema-comments@w3.org
Appendix F in Part 2 of XML Schema 1.0 defines 'metacharacter' thus:
A metacharacter is either ., \, ?, *, +, {, } (, ), [ or ].
It defines 'normal character' thus:
[Definition:] A normal character is any XML character that is not a
metacharacter. In regular expressions, a normal character is an atom that
denotes the singleton set of strings containing only itself.
Production [10], which I take to be defining normal characters, reads:
Normal Character [10] Char ::= [^.\?*+()|#x5B#x5D]
The metacharacters all need escapes, so production [24] is also relevant here:
Single Character Escape [24] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
I have some questions:
1. Shouldn't { and } (braces) be included in production [10]? That is:
[10] Char ::= [^.\?*+{}()|#x5B#x5D]
2. Shouldn't | (vertical bar) be among the characters defined as
metacharacters?
3. Should ^ (#x5E) be included among the metacharacters?
4. Would it be possible to list the magic characters in the same order in
productions [10] and [24], to make eyeball-based comparisons easier?
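The comparison asked for in question 4 can also be done mechanically. The
following is a minimal Python sketch, not part of the specification: the set
names and the show() helper are made up for illustration, and the character
sets are transcribed by hand from the prose definition and the two productions
quoted above.

```python
# Character sets transcribed by hand from the quoted text.
# All names here are illustrative, not taken from the specification.

# Prose definition of 'metacharacter': . \ ? * + { } ( ) [ ]
METACHARS_PROSE = set(".\\?*+{}()[]")

# Production [10]: characters excluded from Char, [^.\?*+()|#x5B#x5D]
EXCLUDED_BY_10 = set(".\\?*+()|") | {"\x5B", "\x5D"}

# Production [24]: characters that may follow '\' in SingleCharEsc
ESCAPABLE_BY_24 = set("nrt\\|.?*+(){}") | {"\x2D", "\x5B", "\x5D", "\x5E"}


def show(chars):
    """Render a set of characters in one fixed (code point) order."""
    return " ".join(sorted(chars))


print("in prose, not excluded by [10]:", show(METACHARS_PROSE - EXCLUDED_BY_10))
print("excluded by [10], not in prose:", show(EXCLUDED_BY_10 - METACHARS_PROSE))
print("escapable in [24], not in [10]:", show(ESCAPABLE_BY_24 - EXCLUDED_BY_10))
```

With these transcriptions, the first line reports the two braces (question 1)
and the second reports the vertical bar (question 2).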
I suspect the answer to (2) is 'yes' and the answer to (3) is 'no', on the
theory that the term 'metacharacter' is best reserved for characters which have
special meaning at the top level of a regular expression and which must
therefore have escapes to avoid ambiguity. Hyphen, circumflex, comma, n, r, and
t all have special meaning only in special contexts (within character groups,
within quantity-range specifications, or after backslash), and so aren't
metacharacters in this sense.
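As a rough check of this theory, the Python sketch below takes the prose
definition, adds the vertical bar as suggested for (2), and looks at what is
left over in production [24]. The set names are hypothetical and the sets are
hand-transcribed from the text quoted above, so this is an illustration, not
spec text.

```python
# Hypothetical names; sets transcribed by hand from the quoted definition
# and from production [24].
PROSE_METACHARS = set(".\\?*+{}()[]")

# Assumed outcome of the suggestions: add '|' (question 2), leave '^' out (question 3).
PROPOSED_METACHARS = PROSE_METACHARS | {"|"}

# Characters that may follow '\' in SingleCharEsc, production [24].
ESCAPABLE_BY_24 = set("nrt\\|.?*+(){}-[]^")

# Every proposed metacharacter should have an escape in [24].
print(PROPOSED_METACHARS <= ESCAPABLE_BY_24)

# What remains in [24] are characters special only in particular contexts.
context_only = ESCAPABLE_BY_24 - PROPOSED_METACHARS
print(sorted(context_only))
```

Under these assumptions the leftover characters are hyphen, circumflex, n, r,
and t. Comma does not show up because production [24] gives it no escape.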
See:
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2003JulSep/0009.html
Received on Wednesday, 14 September 2005 19:18:16 UTC