- From: Ashok Malhotra <ashokma@microsoft.com>
- Date: Thu, 10 Jul 2003 14:33:35 -0700
- To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, "W3C XML Schema Comments list" <www-xml-schema-comments@w3.org>
Michael:

How did you run across this? I'm just curious, because I sent at least one (possibly two) notes about it, and there was a comment by 'Cobra' on 7/7 to this list on the same subject. My note asked for changes (1) and (2). Cobra requested (1).

All the best,
Ashok

> -----Original Message-----
> From: www-xml-schema-comments-request@w3.org [mailto:www-xml-schema-comments-request@w3.org] On Behalf Of C. M. Sperberg-McQueen
> Sent: Thursday, July 10, 2003 2:22 PM
> To: W3C XML Schema Comments list
> Subject: Question about metacharacters, regex rule 10, 24 (Datatypes appendix F)
>
> Appendix F in the Part 2 of XML Schema 1.0 defines 'metacharacter' thus:
>
>     A metacharacter is either ., \, ?, *, +, {, } (, ), [ or ].
>
> It defines 'normal character' thus:
>
>     [Definition:] A normal character is any XML character that is not a
>     metacharacter. In regular expressions, a normal character is an
>     atom that denotes the singleton set of strings containing only
>     itself.
>
> Production [10], which I take to be defining normal characters, reads:
>
>     Normal Character
>     [10] Char ::= [^.\?*+()|#x5B#x5D]
>
> The metacharacters all need escapes, so production 24 is also relevant
> here:
>
>     Single Character Escape
>     [24] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
>
> I have some questions:
>
> (1) shouldn't { and } (braces) be included in production [10]?
>
>     ? [10] Char ::= [^.\?*+{}()|#x5B#x5D]
>
> (2) shouldn't | (vertical bar) be among the characters defined as
> metacharacters?
>
> (3) should ^ (#x5E) be included among the metacharacters?
>
> (4) would it be possible to list the magic characters in the same
> order in 10 and 24, to make eyeball-based comparisons easier?
>
> I suspect the answer to (2) is 'yes' and the answer to (3) is 'no', on
> the theory that the term 'metacharacter' is best reserved for
> characters which have special meaning at the top level of a regular
> expression and which must therefore have escapes to avoid ambiguity.
> Hyphen, circumflex, comma, n, r, and t all have special meaning only
> in special contexts (within character groups, within quantity-range
> specifications, or after backslash), and so aren't metacharacters in
> this sense.
>
> But I may be wrong.
>
> -CMSMcQ
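The comparison behind questions (1) to (4) can be made mechanical rather than eyeball-based. The following is a minimal Python sketch, assuming the three character sets are transcribed exactly as quoted above (the prose definition of 'metacharacter', the negated class in production [10], and the escapable class in production [24]). The variable and function names are hypothetical and belong to no schema processor.

```python
# Character sets transcribed from the text quoted above (XML Schema 1.0,
# Part 2, Appendix F). All names here are illustrative only.

# The prose definition of 'metacharacter'.
METACHARACTERS = set(".\\?*+{}()[]")

# Characters excluded by production [10]: Char ::= [^.\?*+()|#x5B#x5D]
# (#x5B is '[' and #x5D is ']').
EXCLUDED_BY_10 = set(".\\?*+()|") | {"\x5b", "\x5d"}

# Characters that may follow '\' in production [24]:
# SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
# (#x2D is '-', #x5E is '^'); n, r and t stand for newline, CR and tab.
ESCAPABLE_BY_24 = set("nrt\\|.?*+(){}") | {"\x2d", "\x5b", "\x5d", "\x5e"}


def compare():
    """Return the set differences relevant to questions (1) to (3)."""
    return {
        # Metacharacters that [10] still admits as normal characters: question (1).
        "meta_not_excluded_by_10": METACHARACTERS - EXCLUDED_BY_10,
        # Characters [10] excludes that the prose omits from the metacharacters: question (2).
        "excluded_by_10_not_meta": EXCLUDED_BY_10 - METACHARACTERS,
        # Escapable characters (other than n, r, t) that are not metacharacters.
        "escapable_not_meta": ESCAPABLE_BY_24 - set("nrt") - METACHARACTERS,
        # Sanity check: every character that [10] excludes can be escaped via [24].
        "excluded_not_escapable": EXCLUDED_BY_10 - ESCAPABLE_BY_24,
    }


if __name__ == "__main__":
    for name, chars in compare().items():
        print(name, sorted(chars))
```

Run against the transcribed sets, this reports { and } as metacharacters that [10] does not exclude, | as excluded by [10] but absent from the metacharacter list, and -, ^ and | as escapable but not metacharacters. That matches the reading that hyphen and circumflex are special only inside character groups.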
Received on Thursday, 10 July 2003 17:33:50 UTC