RE: Question about metacharacters, regex rule 10, 24 (Datatypes appendix F)

Michael:
How did you run across this?  I'm just curious because I sent at least
one (possibly two) notes about it, and there was a comment by 'Cobra' on
7/7 to this list on the same subject.

My note asked for changes (1) and (2).  Cobra requested (1).

All the best, Ashok

> -----Original Message-----
> From: www-xml-schema-comments-request@w3.org [mailto:www-xml-schema-
> comments-request@w3.org] On Behalf Of C. M. Sperberg-McQueen
> Sent: Thursday, July 10, 2003 2:22 PM
> To: W3C XML Schema Comments list
> Subject: Question about metacharacters, regex rule 10, 24 (Datatypes
> appendix F)
> 
> 
> Appendix F in Part 2 of XML Schema 1.0 defines 'metacharacter'
> thus:
> 
>    A metacharacter is either ., \, ?, *, +, {, } (, ), [ or ].
> 
> It defines 'normal character' thus:
> 
>    [Definition:] A normal character is any XML character that is not a
>    metacharacter. In regular expressions, a normal character is an
>    atom that denotes the singleton set of strings containing only
>    itself.
> 
> Production [10], which I take to be defining normal characters, reads:
> 
>    Normal Character
>    [10]  Char ::= [^.\?*+()|#x5B#x5D]
> 
> The metacharacters all need escapes, so production 24 is also relevant
> here:
> 
>    Single Character Escape
>    [24] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
> 
> I have some questions:
> 
> (1) shouldn't { and } (braces) be included in production [10]?
> 
>    ? [10] Char ::= [^.\?*+{}()|#x5B#x5D]
> 
> (2) shouldn't | (vertical bar) be among the characters defined as
> metacharacters?
> 
> (3) should ^ (#x5E) be included among the metacharacters?
> 
> (4) would it be possible to list the magic characters in the same
> order in 10 and 24, to make eyeball-based comparisons easier?
> 
> I suspect the answer to (2) is 'yes' and the answer to (3) is 'no', on
> the theory that the term 'metacharacter' is best reserved for
> characters which have special meaning at the top level of a regular
> expression and which must therefore have escapes to avoid ambiguity.
> Hyphen, circumflex, comma, n, r, and t all have special meaning only
> in special contexts (within character groups, within quantity-range
> specifications, or after backslash), and so aren't metacharacters in
> this sense.
> 
> But I may be wrong.
> 
> -CMSMcQ
> 
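
For anyone who would rather not do the eyeball-based comparison from
question (4) by hand, here is a minimal sketch in Python that diffs the
three character lists quoted above. The sets are transcribed from the
prose definition and from productions [10] and [24] as they appear in
this message; the names PROSE_METACHARS, PROD_10_EXCLUDED,
PROD_24_ESCAPABLE and compare are made up for illustration and come
from no spec or library.

```python
# Characters listed in the prose definition of 'metacharacter' (Appendix F).
PROSE_METACHARS = set(".\\?*+{}()[]")

# Characters excluded by production [10]: [^.\?*+()|#x5B#x5D]
PROD_10_EXCLUDED = set(".\\?*+()|[]")

# Characters that may follow the backslash in production [24]:
# [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
PROD_24_ESCAPABLE = set("nrt\\|.?*+(){}-[]^")


def sorted_chars(chars):
    """Return the characters of a set as a string in code point order."""
    return "".join(sorted(chars))


def compare():
    """Report where the three transcribed lists disagree."""
    return {
        "in_prose_not_in_10": sorted_chars(PROSE_METACHARS - PROD_10_EXCLUDED),
        "in_10_not_in_prose": sorted_chars(PROD_10_EXCLUDED - PROSE_METACHARS),
        "in_24_not_in_10": sorted_chars(PROD_24_ESCAPABLE - PROD_10_EXCLUDED),
    }


if __name__ == "__main__":
    for label, chars in compare().items():
        print(label, repr(chars))
```

The first difference comes out as the two braces, which is question (1),
and the second as the vertical bar, which is question (2). The third
adds hyphen, circumflex, n, r and t to the braces: the characters that
[24] can escape but that [10] does not exclude.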

Received on Thursday, 10 July 2003 17:33:50 UTC