RE: Regex comments

From: Biron,Paul V <Paul.V.Biron@KP.ORG>
Date: Tue, 5 Dec 2000 11:06:34 -0800
Message-Id: <376E771642C1D2118DC300805FEAAF43014BA828@pars-exch-1.ca.kp.org>
To: "'James Clark'" <jjc@jclark.com>, www-xml-schema-comments@w3.org
> -----Original Message-----
> From:	James Clark [SMTP:jjc@jclark.com]
> Sent:	Tuesday, December 05, 2000 4:50 AM
> To:	www-xml-schema-comments@w3.org
> Subject:	Re: Regex comments
> 
> > From: James Clark (jjc@jclark.com)
> > Date: Tue, Dec 05 2000
> > 
> > which suggests that a character class subtraction looks like:
> > 
> >  [abc-[def]]
> > 
> > If this is right, it's deeply confusing that the description of \w uses
> > an incompatible syntax: [...]-[...].  It is also a pretty bizarre
> > feature: is this really necessary? I couldn't find any mention of it in
> > the Regexp documentation I consulted.
> 
> I found it in UTR#18, so I withdraw this comment. (The comment about the
> description of \w still stands.)
> 
Good, glad you found it, 'cause I was going to refer you to Unicode
Technical Report #18, Unicode Regular Expression Guidelines, section 2.3 [1].

And yes, there is a typo in the description of \w and I will change that, so
instead of:

	\w		[#x0000-#x10FFFF]-[\p{P}\p{S}\p{C}]

it will read

	\w		[#x0000-#x10FFFF-[\p{P}\p{S}\p{C}]]
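
To illustrate what the corrected syntax means, here is a minimal Python sketch of subtraction as plain set difference, along with a membership test for \w as defined above. The names class_subtract and is_word_char are hypothetical and chosen only for this example. Python's re module does not accept the [...-[...]] syntax, so the sketch relies on the standard unicodedata module rather than on a regex engine.

```python
import unicodedata


def class_subtract(base, excluded):
    # Models [base-[excluded]]: every character in `base`
    # that does not also appear in `excluded`.
    return set(base) - set(excluded)


def is_word_char(ch):
    # Models the \w definition quoted above: any character whose
    # Unicode general category is not P (punctuation), S (symbol),
    # or C (other). Assumption: the Unicode data bundled with the
    # running Python stands in for the version the draft targets.
    return unicodedata.category(ch)[0] not in ("P", "S", "C")


if __name__ == "__main__":
    print(sorted(class_subtract("abcdef", "def")))
    print([c for c in "a5_$" if is_word_char(c)])
```

Under this definition the underscore is excluded, because its general category is Pc (connector punctuation), which differs from \w in Perl-style engines.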

pvb

References
[1] http://www.unicode.org/unicode/reports/tr18/#Subtraction
Received on Tuesday, 5 December 2000 14:22:23 GMT
