- From: Biron,Paul V <Paul.V.Biron@kp.org>
- Date: Thu, 26 Jul 2001 15:08:04 -0700
- To: "'www-xml-schema-comments'" <www-xml-schema-comments@w3.org>
> -----Original Message-----
> From: Biron,Paul V [SMTP:Paul.V.Biron@kp.org]
> Sent: Wednesday, July 25, 2001 4:41 PM
> To: 'www-xml-schema-comments'
> Subject: errata/clarification for regex language
>
> It appears that we were not explicit enough in our description of the
> regex language in Appendix F [1].
>
> Our intention was to follow exactly 2 aspects of Perl's matching
> algorithm:
>
> 1. the "earliest" match wins...that is, since the string is scanned
> left-to-right, the match that begins closest to the start of the string
> "wins"
> 2. the "greediest" match wins...that is, the longest substring that can
> possibly match (given #1 above) wins.
>
> I think this can be considered a clarification rather than a change, but
> will leave it up to the WG (I'm especially interested in hearing from
> implementors to see if they have implemented something different).

I also left off one other item.

It should be clear (famous last words :-) that, given the way we have defined regexes, there is an implicit "head and tail" anchoring added to every pattern. That is, every pattern p in our language is equivalent to the pattern ^p$ in Perl or other similar regex languages.

We (the task force that designed the regex language) made this decision very consciously, since we felt that in ALMOST EVERY conceivable case, someone using pattern to restrict the lexical space of a type would want the implicit anchoring...and we felt that it would be burdensome for them to add the extra metacharacters.

However, it probably wouldn't hurt to add a note to this effect, possibly with an example of how to get the "substring" matching behavior that is the default in Perl (i.e., instead of p, one would write .*p.*).

pvb
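To make the anchoring point concrete, here is a minimal sketch that uses Python's `re` module as a stand-in for both behaviours: `re.fullmatch` plays the role of the implicitly anchored schema pattern (p behaving like ^p$), and `re.search` plays the role of Perl's default unanchored matching. The helper names `schema_pattern_matches` and `perl_style_search` are hypothetical, and the sketch assumes the patterns involved mean the same thing in Python as in the schema regex language. That holds for the simple patterns shown here but not in general (for example, ^ and $ are not anchors in the schema language).

```python
import re


def schema_pattern_matches(pattern: str, value: str) -> bool:
    """Hypothetical helper approximating the implicit head-and-tail anchoring.

    Assumption: `pattern` uses only syntax that Python's `re` interprets the
    same way as the schema regex language.
    """
    # re.fullmatch succeeds only if the pattern spans the entire value,
    # i.e. the pattern is anchored at both the start and the end.
    return re.fullmatch(pattern, value) is not None


def perl_style_search(pattern: str, value: str) -> bool:
    """Hypothetical helper approximating Perl's default, unanchored matching."""
    # re.search succeeds if the pattern matches any substring of the value.
    return re.search(pattern, value) is not None


zip_code = r"\d{5}"
print(schema_pattern_matches(zip_code, "94612"))      # True: whole value matches
print(schema_pattern_matches(zip_code, "ZIP 94612"))  # False: extra leading text
print(perl_style_search(zip_code, "ZIP 94612"))       # True: substring found
# Wrapping p as .*p.* recovers substring matching under implicit anchoring.
print(schema_pattern_matches(r".*\d{5}.*", "ZIP 94612"))  # True
```

The last line mirrors the suggested workaround: under implicit anchoring, writing .*p.* lets p occur anywhere in the value.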
Received on Thursday, 26 July 2001 18:30:40 UTC