- From: Joel Yliluoma <bisqwit@iki.fi>
- Date: Mon, 16 Jan 2006 15:04:22 +0200 (EET)
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- cc: www-archive@w3.org
On Mon, 16 Jan 2006, Bjoern Hoehrmann wrote:
> I have since run into some other issues. It'd be good to have these at
> least in the documentation if not fixed (so others might look at it and
> contribute a patch or two).
Thanks.
> A re like x{0,4} is rewritten to x{,4}; the latter syntax is not widely
> supported, e.g. Perl would not treat this as quantifier but as literal.
I didn't know this.
Thanks, it will be fixed in the next version, 1.1.1.
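
A minimal sketch of the portability fix discussed above, assuming the
optimizer's output is available as a plain string. The function name
portable_quantifiers is hypothetical and not part of the program. It is
deliberately naive: it does not skip escaped braces or braces inside
character classes.

```python
import re

# Matches a quantifier whose lower bound was omitted, e.g. "{,4}".
_OPEN_LOWER_BOUND = re.compile(r"\{,(\d+)\}")


def portable_quantifiers(pattern: str) -> str:
    """Rewrite {,n} as {0,n} so that engines which read {,n} literally
    still see a quantifier.

    Hypothetical helper; naive about escapes and character classes.
    """
    return _OPEN_LOWER_BOUND.sub(r"{0,\1}", pattern)


print(portable_quantifiers("x{,4}"))
```

Quantifiers that already carry a lower bound, such as x{2,4}, pass
through unchanged.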
> A re like [a-z]+|[a-z]+ is rewritten to (?:|)[a-z]+; this should really
> be [a-z]+ instead.
This will also be fixed in the next version, 1.1.1.
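
One way the duplicate case could be handled, sketched under the
assumption that the top-level alternatives have already been split into
a list of strings. The name join_alternatives is hypothetical; this is
not how the program is implemented.

```python
def join_alternatives(alternatives):
    """Join alternatives with "|", dropping exact duplicates first.

    dict.fromkeys keeps the first occurrence of each alternative and
    preserves the original order.
    """
    unique = list(dict.fromkeys(alternatives))
    if len(unique) == 1:
        # A single alternative needs no group and no "|".
        return unique[0]
    return "(?:" + "|".join(unique) + ")"


print(join_alternatives(["[a-z]+", "[a-z]+"]))
```

Distinct alternatives such as foo and [a-z]+ still come out wrapped in a
non-capturing group.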
> A re like foo|[a-z]+ comes out as (?:foo|[a-z]+); this could be further
> optimized to simply [a-z]+. This is like "Choice counting" which is
> already listed, but it'd be good to have this example in the docs, I
> think. The Perl module Regexp::Optimizer reduces (?:aa|a)b to aa?b but
> does not do this for (?:foo|[a-z]+).
Yes, this is the choice counting problem.
I should think of a way to make the program check whether one
alternative is a subset of another and, if so, combine them.
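
A rough sketch of the easiest subset case only: a purely alphanumeric
literal alternative (such as foo) is dropped when some non-literal
alternative (such as [a-z]+) matches it in full. The function name
drop_subsumed_literals is hypothetical, and the general subset test
between two arbitrary expressions is not attempted here. Note that the
rewrite only preserves the set of strings matched in full: under
Perl-style leftmost-first matching, an unanchored search for foo|[a-z]+
finds "foo" in "foobar", while [a-z]+ finds "foobar".

```python
import re


def drop_subsumed_literals(alternatives):
    """Remove literal alternatives that another alternative fully matches.

    Assumes every alternative is a valid pattern. An alternative counts
    as a literal only if it is purely alphanumeric, which is a
    conservative test.
    """
    kept = []
    for alt in alternatives:
        subsumed = alt.isalnum() and any(
            # Compare against non-literal alternatives only, so two
            # identical literals cannot eliminate each other.
            not other.isalnum() and re.fullmatch(other, alt)
            for other in alternatives
        )
        if not subsumed:
            kept.append(alt)
    return kept


print(drop_subsumed_literals(["foo", "[a-z]+"]))
```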
> It'd be handy to have control over what . is equivalent to, e.g. in
> Perl with the 's' modifier it's really any character, in XML Schema's
> regular expression language it's [^\r\n], etc.
Yes, it'd be nice...
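
As an illustration of what such control might look like, the sketch
below replaces every unescaped . outside a character class with a
caller-supplied class. The name expand_dot and its parameters are
hypothetical. The scanner is simplified: it does not treat a ] placed
first in a class (as in []abc]) as a literal.

```python
def expand_dot(pattern, replacement):
    """Replace each unescaped "." outside [...] with `replacement`."""
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy an escape sequence such as "\." untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == ".":
            out.append(replacement)
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


# XML Schema meaning of ".", as described in the quoted message.
print(expand_dot(r"a.b", r"[^\r\n]"))
```

Escaped dots and dots inside a character class are left alone, so
a\.b[.] comes back unchanged.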
--
Joel Yliluoma
http://iki.fi/bisqwit/
Received on Monday, 16 January 2006 13:02:00 UTC