Re: Comments on regex-opt

On Mon, 16 Jan 2006, Bjoern Hoehrmann wrote:
> I since ran into some other issues. It'd be good to have these at least
> in the documentation if not fixed (so others might look at it and
> contribute a patch or two).

Thanks.


> A re like x{0,4} is rewritten to x{,4}; the latter syntax is not widely
> supported, e.g. Perl would not treat this as quantifier but as literal.

I didn't know this.
Thanks, it will be fixed in the next version, 1.1.1.
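
For illustration only, here is a minimal Python sketch of the kind of
post-processing that would avoid the problem. This is not regex-opt's
code; the function name portable_quantifiers is made up here, and the
rewrite is purely textual, so it assumes the pattern has no escaped
braces and no braces inside character classes.

```python
import re

def portable_quantifiers(pattern):
    """Rewrite {,n} as {0,n} so that engines lacking the short form still
    see a quantifier. Naive: ignores escapes and character classes."""
    return re.sub(r"\{,(\d+)\}", r"{0,\1}", pattern)

print(portable_quantifiers("x{,4}"))  # x{0,4}
```

Forms that are already explicit, such as x{0,4} or x{2,}, pass through
unchanged.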


> A re like [a-z]+|[a-z]+ is rewritten to (?:|)[a-z]+; this should really
> be [a-z]+ instead.

This will also be fixed in the next version, 1.1.1.
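
A rough Python sketch of the intended behaviour, assuming it is enough
to compare the top-level alternatives as strings. The helper names
split_top_level and dedupe_alternatives are hypothetical and not part of
regex-opt, and the splitter does not handle corner cases such as a "]"
placed first inside a character class.

```python
def split_top_level(pattern):
    """Split on '|' characters that sit outside groups and classes."""
    parts, current = [], []
    depth = 0          # nesting depth of ( ... )
    in_class = False   # True while inside [ ... ]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            current.append(pattern[i:i + 2])  # keep escape pairs intact
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

def dedupe_alternatives(pattern):
    """Drop textually identical alternatives, keeping the first of each."""
    seen = []
    for alt in split_top_level(pattern):
        if alt not in seen:
            seen.append(alt)
    if len(seen) == 1:
        return seen[0]          # no group needed for a single alternative
    return "(?:" + "|".join(seen) + ")"

print(dedupe_alternatives("[a-z]+|[a-z]+"))  # [a-z]+
```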


> A re like foo|[a-z]+ comes out as (?:foo|[a-z]+); this could be further
> optimized to simply [a-z]+. This is like "Choice counting" which is
> already listed, but it'd be good to have this example in the docs, I
> think. The Perl module Regexp::Optimizer reduces (?:aa|a)b to aa?b but
> does not do this for (?:foo|[a-z]+).

Yes, this is the choice counting problem.

I need to find a way for the program to check whether one alternative
matches only a subset of what another alternative matches, so that the
redundant one can be dropped.
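
A general subset test needs a comparison of the automata, but the
literal case can be sketched cheaply: a literal alternative is redundant
if some other alternative matches it in full. The Python below is only
an assumption of how that could look, not regex-opt's method. The names
are hypothetical, and the non-literal alternatives are assumed to be
valid in Python's re syntax. Note that this preserves the set of strings
matched as a whole; in a backtracking engine the extent of a match
within a longer string can still differ (foo|[a-z]+ matches "foo" in
"foobar", while [a-z]+ matches all of it).

```python
import re

# Characters that make an alternative something other than a plain literal.
META = set(r".^$*+?{}[]\|()")

def is_literal(alt):
    return not any(ch in META for ch in alt)

def drop_subsumed_literals(alternatives):
    """Remove literal alternatives that a non-literal alternative
    already matches in full."""
    kept = []
    for i, alt in enumerate(alternatives):
        subsumed = is_literal(alt) and any(
            j != i and not is_literal(other) and re.fullmatch(other, alt)
            for j, other in enumerate(alternatives)
        )
        if not subsumed:
            kept.append(alt)
    return kept

print(drop_subsumed_literals(["foo", "[a-z]+"]))  # ['[a-z]+']
```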


> It'd be handy to have control over what . is equivalent to, e.g. in
> Perl with the 's' modifier it's really any character, in XML Schema's
> regular expression language it's [^\r\n], etc.

Yes, it'd be nice...
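
One way such an option might work is to expand every unescaped "." into
an explicit character class chosen by the user. The Python sketch below
is only an illustration under that assumption; replace_dot and the two
constants are hypothetical names, and the scanner ignores corner cases
such as a "]" placed first inside a class.

```python
import re

XSD_DOT = r"[^\r\n]"     # what . means in XML Schema regular expressions
PERL_S_DOT = r"[\s\S]"   # any character, as with Perl's 's' modifier

def replace_dot(pattern, dot_class):
    """Replace each unescaped '.' outside a character class."""
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep escape pairs such as \.
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == ".":
            out.append(dot_class)
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)

xsd = replace_dot("a.b", XSD_DOT)
# Python's own "." excludes only \n, so the two patterns disagree on \r.
print(bool(re.fullmatch("a.b", "a\rb")), bool(re.fullmatch(xsd, "a\rb")))
```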

-- 
Joel Yliluoma
http://iki.fi/bisqwit/

Received on Monday, 16 January 2006 13:02:00 UTC