Re: Comments on regex-opt

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 16 Jan 2006 10:51:23 +0100
To: Joel Yliluoma <bisqwit@iki.fi>
Cc: www-archive@w3.org
Message-ID: <9jpms1pt33ut1m0l7np48uj2qrtfm1btbj@hive.bjoern.hoehrmann.de>

* Joel Yliluoma wrote:
>Thank you for your feedback!

I have since run into some other issues. It'd be good to have these at
least documented, if not fixed (so others might look at them and
contribute a patch or two).

A regex like x{0,4} is rewritten to x{,4}; the latter syntax is not widely
supported. Perl, for example, would not treat it as a quantifier but as
literal text.
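One way to avoid this is for the rewriter to always spell out the lower
bound. Below is a minimal sketch using a hypothetical helper,
format_quantifier (not part of regex-opt), that renders a repetition
count and falls back to the short forms *, + and ? where they apply.

```python
import re

def format_quantifier(lo, hi=None):
    """Hypothetical helper: render a {lo,hi} repetition; hi=None means unbounded.

    The lower bound is always written out, because "{,n}" is not accepted
    as a quantifier by every engine.
    """
    if lo < 0 or (hi is not None and hi < lo):
        raise ValueError("invalid repetition bounds")
    if (lo, hi) == (0, None):
        return "*"
    if (lo, hi) == (1, None):
        return "+"
    if (lo, hi) == (0, 1):
        return "?"
    if hi is None:
        return "{%d,}" % lo   # at least lo, no upper bound
    if lo == hi:
        return "{%d}" % lo    # exactly lo
    return "{%d,%d}" % (lo, hi)

# x{0,4} stays x{0,4} instead of becoming x{,4}
pattern = "x" + format_quantifier(0, 4)
print(pattern, bool(re.fullmatch(pattern, "xxxx")))
```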

A regex like [a-z]+|[a-z]+ is rewritten to (?:|)[a-z]+; this should really
be just [a-z]+.
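A minimal sketch of the missing step, assuming the optimizer sees the
alternatives as a list of pattern strings: drop textual duplicates before
factoring out a common suffix, so that nothing is left to factor.
join_alternatives is a hypothetical name, and this only catches
alternatives that are spelled identically.

```python
def join_alternatives(alternatives):
    """Hypothetical helper: build an alternation, dropping exact duplicates first."""
    unique = list(dict.fromkeys(alternatives))  # order-preserving de-duplication
    if len(unique) == 1:
        return unique[0]                        # no alternation needed at all
    return "(?:" + "|".join(unique) + ")"

print(join_alternatives(["[a-z]+", "[a-z]+"]))
```

The output (?:|)[a-z]+ is not wrong, since (?:|) only matches the empty
string; it is just redundant.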

A regex like foo|[a-z]+ comes out as (?:foo|[a-z]+); this could be further
optimized to simply [a-z]+, since every string matched by foo is also
matched by [a-z]+. This is similar to "Choice counting", which is already
listed, but I think it'd be good to have this example in the docs. The
Perl module Regexp::Optimizer reduces (?:aa|a)b to aa?b but does not do
this for (?:foo|[a-z]+).
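A minimal sketch of one special case, using Python's re module: drop a
plain-literal alternative when another alternative already matches it in
full. The helper names are hypothetical, and treating a string as literal
when re.escape() leaves it unchanged is a conservative assumption. This
only preserves the set of strings that match in full; in a leftmost-first
engine such as Perl's or Python's, an unanchored foo|[a-z]+ finds just
"foo" in "foobar", while [a-z]+ finds "foobar".

```python
import re

def is_plain_literal(pattern):
    # Conservative: re.escape() changes the string if it contains any metacharacter.
    return re.escape(pattern) == pattern

def drop_subsumed_literals(alternatives):
    """Hypothetical helper: remove literal alternatives that another alternative covers."""
    unique = list(dict.fromkeys(alternatives))  # so identical literals cannot cancel each other
    kept = []
    for i, alt in enumerate(unique):
        others = unique[:i] + unique[i + 1:]
        if is_plain_literal(alt) and any(re.fullmatch(o, alt) for o in others):
            continue                            # e.g. "foo" is covered by "[a-z]+"
        kept.append(alt)
    return kept

print(drop_subsumed_literals(["foo", "[a-z]+"]))
```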

It'd be handy to have control over what . is equivalent to; e.g. in Perl
with the 's' modifier it really matches any character, while in XML
Schema's regular expression language it is [^\r\n], etc.
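A minimal sketch of such a control, assuming a hypothetical dialect table
(DOT_CLASS) and helper (expand_dot), neither of which exists in regex-opt:
each unescaped . outside a character class is replaced by an explicit
class, so an optimizer could reason about it as an ordinary character set.
The scanner is simplified and does not handle ] as the first member of a
class.

```python
import re

# Hypothetical dialect table: what "." stands for, written as an explicit class.
DOT_CLASS = {
    "perl": r"[^\n]",      # Perl without the 's' modifier
    "perl-s": r"[\s\S]",   # Perl with the 's' modifier: any character
    "xsd": r"[^\r\n]",     # XML Schema regular expressions
}

def expand_dot(pattern, dialect):
    """Replace each unescaped "." outside a character class with DOT_CLASS[dialect]."""
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])     # keep escapes such as \. or \] intact
            i += 2
            continue
        if in_class:
            if c == "]":
                in_class = False             # end of [...]; "." inside it is literal
        elif c == "[":
            in_class = True
        elif c == ".":
            out.append(DOT_CLASS[dialect])
            i += 1
            continue
        out.append(c)
        i += 1
    return "".join(out)

print(expand_dot(r"a.b\.[.]", "xsd"))
```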

(There are a few more things, but I wanted to write this up so I don't
forget...)

regards,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Monday, 16 January 2006 09:50:46 UTC
