On Mon, 9 Jan 2006, Bjoern Hoehrmann wrote:
> I tried http://bisqwit.iki.fi/source/regexopt.html and so far I like
> it! Thanks for doing this. I noticed some issues though: in GetDecMask
> it would probably be better to call the set() method rather than using
> the operator[] reference.

Thank you for your feedback!

Your feedback was very useful, but I fear I lack the expertise required to make regexps work with Unicode. I've created some character-encoding-related software, but I don't have expertise in locales and Perl specifically.

I would appreciate it if you could provide a crash course on how Unicode works _with regexps_, so that I can then look into it. Most importantly, what is the proper way to implement \w and its cousins?

I already know how UTF-8 works and what kinds of characters Unicode consists of (http://bisqwit.iki.fi/japtools/unicodemap.php), but I realize that regexps aren't necessarily always UTF-8-encoded. I've written plenty of ISO-8859-*-encoded regexps, which would fail to parse as UTF-8.

Also, I'm interested in your Unicode bitset. I could easily use std::bitset<0x110000> instead of std::bitset<0x100>, but then it would use 139264 bytes of memory per instance instead of 32, which wouldn't be so nice...

Creating a lib is a possibility and a good idea. I'll probably do it in the next version.

-- 
Joel Yliluoma
http://iki.fi/bisqwit/

Received on Tuesday, 10 January 2006 09:45:49 GMT
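
The memory figures quoted in the message can be checked with a few lines of C++. This is only a sketch of the arithmetic, not code from regexopt: the names min_bytes_for_bits, bitset_bytes, kByteRange and kUnicodeRange are hypothetical and were chosen for this illustration. The actual sizeof of a std::bitset is implementation-defined, so the sketch computes the packed minimum (one bit per character) and only exposes sizeof for comparison.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: smallest number of bytes that can hold `bits` flags
// when each flag occupies exactly one bit.
constexpr std::size_t min_bytes_for_bits(std::size_t bits) {
    return (bits + 7) / 8;
}

// One flag per 8-bit character (the std::bitset<0x100> case).
constexpr std::size_t kByteRange = 0x100;
// One flag per Unicode code point, U+0000 through U+10FFFF
// (the std::bitset<0x110000> case).
constexpr std::size_t kUnicodeRange = 0x110000;

// The two figures quoted in the message, checked at compile time.
static_assert(min_bytes_for_bits(kByteRange) == 32,
              "256 bits pack into 32 bytes");
static_assert(min_bytes_for_bits(kUnicodeRange) == 139264,
              "1114112 bits pack into 139264 bytes");

// Hypothetical helper: what this particular standard library really uses for
// a std::bitset<N>. The value is implementation-defined; common libraries
// round up to a whole number of machine words.
template <std::size_t N>
constexpr std::size_t bitset_bytes() {
    return sizeof(std::bitset<N>);
}
```

Calling bitset_bytes<kUnicodeRange>() reports what a given compiler actually allocates per instance. The full-range set is 4352 times the size of the byte-range one, which is why a sparser representation looks attractive for Unicode character classes.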