- From: Joel Yliluoma <bisqwit@iki.fi>
- Date: Mon, 9 Jan 2006 14:46:42 +0200 (EET)
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- cc: www-archive@w3.org
On Mon, 9 Jan 2006, Bjoern Hoehrmann wrote:

> I tried http://bisqwit.iki.fi/source/regexopt.html and so far I like
> it! Thanks for doing this. I noticed some issues though: in GetDecMask
> it would probably be better to call the set() method rather than using
> the operator[] reference.

Thank you for your feedback!

Your feedback was very useful, but I fear I lack the expertise required to make regexps work with Unicode. I've created some character-encoding-related software, but I don't have expertise in locales, or in Perl specifically.

I would appreciate it if you could provide a crash course on how Unicode works _with regexps_, and I can then look at it. Most importantly, what is the proper way to implement \w and its cousins?

I already know how UTF-8 works and what kind of characters Unicode consists of (http://bisqwit.iki.fi/japtools/unicodemap.php), but I realize that regexps aren't necessarily always UTF-8-encoded. I've written plenty of ISO-8859-*-encoded regexps, which would fail to parse as UTF-8.

Also, I'm interested in your Unicode bitset. I could easily use std::bitset<0x110000> instead of std::bitset<0x100>, but then it would use 139264 bytes of memory per instance instead of 32, which wouldn't be so nice...

Creating a lib is a possibility and a good idea. I'll probably do it in the next version.

--
Joel Yliluoma
http://iki.fi/bisqwit/
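
To make the memory figures above concrete, here is a minimal sketch of the arithmetic (0x110000 bits / 8 = 139264 bytes, versus 0x100 bits / 8 = 32 bytes), together with one possible sparse alternative. The names ByteSet, UnicodeSet, CodePointRange and RangeSet are hypothetical and appear in neither regexopt nor Bjoern's code; the sorted-range representation is just one common way to store character classes, not necessarily what either author had in mind.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Dense sets: one bit per possible value.
using ByteSet    = std::bitset<0x100>;     // 256 bits
using UnicodeSet = std::bitset<0x110000>;  // 1,114,112 bits

// Minimum storage implied by the bit counts (sizeof may add padding).
constexpr std::size_t kByteSetBytes    = 0x100 / 8;     // 32
constexpr std::size_t kUnicodeSetBytes = 0x110000 / 8;  // 139264
static_assert(sizeof(UnicodeSet) >= kUnicodeSetBytes, "dense set is large");

// Hypothetical sparse alternative: sorted, non-overlapping,
// non-adjacent inclusive ranges of code points.
struct CodePointRange { std::uint32_t first, last; };

class RangeSet {
public:
    // Add [first, last], merging with overlapping or adjacent ranges.
    void add(std::uint32_t first, std::uint32_t last) {
        if (first > last) std::swap(first, last);
        std::vector<CodePointRange> merged;
        merged.reserve(ranges_.size() + 1);
        bool placed = false;
        for (const CodePointRange& r : ranges_) {
            if (std::uint64_t(r.last) + 1 < first) {
                merged.push_back(r);            // strictly before the new range
            } else if (std::uint64_t(last) + 1 < r.first) {
                if (!placed) { merged.push_back({first, last}); placed = true; }
                merged.push_back(r);            // strictly after the new range
            } else {
                first = std::min(first, r.first);  // overlap or adjacency:
                last  = std::max(last, r.last);    // absorb r into the new range
            }
        }
        if (!placed) merged.push_back({first, last});
        ranges_ = std::move(merged);
    }

    // Binary search for the last range starting at or before cp.
    bool contains(std::uint32_t cp) const {
        auto it = std::upper_bound(
            ranges_.begin(), ranges_.end(), cp,
            [](std::uint32_t value, const CodePointRange& r) {
                return value < r.first;
            });
        if (it == ranges_.begin()) return false;
        --it;
        return cp <= it->last;
    }

    std::size_t range_count() const { return ranges_.size(); }

private:
    std::vector<CodePointRange> ranges_;
};
```

With this layout an ASCII-only \w ([A-Za-z0-9_]) would need four ranges rather than a 139264-byte bitmap, at the cost of an O(log n) lookup instead of a single bit test. A full Unicode \w would need many more ranges, and which code points belong in it is exactly the open question raised above.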
Received on Tuesday, 10 January 2006 09:45:49 UTC