I've written some Java code that translates from the syntax of XSD regexes to the syntax of JDK 1.4 java.util.regex regexes. The source, binaries and documentation can be downloaded from:

http://www.thaiopensource.com/relaxng/xsdregex.zip

I am releasing it under a very liberal license (the BSD license), which makes it free even for commercial use.

Both XSD regexes and JDK 1.4 regexes are based on Perl regexes, so you might think the translation would be trivial. However, it turns out that doing a complete job is quite tricky. In particular, JDK 1.4 regexes deal with sequences of 16-bit code values, whereas XSD regexes deal with characters. Also, JDK 1.4 supports Unicode 3.0, whereas XSD requires at least Unicode 3.1. So, for example, something as simple as \p{L} in XSD would be equivalent to the following JDK regex:

([\p{L}\u03F5\u03F4]|[\uD840-\uD868][\uDC00-\uDFFF]|\uD800[\uDF00-\uDF1E\uDF30-\uDF49]|\uD801[\uDC00-\uDC25\uDC28-\uDC4D]|\uD835[\uDC00-\uDC54\uDC56-\uDC9C\uDC9E-\uDC9F\uDCA2\uDCA5-\uDCA6\uDCA9-\uDCAC\uDCAE-\uDCB9\uDCBB\uDCBD-\uDCC0\uDCC2-\uDCC3\uDCC5-\uDD05\uDD07-\uDD0A\uDD0D-\uDD14\uDD16-\uDD1C\uDD1E-\uDD39\uDD3B-\uDD3E\uDD40-\uDD44\uDD46\uDD4A-\uDD50\uDD52-\uDEA3\uDEA8-\uDEC0\uDEC2-\uDEDA\uDEDC-\uDEFA\uDEFC-\uDF14\uDF16-\uDF34\uDF36-\uDF4E\uDF50-\uDF6E\uDF70-\uDF88\uDF8A-\uDFA8\uDFAA-\uDFC2\uDFC4-\uDFC9]|\uD869[\uDC00-\uDED6]|\uD87E[\uDC00-\uDE1D])

Please report any bugs you find to me directly.

James

Received on Tuesday, 30 April 2002 05:50:35 GMT
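As an illustration of the 16-bit code value issue described in the message, the following is a minimal sketch of how a character above U+FFFF splits into two code values. The class and method names are hypothetical and are not taken from the xsdregex package. Only the standard UTF-16 surrogate arithmetic is used, so it does not depend on any API newer than JDK 1.4.

```java
// Hypothetical helper for illustration only; not part of xsdregex.
class SurrogatePairs {
    // High (leading) surrogate for a code point in U+10000..U+10FFFF.
    static char high(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Low (trailing) surrogate for the same code point.
    static char low(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    // Formats both surrogates as backslash-u escapes, as written in regex source.
    static String escape(int codePoint) {
        return hex(high(codePoint)) + hex(low(codePoint));
    }

    // Surrogates lie in D800..DFFF, so the hex form always has four digits.
    private static String hex(char c) {
        return "\\u" + Integer.toHexString(c).toUpperCase();
    }

    public static void main(String[] args) {
        // First and last code points of the range U+20000..U+2A6D6.
        System.out.println(escape(0x20000)); // surrogates D840 and DC00
        System.out.println(escape(0x2A6D6)); // surrogates D869 and DED6
    }
}
```

The two values printed are the endpoints of the alternatives [\uD840-\uD868][\uDC00-\uDFFF] and \uD869[\uDC00-\uDED6] in the expression above: a single range of characters has to be written as ranges over pairs of code values.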