- From: James Clark <jjc@jclark.com>
- Date: Tue, 30 Apr 2002 16:52:53 +0700
- To: xmlschema-dev@w3.org
I've written some Java code that translates from the syntax of XSD regexes to the syntax of JDK 1.4 java.util.regex regexes. The source, binaries and documentation can be downloaded from:

http://www.thaiopensource.com/relaxng/xsdregex.zip

I am releasing it under a very liberal license (the BSD license), which makes it free even for commercial use.

XSD regexes are based on Perl regexes and JDK 1.4 regexes are based on Perl regexes, so you might think the translation would be trivial. However, it turns out that doing a 100% job is quite tricky. In particular, JDK 1.4 regexes deal with sequences of 16-bit code values, whereas XSD regexes deal with characters. Also, JDK 1.4 supports Unicode 3.0, whereas XSD requires at least Unicode 3.1. So, for example, something as simple as \p{L} in XSD would be equivalent to the following JDK regex:

([\p{L}\u03F5\u03F4]|[\uD840-\uD868][\uDC00-\uDFFF]|\uD800[\uDF00-\uDF1E\uDF30-\uDF49]|\uD801[\uDC00-\uDC25\uDC28-\uDC4D]|\uD835[\uDC00-\uDC54\uDC56-\uDC9C\uDC9E-\uDC9F\uDCA2\uDCA5-\uDCA6\uDCA9-\uDCAC\uDCAE-\uDCB9\uDCBB\uDCBD-\uDCC0\uDCC2-\uDCC3\uDCC5-\uDD05\uDD07-\uDD0A\uDD0D-\uDD14\uDD16-\uDD1C\uDD1E-\uDD39\uDD3B-\uDD3E\uDD40-\uDD44\uDD46\uDD4A-\uDD50\uDD52-\uDEA3\uDEA8-\uDEC0\uDEC2-\uDEDA\uDEDC-\uDEFA\uDEFC-\uDF14\uDF16-\uDF34\uDF36-\uDF4E\uDF50-\uDF6E\uDF70-\uDF88\uDF8A-\uDFA8\uDFAA-\uDFC2\uDFC4-\uDFC9]|\uD869[\uDC00-\uDED6]|\uD87E[\uDC00-\uDE1D])

Please report any bugs you find to me directly.

James
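To illustrate why the translated regex needs alternatives such as [\uD840-\uD868][\uDC00-\uDFFF]: a character above U+FFFF occupies two 16-bit code values (a surrogate pair) in a Java string, so a range of such characters has to be spelled out as ranges of pairs. The following is only a minimal sketch of that arithmetic, not code from the xsdregex distribution; the class name SurrogateRanges and its methods are hypothetical names chosen for this example.

```java
public class SurrogateRanges {
    // Leading (high) 16-bit code value for a code point in U+10000..U+10FFFF.
    public static char high(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Trailing (low) 16-bit code value for the same code point.
    public static char low(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    // Formats the pair as two 4-digit hex values, e.g. "D840 DC00".
    public static String pair(int codePoint) {
        return String.format("%04X %04X", (int) high(codePoint), (int) low(codePoint));
    }

    public static void main(String[] args) {
        // End points of two supplementary ranges; the resulting pairs line up
        // with the D840..D869 and D87E alternatives in the regex above.
        int[] codePoints = {0x20000, 0x2A6D6, 0x2F800, 0x2FA1D};
        for (int cp : codePoints) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase() + " -> " + pair(cp));
        }
    }
}
```

Under this arithmetic, U+20000 maps to D840 DC00 and U+2A6D6 maps to D869 DED6. That is why the range is split into a part where the low value runs over the full DC00-DFFF span and a final part, led by D869, that stops at DED6.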
Received on Tuesday, 30 April 2002 05:50:35 UTC