ANNOUNCE: Translator from XSD regex syntax to Java regex syntax

I've written some Java code that translates from the syntax of XSD regexes 
to the syntax of JDK 1.4 java.util.regex regexes.  The source, binaries and 
documentation can be downloaded from:

  http://www.thaiopensource.com/relaxng/xsdregex.zip

I am releasing it under a very liberal license (the BSD license), which 
makes it free even for commercial use.

Both XSD regexes and JDK 1.4 regexes are based on Perl regexes, so you might
think the translation would be trivial.  However, it turns out that doing a
100% job is quite tricky.  In particular, JDK 1.4 regexes deal with sequences
of 16-bit code values, whereas XSD regexes deal with characters, so a
character above U+FFFF has to be matched as a pair of surrogate code values.
Also, JDK 1.4 supports Unicode 3.0, whereas XSD requires at least Unicode
3.1.  So, for example, something as simple as \p{L} in XSD would be
equivalent to the following JDK regex:

([\p{L}\u03F5\u03F4]|[\uD840-\uD868][\uDC00-\uDFFF]|\uD800[\uDF00-\uDF1E\uDF30-\uDF49]|\uD801[\uDC00-\uDC25\uDC28-\uDC4D]|\uD835[\uDC00-\uDC54\uDC56-\uDC9C\uDC9E-\uDC9F\uDCA2\uDCA5-\uDCA6\uDCA9-\uDCAC\uDCAE-\uDCB9\uDCBB\uDCBD-\uDCC0\uDCC2-\uDCC3\uDCC5-\uDD05\uDD07-\uDD0A\uDD0D-\uDD14\uDD16-\uDD1C\uDD1E-\uDD39\uDD3B-\uDD3E\uDD40-\uDD44\uDD46\uDD4A-\uDD50\uDD52-\uDEA3\uDEA8-\uDEC0\uDEC2-\uDEDA\uDEDC-\uDEFA\uDEFC-\uDF14\uDF16-\uDF34\uDF36-\uDF4E\uDF50-\uDF6E\uDF70-\uDF88\uDF8A-\uDFA8\uDFAA-\uDFC2\uDFC4-\uDFC9]|\uD869[\uDC00-\uDED6]|\uD87E[\uDC00-\uDE1D])
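The surrogate-pair alternatives in the expression above come from UTF-16: a
character above U+FFFF is stored as a high surrogate (U+D800 to U+DBFF)
followed by a low surrogate (U+DC00 to U+DFFF).  Below is a minimal sketch,
not the translator's actual code, of how one range of supplementary
characters could be rewritten into that form.  The class and method names
(SurrogateRanges, toSurrogatePair, rangeToRegex) are hypothetical, and the
sketch assumes a current JDK (it uses String.join) rather than JDK 1.4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites a range of supplementary code points as a
// regex over UTF-16 surrogate pairs, for engines that only see 16-bit values.
class SurrogateRanges {

    // Encode a supplementary code point (U+10000 to U+10FFFF) as a
    // high surrogate followed by a low surrogate.
    static char[] toSurrogatePair(int cp) {
        if (cp < 0x10000 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not supplementary: " + Integer.toHexString(cp));
        }
        int v = cp - 0x10000;                      // 20-bit offset
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    // Regex escape for one 16-bit value: backslash, 'u', then 4 hex digits.
    static String esc(char c) {
        return String.format("\\u%04X", (int) c);
    }

    // A single value or a bracketed range of 16-bit values.
    static String span(char lo, char hi) {
        return lo == hi ? esc(lo) : "[" + esc(lo) + "-" + esc(hi) + "]";
    }

    // Express the code point range [first, last] as alternatives of the form
    // <high surrogate(s)><low surrogate(s)>: a partial leading block, a run of
    // fully covered high surrogates, and a partial trailing block.
    static String rangeToRegex(int first, int last) {
        if (first > last) {
            throw new IllegalArgumentException("empty range");
        }
        char[] a = toSurrogatePair(first);
        char[] b = toSurrogatePair(last);
        if (a[0] == b[0]) {
            // Both ends share one high surrogate.
            return esc(a[0]) + span(a[1], b[1]);
        }
        List<String> alts = new ArrayList<>();
        char midStart = a[0];
        char midEnd = b[0];
        if (a[1] != 0xDC00) {
            // The first high surrogate is only partly covered.
            alts.add(esc(a[0]) + span(a[1], (char) 0xDFFF));
            midStart++;
        }
        boolean tailPartial = b[1] != 0xDFFF;
        if (tailPartial) {
            midEnd--;
        }
        if (midStart <= midEnd) {
            // Every low surrogate is allowed after these high surrogates.
            alts.add(span(midStart, midEnd) + span((char) 0xDC00, (char) 0xDFFF));
        }
        if (tailPartial) {
            alts.add(esc(b[0]) + span((char) 0xDC00, b[1]));
        }
        return String.join("|", alts);
    }

    public static void main(String[] args) {
        // CJK Unified Ideographs Extension B, new in Unicode 3.1.
        System.out.println(rangeToRegex(0x20000, 0x2A6D6));
    }
}
```

Run as-is, main prints [\uD840-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6],
the same fragment that covers CJK Extension B in the expression above.
Combining several ranges that share a high surrogate into one class, as the
\uD835 alternative does, would need an extra pass that this sketch leaves out.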

Please report any bugs you find to me directly.

James

Received on Tuesday, 30 April 2002 05:50:35 UTC