- From: James Clark <jjc@jclark.com>
- Date: Tue, 30 Apr 2002 16:52:53 +0700
- To: xmlschema-dev@w3.org
I've written some Java code that translates from the syntax of XSD regexes
to the syntax of JDK 1.4 java.util.regex regexes. The source, binaries and
documentation can be downloaded from:
http://www.thaiopensource.com/relaxng/xsdregex.zip
I am releasing it under a very liberal license (the BSD license), which
makes it free even for commercial use.
Both XSD regexes and JDK 1.4 regexes are based on Perl regexes, so you
might think the translation would be trivial. However, it turns out that
doing a 100% job is quite tricky. In particular, JDK 1.4 regexes deal with
sequences of 16-bit code values, whereas XSD regexes deal with characters,
so a character outside the BMP has to be matched as a surrogate pair. Also,
JDK 1.4 supports Unicode 3.0, whereas XSD requires at least Unicode 3.1.
So, for example, something as simple as \p{L} in XSD would be equivalent
to the following JDK regex:
([\p{L}\u03F5\u03F4]|[\uD840-\uD868][\uDC00-\uDFFF]|\uD800[\uDF00-\uDF1E\uDF30-\uDF49]|\uD801[\uDC00-\uDC25\uDC28-\uDC4D]|\uD835[\uDC00-\uDC54\uDC56-\uDC9C\uDC9E-\uDC9F\uDCA2\uDCA5-\uDCA6\uDCA9-\uDCAC\uDCAE-\uDCB9\uDCBB\uDCBD-\uDCC0\uDCC2-\uDCC3\uDCC5-\uDD05\uDD07-\uDD0A\uDD0D-\uDD14\uDD16-\uDD1C\uDD1E-\uDD39\uDD3B-\uDD3E\uDD40-\uDD44\uDD46\uDD4A-\uDD50\uDD52-\uDEA3\uDEA8-\uDEC0\uDEC2-\uDEDA\uDEDC-\uDEFA\uDEFC-\uDF14\uDF16-\uDF34\uDF36-\uDF4E\uDF50-\uDF6E\uDF70-\uDF88\uDF8A-\uDFA8\uDFAA-\uDFC2\uDFC4-\uDFC9]|\uD869[\uDC00-\uDED6]|\uD87E[\uDC00-\uDE1D])
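To illustrate where the surrogate alternatives in the regex above come
from, here is a minimal sketch of the arithmetic that turns a range of
supplementary characters into a pattern over 16-bit code values. It is an
illustration only and not necessarily how the translator is implemented;
the class and method names (SurrogateRangeSketch, surrogateRange and the
helpers) are hypothetical. It uses String.join and generics, so it needs a
much later JDK than 1.4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not taken from the xsdregex sources.
public class SurrogateRangeSketch {
    // Leading (high) surrogate code unit for a supplementary code point.
    static int high(int cp) {
        return 0xD800 + ((cp - 0x10000) >> 10);
    }

    // Trailing (low) surrogate code unit for a supplementary code point.
    static int low(int cp) {
        return 0xDC00 + ((cp - 0x10000) & 0x3FF);
    }

    // One 16-bit code unit as a regex escape: backslash, 'u', four hex digits.
    static String esc(int unit) {
        return String.format("\\u%04X", unit);
    }

    // A bracketed class covering the code units from..to.
    static String cls(int from, int to) {
        return from == to ? "[" + esc(from) + "]"
                          : "[" + esc(from) + "-" + esc(to) + "]";
    }

    // Regex over code units matching the code points first..last,
    // both of which must lie in U+10000..U+10FFFF.
    static String surrogateRange(int first, int last) {
        if (first < 0x10000 || last > 0x10FFFF || first > last) {
            throw new IllegalArgumentException("not a supplementary range");
        }
        int h1 = high(first), l1 = low(first);
        int h2 = high(last), l2 = low(last);
        if (h1 == h2) {
            // Whole range shares one high surrogate.
            return esc(h1) + cls(l1, l2);
        }
        List<String> alts = new ArrayList<>();
        if (l1 != 0xDC00) {
            // First high surrogate is only partially covered.
            alts.add(esc(h1) + cls(l1, 0xDFFF));
            h1++;
        }
        String tail = null;
        if (l2 != 0xDFFF) {
            // Last high surrogate is only partially covered.
            tail = esc(h2) + cls(0xDC00, l2);
            h2--;
        }
        if (h1 <= h2) {
            // High surrogates whose full low-surrogate range is included.
            String highs = h1 == h2 ? esc(h1) : cls(h1, h2);
            alts.add(highs + cls(0xDC00, 0xDFFF));
        }
        if (tail != null) {
            alts.add(tail);
        }
        return String.join("|", alts);
    }

    public static void main(String[] args) {
        // CJK Unified Ideographs Extension B, U+20000..U+2A6D6
        System.out.println(surrogateRange(0x20000, 0x2A6D6));
        // CJK Compatibility Ideographs Supplement, U+2F800..U+2FA1D
        System.out.println(surrogateRange(0x2F800, 0x2FA1D));
    }
}
```

For U+20000..U+2A6D6 this yields the [\uD840-\uD868][\uDC00-\uDFFF] and
\uD869[\uDC00-\uDED6] alternatives seen above, and for U+2F800..U+2FA1D it
yields \uD87E[\uDC00-\uDE1D].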
Please report any bugs you find to me directly.
James
Received on Tuesday, 30 April 2002 05:50:35 UTC