- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Thu, 16 Nov 2006 10:34:40 -0700
- To: Michael Kay <mike@saxonica.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, "'Henry S. Thompson'" <ht@inf.ed.ac.uk>, <public-xml-schema-testsuite@w3.org>
On 16 Nov 2006, at 08:51, Michael Kay wrote:

> Fine, I'll keep my powder dry.
>
> FWIW, about 600 of my 1500 discrepancies are in the regex area which is
> why I'm tackling that first. All but 24 are cases where the expected
> result is "valid" and Saxon says "invalid". Some of these are due to the
> continued lack of clarity in the rules for handling hyphens. Many of them
> are because the test results are just plain wrong, for example I've found
> the following being classed as valid:
>
>   x{,2}
>   [\u0554-\u0557]+
>   \p{Nd}{4}-\[{Nd}{2}
>   [^a-f-[\x00-\x60\u007B-\uFFFF]]+
>   \p{klsak
>
> To be honest, I'm wondering what the best way of tackling these is.
> Going through 600 cases by hand to check whether they conform to the
> regex grammar doesn't sound like much fun. There must be a better way,
> like writing a JavaCC parser to automate the checking.

I did write a parser for the regex language, some time ago (by which I
mean that I copied the grammar into Prolog and modified the syntax as
needed, and then rewrote some bits to get rid of left recursion). I'll
make a note to dig it out and figure out how to use it to check these
test cases and see what it says.

Michael
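
As a rough illustration of the kind of automated check discussed above, here is a minimal sketch in Python. It is neither the Prolog parser nor a JavaCC grammar; it is a hand-written recognizer for a subset of the XML Schema regex syntax, based on my reading of the grammar. The names `check_regex` and `is_valid` are made up for this example. It checks escapes, quantifiers, parentheses, and bracket nesting only. It assumes `{` and `}` are metacharacters, it checks only the shape of `\p{...}` names (not whether the category or block exists), and it deliberately ignores the hyphen and range rules inside character classes, since those are the unclear part.

```python
import re

# Escapes permitted by the grammar, as I read it (assumption).
SINGLE_ESC = set("nrt\\|.?*+(){}-[]^")
MULTI_ESC = set("sSiIcCdDwW")
QUANTITY = re.compile(r"\{[0-9]+(,[0-9]*)?\}")  # {n}, {n,} or {n,m}
# Shape of a category or block name only; the name itself is not looked up.
CATEGORY = re.compile(r"\{(Is[A-Za-z0-9-]+|[A-Z][a-z]?)\}")


class RegexSyntaxError(Exception):
    pass


def check_escape(s, i):
    """s[i] is a backslash; return the index just past the escape."""
    if i + 1 >= len(s):
        raise RegexSyntaxError("dangling backslash")
    c = s[i + 1]
    if c in SINGLE_ESC or c in MULTI_ESC:
        return i + 2
    if c in "pP":
        m = CATEGORY.match(s, i + 2)
        if not m:
            raise RegexSyntaxError(f"bad category escape at {i}")
        return m.end()
    raise RegexSyntaxError(f"unknown escape \\{c} at {i}")


def check_class(s, i):
    """s[i] is '['; return the index just past the matching ']'.

    Loose on purpose: escapes and nesting are checked, ranges are not.
    """
    i += 1
    if i < len(s) and s[i] == "^":
        i += 1
    start, after_hyphen = i, False
    while i < len(s):
        c = s[i]
        if c == "\\":
            i, after_hyphen = check_escape(s, i), False
        elif c == "[":
            # A nested class is accepted only as a subtraction: '-[...]' + ']'.
            if not after_hyphen:
                raise RegexSyntaxError(f"unexpected '[' at {i}")
            i = check_class(s, i)
            if i >= len(s) or s[i] != "]":
                raise RegexSyntaxError("subtraction must end the class")
            return i + 1
        elif c == "]":
            if i == start:
                raise RegexSyntaxError(f"empty character class at {i}")
            return i + 1
        else:
            after_hyphen = (c == "-")
            i += 1
    raise RegexSyntaxError("unterminated character class")


def check_regex(s):
    """Raise RegexSyntaxError if s breaks the checked subset of the syntax."""
    i, depth, can_quantify = 0, 0, False
    while i < len(s):
        c = s[i]
        if c == "\\":
            i, can_quantify = check_escape(s, i), True
        elif c == "[":
            i, can_quantify = check_class(s, i), True
        elif c == "(":
            depth, i, can_quantify = depth + 1, i + 1, False
        elif c == ")":
            if depth == 0:
                raise RegexSyntaxError(f"unbalanced ')' at {i}")
            depth, i, can_quantify = depth - 1, i + 1, True
        elif c == "|":
            i, can_quantify = i + 1, False
        elif c in "?*+":
            if not can_quantify:
                raise RegexSyntaxError(f"nothing to quantify at {i}")
            i, can_quantify = i + 1, False
        elif c == "{":
            m = QUANTITY.match(s, i)
            if not m or not can_quantify:
                raise RegexSyntaxError(f"bad quantifier at {i}")
            i, can_quantify = m.end(), False
        elif c in "}]":
            raise RegexSyntaxError(f"unescaped '{c}' at {i}")
        else:
            i, can_quantify = i + 1, True  # ordinary character or '.'
    if depth:
        raise RegexSyntaxError("unclosed '('")


def is_valid(s):
    try:
        check_regex(s)
        return True
    except RegexSyntaxError:
        return False
```

Under these assumptions, all five of the patterns quoted above are rejected: `x{,2}` for the missing minimum, the two bracket expressions for the `\u` and `\x` escapes, `\p{Nd}{4}-\[{Nd}{2}` for the `{Nd}` quantifier following `\[`, and `\p{klsak` for the unterminated category escape. The verdict on `x{,2}` depends on treating `{` as a metacharacter, so that case in particular should be checked against the grammar rather than against this sketch.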
Received on Thursday, 16 November 2006 18:58:29 UTC