- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Thu, 16 Nov 2006 10:34:40 -0700
- To: Michael Kay <mike@saxonica.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, "'Henry S. Thompson'" <ht@inf.ed.ac.uk>, <public-xml-schema-testsuite@w3.org>
On 16 Nov 2006, at 08:51, Michael Kay wrote:
>
> Fine, I'll keep my powder dry.
>
> FWIW, about 600 of my 1500 discrepancies are in the regex area which is
> why I'm tackling that first. All but 24 are cases where the expected
> result is "valid" and Saxon says "invalid". Some of these are due to the
> continued lack of clarity in the rules for handling hyphens. Many of them
> are because the test results are just plain wrong, for example I've found
> the following being classed as valid:
>
> x{,2}
> [\u0554-\u0557]+
> \p{Nd}{4}-\[{Nd}{2}
> [^a-f-[\x00-\x60\u007B-\uFFFF]]+
> \p{klsak
>
> To be honest, I'm wondering what the best way of tackling these is. Going
> through 600 cases by hand to check whether they conform to the regex
> grammar doesn't sound like much fun. There must be a better way, like
> writing a JavaCC parser to automate the checking.
I did write a parser for the regex language, some time ago (by which I
mean that I copied the grammar into Prolog and modified the syntax
as needed, and then rewrote some bits to get rid of left recursion).
I'll make a note to dig it out and figure out how to use it to check these
test cases and see what it says.
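For illustration, here is a minimal Python sketch of the kind of automated check discussed above. It is not the Prolog parser and not a JavaCC grammar. It is a hypothetical recursive-descent recognizer (`Checker` and `is_valid_xsd_regex` are illustrative names) that assumes the XSD 1.1 Part 2 regex productions, under which `{` and `}` are not normal characters and a `{...}` quantifier must give a minimum. It simplifies the hyphen rules inside character classes and does not validate block names or check that n <= m in `{n,m}`.

```python
import re

# Hypothetical sketch, not the Prolog parser from the thread: a recursive-
# descent recognizer for a simplified form of the XSD 1.1 regex grammar.
SINGLE_ESC = set("nrt\\|.?*+(){}-[]^")   # \n \r \t \\ \| \. ... \^
MULTI_ESC = set("sSiIcCdDwW")            # \s \S \i \I \c \C \d \D \w \W
CATEGORIES = set("L Lu Ll Lt Lm Lo M Mn Mc Me N Nd Nl No P Pc Pd Ps Pe Pi "
                 "Pf Po Z Zs Zl Zp S Sm Sc Sk So C Cc Cf Co Cn".split())
NOT_NORMAL = set(".\\?*+{}()|[]")        # may not stand for themselves
QUANT = re.compile(r"[?*+]|\{[0-9]+(,[0-9]*)?\}")   # no match for {,2}
PROP = re.compile(r"\{([^}]*)\}")        # the {name} part of \p{name}
BLOCK = re.compile(r"Is[A-Za-z0-9-]+")   # block names not checked further


class Checker:
    def __init__(self, text):
        self.s, self.i = text, 0

    def peek(self, k=0):
        j = self.i + k
        return self.s[j] if j < len(self.s) else None

    def take(self, expected=None):
        c = self.peek()
        if c is None or (expected is not None and c != expected):
            raise ValueError(f"unexpected input at offset {self.i}")
        self.i += 1
        return c

    def reg_exp(self):                    # regExp ::= branch ('|' branch)*
        self.branch()
        while self.peek() == "|":
            self.take()
            self.branch()

    def branch(self):                     # branch ::= (atom quantifier?)*
        while (c := self.peek()) not in (None, "|", ")"):
            if c == "(":
                self.take()
                self.reg_exp()
                self.take(")")
            elif c == "[":
                self.char_class()
            elif c == "\\":
                self.escape()
            elif c == "." or c not in NOT_NORMAL:
                self.take()
            else:                         # stray ? * + { } ], e.g. x{,2}
                raise ValueError(f"unexpected {c!r} at offset {self.i}")
            m = QUANT.match(self.s, self.i)   # optional quantifier
            if m:
                self.i = m.end()

    def escape(self):                     # returns None for multi-char escapes
        self.take("\\")
        c = self.take()
        if c in SINGLE_ESC:
            return {"n": "\n", "r": "\r", "t": "\t"}.get(c, c)
        if c in MULTI_ESC:
            return None
        if c in ("p", "P"):               # \p{klsak: PROP finds no closing }
            m = PROP.match(self.s, self.i)
            if m and (m.group(1) in CATEGORIES or BLOCK.fullmatch(m.group(1))):
                self.i = m.end()
                return None
        raise ValueError(f"bad escape at offset {self.i - 2}")

    def char_class(self):                 # '[' '^'? parts ('-' class)? ']'
        self.take("[")
        if self.peek() == "^":
            self.take()
        parts = 0
        while self.peek() != "]":
            if self.peek() in (None, "["):
                raise ValueError(f"bad character class at offset {self.i}")
            if parts and self.peek() == "-" and self.peek(1) == "[":
                self.take()               # subtraction must close the group
                self.char_class()
                break
            lo = self.escape() if self.peek() == "\\" else self.take()
            if self.peek() == "-" and self.peek(1) not in (None, "[", "]"):
                self.take()               # simplified hyphen rule: a-b range
                hi = self.escape() if self.peek() == "\\" else self.take()
                if lo is None or hi is None or lo > hi:
                    raise ValueError(f"bad range ending at offset {self.i}")
            parts += 1
        if parts == 0:
            raise ValueError("empty character group")
        self.take("]")


def is_valid_xsd_regex(pattern):
    checker = Checker(pattern)            # True if the whole pattern parses
    try:
        checker.reg_exp()
        return checker.peek() is None     # leftover input, e.g. unmatched ')'
    except ValueError:
        return False
```

Run on the five patterns quoted above, this sketch rejects each one: `x{,2}` for the missing minimum, the two bracketed patterns at `\u` and `\x` (neither is an XSD regex escape), `\p{Nd}{4}-\[{Nd}{2}` at the `{Nd}` following `\[`, and `\p{klsak` for the unclosed property name. Its hyphen handling would still need to be checked against the spec before it could be trusted on the hyphen cases.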
Michael
Received on Thursday, 16 November 2006 18:58:29 UTC