Re: Bug tracking from C. M. Sperberg-McQueen on 2006-11-16 (public-xml-schema-testsuite@w3.org from November 2006)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Thu, 16 Nov 2006 10:34:40 -0700
To: Michael Kay <mike@saxonica.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, "'Henry S. Thompson'" <ht@inf.ed.ac.uk>, <public-xml-schema-testsuite@w3.org>
Message-Id: <A378DD42-ABB2-43E7-8E8D-4862F658F25B@acm.org>

On 16 Nov 2006, at 08:51 , Michael Kay wrote:

>
> Fine, I'll keep my powder dry.
>
> FWIW, about 600 of my 1500 discrepancies are in the regex area which is
> why I'm tackling that first. All but 24 are cases where the expected
> result is "valid" and Saxon says "invalid". Some of these are due to the
> continued lack of clarity in the rules for handling hyphens. Many of them
> are because the test results are just plain wrong, for example I've found
> the following being classed as valid:
>
> x{,2}
> [\u0554-\u0557]+
> \p{Nd}{4}-\[{Nd}{2}
> [^a-f-[\x00-\x60\u007B-\uFFFF]]+
> \p{klsak
>
> To be honest, I'm wondering what the best way of tackling these is. Going
> through 600 cases by hand to check whether they conform to the regex
> grammar doesn't sound like much fun. There must be a better way, like
> writing a JavaCC parser to automate the checking.

I did write a parser for the regex language, some time ago (by which I
mean that I copied the grammar into Prolog and modified the syntax
as needed, and then rewrote some bits to get rid of left recursion).

I'll make a note to dig it out and figure out how to use it to check these
test cases and see what it says.
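
As a rough illustration of the kind of automated check discussed above,
here is a minimal Python sketch. It is neither the Prolog parser mentioned
here nor a JavaCC grammar, and it does not implement the full regex
grammar: it only scans for two lexical problems, namely backslash escapes
outside the single-character and multi-character escape sets, and braces
outside a character class that do not form {n}, {n,} or {n,m}. The function
name first_lexical_error is hypothetical, the \p{...} check looks only at
the shape of the escape (not whether the category or block name exists),
and treating every "{" outside a character class as the start of a
quantifier is an assumption of this sketch.

```python
import re

# Characters allowed after a backslash in the schema regex language:
# single-character escapes and multi-character escapes.
SINGLE_CHAR_ESCAPES = set("nrt\\|.?*+(){}-[]^")
MULTI_CHAR_ESCAPES = set("sSiIcCdDwW")

# {n}, {n,} or {n,m}: the minimum is mandatory, so "{,2}" does not match.
QUANTITY = re.compile(r"\{[0-9]+(,[0-9]*)?\}")
# Shape of a category/block escape, starting at the "p" or "P".
# Assumption: the property name itself is not validated.
CATEGORY = re.compile(r"[pP]\{[A-Za-z0-9\-]+\}")


def first_lexical_error(pattern):
    """Return (offset, message) for the first problem found, else None."""
    depth = 0  # nesting depth of character classes
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            nxt = pattern[i + 1 : i + 2]  # "" if the backslash is last
            if nxt in ("p", "P"):
                m = CATEGORY.match(pattern, i + 1)
                if m is None:
                    return (i, "malformed \\p{...} escape")
                i = m.end()
                continue
            if nxt not in SINGLE_CHAR_ESCAPES and nxt not in MULTI_CHAR_ESCAPES:
                return (i, "unknown escape \\" + nxt)
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif ch == "{" and depth == 0:
            m = QUANTITY.match(pattern, i)
            if m is None:
                return (i, "malformed quantifier")
            i = m.end()
            continue
        i += 1
    return None


if __name__ == "__main__":
    samples = [
        r"x{,2}",
        r"[\u0554-\u0557]+",
        r"\p{Nd}{4}-\[{Nd}{2}",
        r"[^a-f-[\x00-\x60\u007B-\uFFFF]]+",
        r"\p{klsak",
    ]
    for s in samples:
        print(s, "->", first_lexical_error(s))
```

Under these assumptions, all five patterns quoted above are reported as
malformed. A pattern that passes this scan may still violate the grammar in
other ways, so it is a quick filter and not a substitute for a real parser.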

Michael

Received on Thursday, 16 November 2006 18:58:29 UTC