Re: Bug tracking

On 16 Nov 2006, at 08:51 , Michael Kay wrote:

>
> Fine, I'll keep my powder dry.
>
> FWIW, about 600 of my 1500 discrepancies are in the regex area which is
> why I'm tackling that first. All but 24 are cases where the expected
> result is "valid" and Saxon says "invalid". Some of these are due to the
> continued lack of clarity in the rules for handling hyphens. Many of them
> are because the test results are just plain wrong, for example I've found
> the following being classed as valid:
>
> x{,2}
> [\u0554-\u0557]+
> \p{Nd}{4}-\[{Nd}{2}
> [^a-f-[\x00-\x60\u007B-\uFFFF]]+
> \p{klsak
>
> To be honest, I'm wondering what the best way of tackling these is. Going
> through 600 cases by hand to check whether they conform to the regex
> grammar doesn't sound like much fun. There must be a better way, like
> writing a JavaCC parser to automate the checking.

I did write a parser for the regex language, some time ago (by which I
mean that I copied the grammar into Prolog and modified the syntax
as needed, and then rewrote some bits to get rid of left recursion).

I'll make a note to dig it out and figure out how to use it to check these
test cases and see what it says.
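
For illustration only, here is a minimal sketch of what such an automated
check might look like. It is written in Python rather than the Prolog or
JavaCC mentioned above, and it is not the parser described in this message.
It is a hand-written recursive-descent recognizer for a simplified subset of
the XML Schema regex grammar. The names (check, RegexSyntaxError and the
helper functions) are hypothetical. It assumes that "{", "}" and "]" cannot
stand as ordinary characters outside a character class, it checks \p{...}
names for shape only (not against the Unicode category and block tables),
and it deliberately treats hyphens inside character classes loosely, since
those rules are the unclear part.

```python
import re

# Escapes accepted by this sketch (single-character and multi-character).
SINGLE_ESC = set("nrt\\|.?*+(){}-[]^")
MULTI_ESC = set("sSiIcCdDwW")
# Assumption: property names are checked for shape only.
PROP = re.compile(r"\{[A-Za-z0-9\-]+\}")
# Quantity: {n}, {n,} or {n,m}; the minimum is mandatory.
QUANT = re.compile(r"\{[0-9]+(,[0-9]*)?\}")


class RegexSyntaxError(ValueError):
    """Raised when the input does not match the simplified grammar."""


def check(src: str) -> bool:
    """Return True if src is accepted by the simplified grammar."""
    try:
        return _regexp(src, 0) == len(src)
    except RegexSyntaxError:
        return False


def _regexp(s, i):
    # regExp ::= branch ( '|' branch )*
    i = _branch(s, i)
    while i < len(s) and s[i] == "|":
        i = _branch(s, i + 1)
    return i


def _branch(s, i):
    # branch ::= piece*, where piece ::= atom quantifier?
    while i < len(s) and s[i] not in "|)":
        i = _quantifier(s, _atom(s, i))
    return i


def _atom(s, i):
    c = s[i]
    if c == "(":
        i = _regexp(s, i + 1)
        if i >= len(s) or s[i] != ")":
            raise RegexSyntaxError("unclosed group")
        return i + 1
    if c == "[":
        return _char_class(s, i)
    if c == "\\":
        return _escape(s, i)
    if c in "?*+{}]":  # assumption: these cannot be ordinary characters
        raise RegexSyntaxError("unexpected metacharacter")
    return i + 1  # ordinary character or the '.' wildcard


def _quantifier(s, i):
    if i < len(s) and s[i] in "?*+":
        return i + 1
    if i < len(s) and s[i] == "{":
        m = QUANT.match(s, i)
        if not m:
            raise RegexSyntaxError("bad quantity")
        return m.end()
    return i


def _escape(s, i):
    if i + 1 < len(s):
        c = s[i + 1]
        if c in SINGLE_ESC or c in MULTI_ESC:
            return i + 2
        if c in "pP":
            m = PROP.match(s, i + 2)
            if m:
                return m.end()
    raise RegexSyntaxError("bad escape")


def _char_class(s, i):
    i += 1
    if i < len(s) and s[i] == "^":
        i += 1
    start = i
    while i < len(s) and s[i] != "]":
        if s[i] == "\\":
            i = _escape(s, i)
        elif s[i] == "[":
            # A nested class is accepted only as a trailing subtraction: -[...]
            if i == start or s[i - 1] != "-":
                raise RegexSyntaxError("unexpected '['")
            i = _char_class(s, i)
            if i >= len(s) or s[i] != "]":
                raise RegexSyntaxError("subtraction must come last")
        else:
            i += 1
    if i >= len(s) or i == start:
        raise RegexSyntaxError("unclosed or empty character class")
    return i + 1
```

Under these assumptions, check() rejects all five patterns quoted above and
accepts a corrected form such as \p{Nd}{4}-\p{Nd}{2}. Running something like
it over every pattern in the test catalogue and listing the cases where its
verdict differs from the expected result would narrow down which of the 600
need to be looked at by hand.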

Michael

Received on Thursday, 16 November 2006 18:58:29 UTC