Re: Syntax Issues/Experience with RFC2396bis and abnf2re

On Monday, May 5, 2003, at 06:33  PM, Rob Cameron wrote:
> I've been working on a new project called abnf2re
> - automatically generating regular expressions from ABNF
> grammars.
>
> A goal of the project is to help debug specifications
> as they are being developed, so I am using RFC2396bis-01
> as an initial test case.  Another goal is to generate
> expressions that can be the basis of validating parsers
> or recognizers.

That's nice.  I expect a lot of changes in the ABNF over the next two
weeks in order to simplify the grammar productions and remove the bits
that are no longer applicable.

> For example, given the URI regexps (44 in all) generated by abnf2re,
> one can then write a URI-reference parser to produce (scheme, auth,
> path, query, fragment) 5-tuples in about 22 lines of Python.
> With another 50 or so lines (buildURI - 8 lines, merge - 17 lines
> and resolve_relative_URI - 25 lines) the algorithms of
> section 5.2 can be implemented and tested.
>
> Anyway, here's an initial list of issues.
>
> (1)  There are minor ABNF bugs in these productions (ref RFC2234)
>        domainlabel   = alphanum [ 0*61( alphanum | "-" ) alphanum ]
>        toplabel      = alpha    [ 0*61( alphanum | "-" ) alphanum ]
>
>      Fixes:
>        domainlabel   = alphanum [ 0*61( alphanum / "-" ) alphanum ]
>        toplabel      = ALPHA    [ 0*61( alphanum / "-" ) alphanum ]

Thanks, fixed.
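
For illustration only, the corrected productions map onto regular
expressions roughly as sketched below. This is a hand-written sketch, not
abnf2re output; the names DOMAINLABEL and TOPLABEL are hypothetical, and it
assumes alphanum means ALPHA / DIGIT.

```python
import re

# Assumption: alphanum = ALPHA / DIGIT (not spelled out in this thread).
_ALNUM = "A-Za-z0-9"

# domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ]
DOMAINLABEL = re.compile(rf"[{_ALNUM}](?:[{_ALNUM}-]{{0,61}}[{_ALNUM}])?")

# toplabel = ALPHA [ 0*61( alphanum / "-" ) alphanum ]
TOPLABEL = re.compile(rf"[A-Za-z](?:[{_ALNUM}-]{{0,61}}[{_ALNUM}])?")


def is_domainlabel(s: str) -> bool:
    """True if the whole string matches the domainlabel production."""
    return DOMAINLABEL.fullmatch(s) is not None


def is_toplabel(s: str) -> bool:
    """True if the whole string matches the toplabel production."""
    return TOPLABEL.fullmatch(s) is not None
```

Both patterns cap a label at 63 characters: one leading character, up to 61
middle characters, and one trailing character.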

> (2)  The HEXDIG definition of RFC2234 is upper-case only;
>      it's probably not what is wanted.
>        escaped     = "%" HEXDIG HEXDIG

As Graham noted, literal strings in ABNF are case-insensitive, so HEXDIG
already matches "a" through "f" as well as "A" through "F".
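
A regular expression, unlike an ABNF literal, is case-sensitive by default,
so a translation of the escaped production has to list both cases
explicitly. A minimal sketch, with the hypothetical names ESCAPED and
is_escaped:

```python
import re

# escaped = "%" HEXDIG HEXDIG
# ABNF literals are case-insensitive, so both letter cases are listed here.
ESCAPED = re.compile(r"%[0-9A-Fa-f]{2}")


def is_escaped(s: str) -> bool:
    """True if the whole string is one percent-escaped octet."""
    return ESCAPED.fullmatch(s) is not None
```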

> (3)  The production rule for path is a bit problematic.
>        path          = [ abs-path / opaque-part ]
>      - it is not used in the grammar
>      - presumably, it is meant to say that whatever is
>         parsed as either abs-path or opaque-part is interpreted
>         as a "path".
>      - the production does not include rel-path, but rel-path needs
>         to be processed as a path for the algorithms in 5.2

That will be going away soon.

> (4)  In my implementation, I've assumed the following change in
>      the pseudocode for the algorithm in 5.2
>
>          if (R.path == "") then
>             if defined(R.query) then
>                T.path  = Base.path;
>                T.query = R.query;
>             else
>                -- An empty reference refers to the current document
>                return (current-document, fragment);
>             endif;
>
>      becomes
>          if (R.path == "") then
>             T.path  = Base.path;
>             if defined(R.query) then
>                T.query = R.query;
>             else
>                T.query = Base.query;
>             endif;
>
>       This seems consistent with the requests of the RDF group and
>       gives a clean, well-behaved algorithm.

Yes, that is in the works as well, though I won't make that change until
all of the changes to the text can be made at once.
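
The revised branch from (4) can be sketched in Python as below. This is
not Rob's implementation: it splits references with the generic regular
expression from RFC 2396, Appendix B rather than the abnf2re expressions,
and the names Parts, split_uri, resolve_empty_path and build_uri are
hypothetical. It also assumes that T takes its scheme and authority from
Base and its fragment from R, which the quoted excerpt does not show.

```python
import re
from typing import NamedTuple, Optional

# Generic splitting expression from RFC 2396, Appendix B.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]      # None means "not defined"
    fragment: Optional[str]


def split_uri(ref: str) -> Parts:
    """Split a URI reference into a (scheme, auth, path, query, fragment) tuple."""
    m = _URI_RE.match(ref)    # every group is optional, so this always matches
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def resolve_empty_path(base: Parts, r: Parts) -> Parts:
    """Revised branch only: R has no scheme, no authority and an empty path."""
    assert r.scheme is None and r.authority is None and r.path == ""
    # T.path = Base.path; T.query = R.query if defined, else Base.query.
    query = r.query if r.query is not None else base.query
    # Assumption: scheme and authority come from Base, fragment from R.
    return Parts(base.scheme, base.authority, base.path, query, r.fragment)


def build_uri(p: Parts) -> str:
    """Recompose the five components into a string."""
    out = ""
    if p.scheme is not None:
        out += p.scheme + ":"
    if p.authority is not None:
        out += "//" + p.authority
    out += p.path
    if p.query is not None:
        out += "?" + p.query
    if p.fragment is not None:
        out += "#" + p.fragment
    return out
```

With this branch, an empty reference or a fragment-only reference keeps the
base query instead of being treated as a special "current document" case.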

....Roy

Received on Tuesday, 6 May 2003 16:34:31 UTC