Re: Syntax Issues/Experience with RFC2396bis and abnf2re

From: Roy T. Fielding <fielding@apache.org>
Date: Tue, 6 May 2003 13:18:52 -0700
Cc: uri@w3.org
To: Rob Cameron <cameron@cs.sfu.ca>
Message-Id: <F447617A-7FFF-11D7-8C85-000393753936@apache.org>

On Monday, May 5, 2003, at 06:33  PM, Rob Cameron wrote:
> I've been working on a new project called abnf2re
> - automatically generating regular expressions from ABNF
> grammars.
>
> A goal of the project is to help debug specifications
> as they are being developed, so I am using RFC2396bis-01
> as an initial test case.  Another goal is to generate
> expressions that can be the basis of validating parsers
> or recognizers.

That's nice.  I expect a lot of changes in the ABNF over the next two
weeks in order to simplify the grammar productions and remove the bits
that are no longer applicable.

> For example, given the URI regexps (44 in all) generated by abnf2re,
> one can then write a URI-reference parser to produce (scheme, auth,
> path, query, fragment) 5-tuples in about 22 lines of Python.
> With another 50 or so lines (buildURI - 8 lines, merge - 17 lines
> and resolve_relative_URI - 25 lines) the algorithms of
> section 5.2 can be implemented and tested.
>
> Anyway, here's an initial list of issues.
>
> (1)  There are minor ABNF bugs in these productions (ref RFC2234)
>        domainlabel   = alphanum [ 0*61( alphanum | "-" ) alphanum ]
>        toplabel      = alpha    [ 0*61( alphanum | "-" ) alphanum ]
>
>      Fixes:
>        domainlabel   = alphanum [ 0*61( alphanum / "-" ) alphanum ]
>        toplabel      = ALPHA    [ 0*61( alphanum / "-" ) alphanum ]

Thanks, fixed.
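
For illustration, the corrected productions can be transcribed by hand into
regular expressions roughly as follows. This is a sketch, not output of
abnf2re; it assumes that alphanum means ALPHA / DIGIT, and the names TAIL,
is_domainlabel and is_toplabel are made up for the example.

```python
import re

# Assumption: alphanum = ALPHA / DIGIT
ALPHANUM = "[A-Za-z0-9]"

# [ 0*61( alphanum / "-" ) alphanum ]  -> optional group, at most 61 + 1 characters
TAIL = "(?:[A-Za-z0-9-]{0,61}" + ALPHANUM + ")?"

# domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ]
DOMAINLABEL = re.compile(ALPHANUM + TAIL)

# toplabel    = ALPHA    [ 0*61( alphanum / "-" ) alphanum ]
TOPLABEL = re.compile("[A-Za-z]" + TAIL)


def is_domainlabel(s: str) -> bool:
    """True if the whole string matches the domainlabel production."""
    return DOMAINLABEL.fullmatch(s) is not None


def is_toplabel(s: str) -> bool:
    """True if the whole string matches the toplabel production."""
    return TOPLABEL.fullmatch(s) is not None
```

With these definitions a label is at most 63 characters long, may not end in
"-", and a toplabel may not start with a digit.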

> (2)  The HEXDIG definition of RFC2234 is upper-case only;
>      it's probably not what is wanted.
>        escaped     = "%" HEXDIG HEXDIG

As Graham noted, quoted strings in ABNF are case-insensitive, so HEXDIG
already matches the lower-case letters a-f as well.
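
A small sketch of what that means for a generated expression: because RFC2234
literals such as "A" match either case, a regular expression for the escaped
production has to accept both. The names ESCAPED and is_escaped are made up
for the example.

```python
import re

# escaped = "%" HEXDIG HEXDIG
# HEXDIG is DIGIT / "A" / ... / "F"; the quoted letters are case-insensitive,
# so the character class lists both cases explicitly.
ESCAPED = re.compile(r"%[0-9A-Fa-f]{2}")


def is_escaped(s: str) -> bool:
    """True if the whole string is a single %XX escape."""
    return ESCAPED.fullmatch(s) is not None
```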

> (3)  The production rule for path is a bit problematic.
>        path          = [ abs-path / opaque-part ]
>      - it is not used in the grammar
>      - presumably, it is meant to say that whatever is
>         parsed as either abs-path or opaque-part is interpreted
>         as a "path".
>      - the production does not include rel-path, but rel-path needs
>         to be processed as a path for the algorithms in 5.2

That will be going away soon.

> (4)  In my implementation, I've assumed the following change in
>      the pseudocode for the algorithm in 5.2
>
>          if (R.path == "") then
>             if defined(R.query) then
>                T.path  = Base.path;
>                T.query = R.query;
>             else
>                -- An empty reference refers to the current document
>                return (current-document, fragment);
>             endif;
>
>      becomes
>          if (R.path == "") then
>             T.path  = Base.path;
>             if defined(R.query) then
>                T.query = R.query;
>             else
>                T.query = Base.query;
>             endif;
>
>       This seems consistent with the requests of the RDF group and
>       gives a clean, well-behaved algorithm.

Yes, that change is in the works as well, though I won't make it until all
of the changes to the text can be made at once.
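
As a rough sketch of the revised branch from item (4), the following Python
splits a reference into a (scheme, authority, path, query, fragment) 5-tuple
and applies only the empty-path case. It uses the generic regular expression
from RFC 2396 Appendix B rather than the abnf2re output, and it assumes that
in this branch the scheme and authority come from the base and the fragment
from the reference, which the quoted pseudocode does not show. The names
Parts, parse, resolve_empty_path and build are made up for the example.

```python
import re
from typing import NamedTuple, Optional

# Generic URI-reference splitter from RFC 2396, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]      # None means "not defined", "" means empty
    fragment: Optional[str]


def parse(ref: str) -> Parts:
    """Split a URI reference into its five components."""
    m = URI_RE.match(ref)     # every group is optional, so this always matches
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def resolve_empty_path(base: Parts, r: Parts) -> Parts:
    """Apply only the revised 'R.path == ""' branch of the algorithm."""
    if r.scheme is not None or r.authority is not None or r.path != "":
        raise ValueError("this sketch covers only the empty-path branch")
    # T.path = Base.path; T.query = R.query if defined, else Base.query
    query = r.query if r.query is not None else base.query
    # Assumption: scheme and authority are inherited from the base,
    # and the fragment is taken from the reference.
    return Parts(base.scheme, base.authority, base.path, query, r.fragment)


def build(p: Parts) -> str:
    """Recombine the five components into a string."""
    out = ""
    if p.scheme is not None:
        out += p.scheme + ":"
    if p.authority is not None:
        out += "//" + p.authority
    out += p.path
    if p.query is not None:
        out += "?" + p.query
    if p.fragment is not None:
        out += "#" + p.fragment
    return out
```

With base "http://a/b/c/d;p?q", the reference "?y" resolves to
"http://a/b/c/d;p?y", and an empty reference now resolves to the base itself,
query included, instead of being special-cased as the current document.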

....Roy
Received on Tuesday, 6 May 2003 16:34:31 GMT
