Syntax Issues/Experience with RFC2396bis and abnf2re

From: Rob Cameron <cameron@cs.sfu.ca>
Date: Mon, 5 May 2003 18:33:39 -0700 (PDT)
Message-Id: <200305060133.h461Xdu27683@orpheus.cs.sfu.ca>
To: uri@w3.org

I've been working on a new project called abnf2re, which
automatically generates regular expressions from ABNF
grammars.

A goal of the project is to help debug specifications
as they are being developed, so I am using RFC2396bis-01
as an initial test case.  Another goal is to generate
expressions that can be the basis of validating parsers
or recognizers.  

For example, given the URI regexps (44 in all) generated by abnf2re,
one can then write a URI-reference parser to produce (scheme, auth, 
path, query, fragment) 5-tuples in about 22 lines of Python.
With another 50 or so lines (buildURI - 8 lines, merge - 17 lines 
and resolve_relative_URI - 25 lines) the algorithms of
section 5.2 can be implemented and tested.
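
As a rough sketch of the kind of 5-tuple parser described above: the
version below does not use the abnf2re-generated expressions. It
substitutes the single non-validating regular expression given in
RFC 2396, Appendix B, and the name parse_uri_reference is made up
for illustration.

```python
import re

# Regular expression from RFC 2396, Appendix B. It splits a URI reference
# into components but does not validate them (unlike generated regexps).
URI_REF = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def parse_uri_reference(ref):
    """Return (scheme, authority, path, query, fragment) for a URI reference.

    Components that are absent are None; the path is always a string and
    may be empty.
    """
    m = URI_REF.match(ref)  # every part is optional, so this always matches
    return (m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


if __name__ == "__main__":
    print(parse_uri_reference("http://a/b/c/d;p?q#f"))
    print(parse_uri_reference("?y"))
```

Keeping "absent" (None) distinct from "empty" ("") matters for the
section 5.2 algorithms, which test whether a query is defined.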

Anyway, here's an initial list of issues.

(1)  There are minor ABNF bugs in these productions (ref RFC2234,
     which writes alternation as "/", not "|")
       domainlabel   = alphanum [ 0*61( alphanum | "-" ) alphanum ]
       toplabel      = alpha    [ 0*61( alphanum | "-" ) alphanum ]

     Fixes:
       domainlabel   = alphanum [ 0*61( alphanum / "-" ) alphanum ]
       toplabel      = ALPHA    [ 0*61( alphanum / "-" ) alphanum ]
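
     For illustration, the corrected productions can be translated by
     hand into Python regular expressions roughly as follows. This is
     a sketch, not actual abnf2re output; the helper names are
     hypothetical, and alphanum is assumed to mean ASCII letters and
     digits.

```python
import re

# Assumption: alphanum = ALPHA / DIGIT, restricted to ASCII.
ALPHANUM = "[A-Za-z0-9]"

# domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ]
DOMAINLABEL = re.compile(
    rf"{ALPHANUM}(?:(?:{ALPHANUM}|-){{0,61}}{ALPHANUM})?"
)

# toplabel = ALPHA [ 0*61( alphanum / "-" ) alphanum ]
TOPLABEL = re.compile(
    rf"[A-Za-z](?:(?:{ALPHANUM}|-){{0,61}}{ALPHANUM})?"
)


def is_domainlabel(s):
    """True if the whole string matches the domainlabel production."""
    return DOMAINLABEL.fullmatch(s) is not None


def is_toplabel(s):
    """True if the whole string matches the toplabel production."""
    return TOPLABEL.fullmatch(s) is not None
```

     The 0*61 repetition plus the two end characters caps a label at
     63 characters, and a label can neither start nor end with "-".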

(2)  The HEXDIG definition of RFC2234 is upper-case only;  
     it's probably not what is wanted.
       escaped     = "%" HEXDIG HEXDIG
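
     To make the concern concrete, here is a small sketch comparing
     the upper-case-only reading described above with a variant that
     also accepts a-f. Both patterns are hand-written illustrations,
     not abnf2re output.

```python
import re

# Upper-case-only reading of HEXDIG, as described above.
ESCAPED_UPPER = re.compile(r"%[0-9A-F]{2}")

# Variant that also accepts lower-case hex digits.
ESCAPED_ANYCASE = re.compile(r"%[0-9A-Fa-f]{2}")


def accepted(pattern, s):
    """True if the whole string is a single escaped triplet."""
    return pattern.fullmatch(s) is not None
```

     Under the first pattern "%7E" is accepted but "%7e" is rejected;
     the second accepts both.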

(3)  The production rule for path is a bit problematic.
       path          = [ abs-path / opaque-part ]
     - it is not used in the grammar
     - presumably, it is meant to say that whatever is
        parsed as either abs-path or opaque-part is interpreted
        as a "path".
     - the production does not include rel-path, but rel-path needs
        to be processed as a path for the algorithms in 5.2

(4)  In my implementation, I've assumed the following change in
     the pseudocode for the algorithm in 5.2

         if (R.path == "") then
            if defined(R.query) then
               T.path  = Base.path;
               T.query = R.query;
            else
               -- An empty reference refers to the current document
               return (current-document, fragment);
            endif;

     becomes
         if (R.path == "") then
            T.path  = Base.path;
            if defined(R.query) then
               T.query = R.query;
            else
               T.query = Base.query;
            endif;

      This seems consistent with the requests of the RDF group and
      gives a clean, well-behaved algorithm.
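
      A minimal Python sketch of the revised branch, covering only the
      R.path == "" case. The Ref type and function name are
      hypothetical, and None stands for an undefined query.

```python
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    """Hypothetical holder for the two components this branch touches."""
    path: str
    query: Optional[str] = None  # None means "query not defined"


def resolve_empty_path(base, r):
    """Apply the revised empty-path branch and return the target T.

    T always takes Base.path; it takes R.query when that is defined,
    and otherwise inherits Base.query.
    """
    if r.path != "":
        raise ValueError("this sketch covers only the empty-path branch")
    query = r.query if r.query is not None else base.query
    return Ref(base.path, query)


if __name__ == "__main__":
    base = Ref("/b/c/d;p", "q")
    print(resolve_empty_path(base, Ref("", "y")))
    print(resolve_empty_path(base, Ref("")))
```

      With this change an empty reference goes through the same merge
      path as any other reference instead of returning early as a
      special case.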
Received on Monday, 5 May 2003 21:35:25 GMT
