- From: Rob Cameron <cameron@cs.sfu.ca>
- Date: Mon, 5 May 2003 18:33:39 -0700 (PDT)
- To: uri@w3.org
I've been working on a new project called abnf2re, which automatically generates regular expressions from ABNF grammars. One goal of the project is to help debug specifications as they are being developed, so I am using RFC2396bis-01 as an initial test case. Another goal is to generate expressions that can serve as the basis of validating parsers or recognizers.

For example, given the URI regexps (44 in all) generated by abnf2re, one can write a URI-reference parser that produces (scheme, auth, path, query, fragment) 5-tuples in about 22 lines of Python. With another 50 or so lines (buildURI - 8 lines, merge - 17 lines and resolve_relative_URI - 25 lines), the algorithms of section 5.2 can be implemented and tested.

Anyway, here's an initial list of issues.

(1) There are minor ABNF bugs in these productions (ref RFC2234):

    domainlabel = alphanum [ 0*61( alphanum | "-" ) alphanum ]
    toplabel    = alpha    [ 0*61( alphanum | "-" ) alphanum ]

Fixes:

    domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ]
    toplabel    = ALPHA    [ 0*61( alphanum / "-" ) alphanum ]

(2) The HEXDIG definition of RFC2234 is upper-case only; it's probably not what is wanted.

    escaped = "%" HEXDIG HEXDIG

(3) The production rule for path is a bit problematic.

    path = [ abs-path / opaque-part ]

- it is not used in the grammar; presumably, it is meant to say that whatever is parsed as either abs-path or opaque-part is interpreted as a "path".
- the production does not include rel-path, but rel-path needs to be processed as a path for the algorithms in 5.2.

(4) In my implementation, I've assumed the following change in the pseudocode for the algorithm in 5.2:

    if (R.path == "") then
       if defined(R.query) then
          T.path = Base.path;
          T.query = R.query;
       else
          -- An empty reference refers to the current document
          return (current-document, fragment);
       endif;

becomes

    if (R.path == "") then
       T.path = Base.path;
       if defined(R.query) then
          T.query = R.query;
       else
          T.query = Base.query;
       endif;

This seems consistent with the requests of the RDF group and gives a clean, well-behaved algorithm.
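For illustration, the empty-path rule proposed in (4) can be sketched in Python as below. This is only a sketch under stated assumptions, not the abnf2re-based code: the splitting step uses the generic regular expression from RFC 2396, Appendix B rather than the 44 generated regexps, the names URIRef, split_uri_ref and empty_path_target are hypothetical, and only the R.path == "" branch of the 5.2 algorithm is covered.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Generic splitting regex from RFC 2396, Appendix B.
# Assumption: used here as a stand-in for the abnf2re-generated regexps.
_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class URIRef(NamedTuple):
    """Hypothetical 5-tuple; None means the component is undefined."""
    scheme: Optional[str]
    auth: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri_ref(text: str) -> URIRef:
    """Split a URI-reference into (scheme, auth, path, query, fragment)."""
    m = _SPLIT.match(text)  # every part is optional, so this always matches
    return URIRef(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def empty_path_target(ref: URIRef, base: URIRef) -> Tuple[str, Optional[str]]:
    """Return (T.path, T.query) for the R.path == "" branch as proposed in (4)."""
    if ref.path != "":
        raise ValueError("only the empty-path branch is sketched here")
    # Proposed change: always take Base.path, and fall back to Base.query
    # when the reference defines no query of its own.
    query = ref.query if ref.query is not None else base.query
    return base.path, query
```

With the base "http://a/b/c/d;p?q", the reference "?y" yields ("/b/c/d;p", "y"), while an empty reference or a fragment-only reference such as "#s" yields ("/b/c/d;p", "q"), so no special "current document" return is needed in this branch.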
Received on Monday, 5 May 2003 21:35:25 UTC