- From: Roy T. Fielding <fielding@apache.org>
- Date: Tue, 6 May 2003 13:18:52 -0700
- To: Rob Cameron <cameron@cs.sfu.ca>
- Cc: uri@w3.org
On Monday, May 5, 2003, at 06:33 PM, Rob Cameron wrote:

> I've been working on a new project called abnf2re
> - automatically generating regular expressions from ABNF
> grammars.
>
> A goal of the project is to help debug specifications
> as they are being developed, so I am using RFC2396bis-01
> as an initial test case.  Another goal is to generate
> expressions that can be the basis of validating parsers
> or recognizers.

That's nice.  I expect a lot of changes in the ABNF over the next
two weeks in order to simplify the grammar productions and remove
the bits that are no longer applicable.

> For example, given the URI regexps (44 in all) generated by abnf2re,
> one can then write a URI-reference parser to produce (scheme, auth,
> path, query, fragment) 5-tuples in about 22 lines of Python.
> With another 50 or so lines (buildURI - 8 lines, merge - 17 lines
> and resolve_relative_URI - 25 lines) the algorithms of
> section 5.2 can be implemented and tested.
>
> Anyway, here's an initial list of issues.
>
> (1) There are minor ABNF bugs in these productions (ref RFC2234)
>       domainlabel   = alphanum [ 0*61( alphanum | "-" ) alphanum ]
>       toplabel      = alpha    [ 0*61( alphanum | "-" ) alphanum ]
>
>     Fixes:
>       domainlabel   = alphanum [ 0*61( alphanum / "-" ) alphanum ]
>       toplabel      = ALPHA    [ 0*61( alphanum / "-" ) alphanum ]

Thanks, fixed.

> (2) The HEXDIG definition of RFC2234 is upper-case only;
>     it's probably not what is wanted.
>       escaped       = "%" HEXDIG HEXDIG

As Graham noted, strings are case-insensitive.

> (3) The production rule for path is a bit problematic.
>       path          = [ abs-path / opaque-part ]
>     - it is not used in the grammar
>     - presumably, it is meant to say that whatever is
>       parsed as either abs-path or opaque-part is interpreted
>       as a "path".
>     - the production does not include rel-path, but rel-path needs
>       to be processed as a path for the algorithms in 5.2

That will be going away soon.

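
For illustration, the corrected productions in (1) and the "escaped" rule
in (2) can be written by hand as regular expressions.  This is only a
sketch and not abnf2re output: the names ALPHA, ALPHANUM, TAIL,
DOMAINLABEL, TOPLABEL, ESCAPED and the is_* helpers are made up here,
and alphanum is assumed to mean ALPHA / DIGIT.  Because ABNF literal
strings are case-insensitive, HEXDIG is taken to accept a-f as well as A-F.

```python
import re

# Assumption: alphanum = ALPHA / DIGIT, with ALPHA = A-Z / a-z.
ALPHA = "[A-Za-z]"
ALPHANUM = "[A-Za-z0-9]"

# [ 0*61( alphanum / "-" ) alphanum ]  ->  optional group, at most 61
# middle characters followed by one closing alphanum.
TAIL = "(?:[A-Za-z0-9-]{0,61}" + ALPHANUM + ")?"

DOMAINLABEL = re.compile(ALPHANUM + TAIL)
TOPLABEL = re.compile(ALPHA + TAIL)

# escaped = "%" HEXDIG HEXDIG, matched case-insensitively.
ESCAPED = re.compile("%[0-9A-Fa-f]{2}")


def is_domainlabel(s: str) -> bool:
    """True if the whole string matches the domainlabel production."""
    return DOMAINLABEL.fullmatch(s) is not None


def is_toplabel(s: str) -> bool:
    """True if the whole string matches the toplabel production."""
    return TOPLABEL.fullmatch(s) is not None


def is_escaped(s: str) -> bool:
    """True if the whole string is a single %XX escape."""
    return ESCAPED.fullmatch(s) is not None
```

Under these assumptions a label is at most 63 characters long and can
neither start nor end with "-"; a toplabel additionally cannot start
with a digit.
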
> (4) In my implementation, I've assumed the following change in
>     the pseudocode for the algorithm in 5.2
>
>       if (R.path == "") then
>          if defined(R.query) then
>             T.path  = Base.path;
>             T.query = R.query;
>          else
>             -- An empty reference refers to the current document
>             return (current-document, fragment);
>          endif;
>
>     becomes
>       if (R.path == "") then
>          T.path  = Base.path;
>          if defined(R.query) then
>             T.query = R.query;
>          else
>             T.query = Base.query;
>          endif;
>
>     This seems consistent with the requests of the RDF group and
>     gives a clean, well-behaved algorithm.

Yes, that is in the works as well, though I won't make it until
all of the changes to the text can be made at once.

....Roy
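
A minimal Python sketch of the revised empty-path branch from (4) may
help make the change concrete.  It is not Rob's implementation: the
names Parts, split_uri and resolve_empty_path are hypothetical, the
5-tuple split uses the regular expression from Appendix B of RFC 2396
rather than the abnf2re expressions, and only the R.path == "" case is
handled, assuming the reference has no scheme or authority of its own.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 2396, Appendix B (not abnf2re output).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(ref: str) -> Parts:
    """Split a URI reference into (scheme, auth, path, query, fragment).

    Components that are absent come back as None; path may be "".
    """
    m = URI_RE.match(ref)  # every component is optional, so this always matches
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def resolve_empty_path(base: Parts, ref: Parts) -> Parts:
    """Revised branch: inherit Base.path, and Base.query when R has no query."""
    if ref.scheme is not None or ref.authority is not None or ref.path != "":
        raise ValueError("sketch covers only references with an empty path")
    query = ref.query if ref.query is not None else base.query
    return Parts(base.scheme, base.authority, base.path, query, ref.fragment)
```

With this branch an empty reference, a query-only reference and a
fragment-only reference all go through the same code path, instead of
the empty reference being special-cased as "current document".
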
Received on Tuesday, 6 May 2003 16:34:31 UTC