- From: Rob Cameron <cameron@cs.sfu.ca>
- Date: Mon, 5 May 2003 18:33:39 -0700 (PDT)
- To: uri@w3.org
I've been working on a new project called abnf2re, which
automatically generates regular expressions from ABNF
grammars.
A goal of the project is to help debug specifications
as they are being developed, so I am using RFC2396bis-01
as an initial test case. Another goal is to generate
expressions that can be the basis of validating parsers
or recognizers.
For example, given the URI regexps (44 in all) generated by abnf2re,
one can then write a URI-reference parser to produce (scheme, auth,
path, query, fragment) 5-tuples in about 22 lines of Python.
With another 50 or so lines (buildURI - 8 lines, merge - 17 lines
and resolve_relative_URI - 25 lines) the algorithms of
section 5.2 can be implemented and tested.
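
To give a rough idea of the shape of such a parser, here is a minimal sketch. It does not use the abnf2re-generated regexps; it assumes instead the generic, non-validating expression from RFC 2396 Appendix B, and the names URI_REF and parse_uri_reference are made up for illustration.

```python
import re

# Generic URI-reference splitter modelled on the regular expression in
# RFC 2396 Appendix B. Assumption: this is NOT one of the 44 expressions
# generated by abnf2re, and it does not validate its input.
URI_REF = re.compile(
    r'(?:([^:/?#]+):)?'    # scheme
    r'(?://([^/?#]*))?'    # authority
    r'([^?#]*)'            # path (always present, possibly empty)
    r'(?:\?([^#]*))?'      # query
    r'(?:#(.*))?',         # fragment
    re.DOTALL,
)

def parse_uri_reference(ref):
    """Return a (scheme, auth, path, query, fragment) 5-tuple.

    Components that are not present in the reference are None;
    the path is "" when it is empty.
    """
    return URI_REF.fullmatch(ref).groups()
```

An undefined query (None) is kept distinct from an empty one (""), which matters for the defined(R.query) test in the section 5.2 algorithm.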
Anyway, here's an initial list of issues.
(1) There are minor ABNF bugs in these productions (ref. RFC2234):
domainlabel = alphanum [ 0*61( alphanum | "-" ) alphanum ]
toplabel = alpha [ 0*61( alphanum | "-" ) alphanum ]
Fixes:
domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ]
toplabel = ALPHA [ 0*61( alphanum / "-" ) alphanum ]
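
As an illustration, the two fixed productions might translate to regular expressions along the following lines. This is a hand-written sketch, not actual abnf2re output; it assumes alphanum means ASCII letters and digits, and the names DOMAINLABEL and TOPLABEL are hypothetical.

```python
import re

# Assumption: alphanum = ALPHA / DIGIT, restricted to ASCII.
ALPHANUM = r'[A-Za-z0-9]'

# domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ]
DOMAINLABEL = re.compile(ALPHANUM + r'(?:[A-Za-z0-9-]{0,61}' + ALPHANUM + r')?')

# toplabel = ALPHA [ 0*61( alphanum / "-" ) alphanum ]
TOPLABEL = re.compile(r'[A-Za-z](?:[A-Za-z0-9-]{0,61}' + ALPHANUM + r')?')

def is_domainlabel(s):
    """True if the whole string matches the domainlabel production."""
    return DOMAINLABEL.fullmatch(s) is not None

def is_toplabel(s):
    """True if the whole string matches the toplabel production."""
    return TOPLABEL.fullmatch(s) is not None
```

The 0*61 repetition plus the two required end characters caps a label at 63 characters, and a label may neither begin nor end with "-".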
(2) The HEXDIG definition of RFC2234 is upper-case only;
it's probably not what is wanted.
escaped = "%" HEXDIG HEXDIG
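
The difference can be seen with two hand-written expressions. This is a sketch that assumes the upper-case-only reading of HEXDIG described in this item; the names ESCAPED_UPPER and ESCAPED_ANYCASE are hypothetical.

```python
import re

# "escaped" with HEXDIG read as upper-case only (the reading assumed here).
ESCAPED_UPPER = re.compile(r'%[0-9A-F]{2}')

# "escaped" with lower-case hex digits accepted as well, which is
# probably what is wanted.
ESCAPED_ANYCASE = re.compile(r'%[0-9A-Fa-f]{2}')

def is_escaped(s, pattern=ESCAPED_ANYCASE):
    """True if the whole string is one escape triplet under the given pattern."""
    return pattern.fullmatch(s) is not None
```

Under the first expression "%7E" is accepted and "%7e" is rejected; under the second both are accepted.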
(3) The production rule for path is a bit problematic.
path = [ abs-path / opaque-part ]
- it is not used in the grammar
- presumably, it is meant to say that whatever is
parsed as either abs-path or opaque-part is interpreted
as a "path".
- the production does not include rel-path, but rel-path needs
to be processed as a path for the algorithms in 5.2
(4) In my implementation, I've assumed the following change in
the pseudocode for the algorithm in 5.2
   if (R.path == "") then
      if defined(R.query) then
         T.path = Base.path;
         T.query = R.query;
      else
         -- An empty reference refers to the current document
         return (current-document, fragment);
      endif;
becomes
   if (R.path == "") then
      T.path = Base.path;
      if defined(R.query) then
         T.query = R.query;
      else
         T.query = Base.query;
      endif;
This seems consistent with the requests of the RDF group and
gives a clean, well-behaved algorithm.
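
In Python the revised branch might look roughly like the sketch below. It is not the code from my implementation; it assumes references are held as 5-tuples with None standing for an undefined component, and the names Ref and empty_path_branch are made up for illustration.

```python
from collections import namedtuple

# Hypothetical container for a parsed reference; None marks an
# undefined component.
Ref = namedtuple("Ref", "scheme auth path query fragment")

def empty_path_branch(R, Base):
    """Return (T.path, T.query) for a reference R whose path is empty.

    Follows the revised pseudocode: the base path is always inherited,
    and the base query is inherited only when R has no query of its own.
    """
    if R.path != "":
        raise ValueError("this branch only applies when R.path is empty")
    path = Base.path
    query = R.query if R.query is not None else Base.query
    return path, query
```

With base http://a/b/c/d;p?q, the reference "?y" yields path "/b/c/d;p" with query "y", while a fragment-only reference such as "#s" keeps both the base path and the base query "q".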
Received on Monday, 5 May 2003 21:35:25 UTC