- From: Graham Klyne <gk@ninebynine.org>
- Date: Tue, 27 Jan 2004 20:25:51 +0000
- To: uri@w3.org
I've been re-implementing my URI parser based very closely on the latest
RFC2396bis document. Here are my comments.
The syntax has been taken from the very latest work-in-progress version of
the document (CVS revision 1.64, from
http://cvs.apache.org/viewcvs.cgi/ietf-uri/rev-2002/).
Overall, I think it's looking pretty good.
...
Test case: "http://example.org/aaa/bbb#ccc"
This parses as a valid relative-URI, because the syntax for hier-part ->
rel-path -> segments allows the first segment to contain a ':' character.
Later: I see this is covered by text in section 4.1. I suggest also
adding a note to the paragraph in section 3 following the rel-path
production: "The first segment in a rel-path may not match the 'scheme'
and ':' syntax".
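To illustrate the kind of check that note implies, here is a minimal Python sketch (not taken from my parser or from the draft). It assumes the usual scheme syntax, ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ); the function name is hypothetical.

```python
import re

# Assumed scheme syntax followed by ":" (ALPHA, then ALPHA / DIGIT / "+" / "-" / ".").
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")

def first_segment_looks_like_scheme(rel_path):
    """Return True if the first path segment begins with scheme ":".

    Hypothetical helper: a relative path for which this returns True
    would be mistaken for an absolute URI and should be rejected.
    """
    first_segment = rel_path.split("/", 1)[0]
    return SCHEME_PREFIX.match(first_segment) is not None
```

With the test case above, the first segment is "http:", so the check rejects it as a rel-path; something like "./http:x" passes because its first segment is ".".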
...
Test case: "http://example.123./aaa/bbb#ccc"
I'd like to confirm that this is now regarded as a valid URI. In previous
versions of this specification, it was not (according to my interpretation
and implementation).
I note that getting the parsing (look-ahead) logic for this just right has
turned out to be a bit tricky. When parsing a 'qualified' production, if a
leading '.' is not followed by a domainlabel, it must be re-interpreted as
a trailing '.'.
I think this production might be easier as a basis for implementation:
qualified = [ "." [ domainlabel qualified ] ]
(I can't see how to express this using repetition rather than recursion.)
<aside>
Derivation of above:
qualified = *( "." domainlabel ) [ "." ]
...> [ 1*( "." domainlabel ) / "." / 1*( "." domainlabel ) "." ]
...> [ "." domainlabel *( "." domainlabel )
     / "."
     / "." domainlabel *( "." domainlabel ) "." ]
...> [ "." [ domainlabel *( "." domainlabel )
           / domainlabel *( "." domainlabel ) "." ] ]
...> [ "." [ domainlabel *( "." domainlabel ) [ "." ] ] ]
...> [ "." [ domainlabel qualified ] ]
</aside>
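As a rough illustration of why the recursive form is easier to implement, here is a minimal Python sketch of a recursive-descent matcher for it. This is not my actual parser: the domainlabel pattern is a simplified assumption (alphanumeric at both ends, hyphens allowed inside), and the function name is hypothetical.

```python
import re

# Assumption: simplified domainlabel, not copied from the draft.
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def parse_qualified(s, pos=0):
    """Match qualified = [ "." [ domainlabel qualified ] ] starting at pos.

    Returns the index just past the matched text (pos itself if nothing matched).
    """
    if pos < len(s) and s[pos] == ".":
        label = DOMAINLABEL.match(s, pos + 1)
        if label:
            # "." domainlabel, then recurse for the rest.
            return parse_qualified(s, label.end())
        # "." not followed by a domainlabel: treat it as the trailing ".".
        return pos + 1
    return pos
```

Each call needs only one decision after seeing a ".", so no backtracking over earlier labels is required. For the ".123./aaa/bbb#ccc" tail of the test case, the match ends just after the trailing ".".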
...
I note that getting the parsing (look-ahead) logic for the ipv6literal just
right has been quite tricky. In particular, when parsing phrases of the form
(h4 ":"), it is important to ensure that the following character is not
another ':'.
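A minimal Python sketch of that look-ahead, assuming h4 is one to four hex digits (my reading of the draft, not quoted from it); the function name is hypothetical:

```python
import re

# Assumption: h4 = 1*4HEXDIG. The negative look-ahead (?!:) refuses a
# match when the ":" is immediately followed by another ":".
H4_COLON = re.compile(r"[0-9A-Fa-f]{1,4}:(?!:)")

def match_h4_colon(s, pos=0):
    """Return the index just past (h4 ":") at pos, or None if it does not apply."""
    m = H4_COLON.match(s, pos)
    return m.end() if m else None
```

On "1:2::3" this accepts "1:" at position 0 but declines "2:" at position 2, leaving the "::" for the caller to handle as the elision.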
#g
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Tuesday, 27 January 2004 16:30:08 UTC