- From: Graham Klyne <gk@ninebynine.org>
- Date: Tue, 27 Jan 2004 20:25:51 +0000
- To: uri@w3.org
I've been re-implementing my URI parser based very closely on the latest RFC2396bis document. Here are my comments. The syntax has been taken from the very latest work-in-progress version of the document (CVS revision 1.64, from http://cvs.apache.org/viewcvs.cgi/ietf-uri/rev-2002/).

Overall, I think it's looking pretty good.

...

Test case: "http://example.org/aaa/bbb#ccc"

This parses as a valid relative-URI, because the syntax for hier-part -> rel-path -> segments allows the first segment to contain a ':' character.

Later: I see this is covered by text in section 4.1. I suggest also adding a note to the paragraph in section 3 following the rel-path production: "The first segment in a rel-path may not match the 'scheme' and ':' syntax".

...

Test case: "http://example.123./aaa/bbb#ccc"

I'd like to confirm that this is now regarded as a valid URI. In previous versions of this specification, it was not (according to my interpretation and implementation).

I note that getting the parsing (look-ahead) logic for this just right has turned out to be a bit tricky. When parsing a 'qualified' production, if a leading '.' is not followed by a domainlabel, it must be re-interpreted as a trailing '.'. I think this production might be easier as a basis for implementation:

    qualified = [ "." [ domainlabel qualified ] ]

(I can't see how to express this using repetition rather than recursion.)

<aside>
Derivation of above:

    qualified = *( "." domainlabel ) [ "." ]
    ...> [ 1*( "." domainlabel ) / "." / 1*( "." domainlabel ) "." ]
    ...> [ "." domainlabel *( "." domainlabel ) / "." / "." domainlabel *( "." domainlabel ) "." ]
    ...> [ "." [ domainlabel *( "." domainlabel ) / domainlabel *( "." domainlabel ) "." ] ]
    ...> [ "." [ domainlabel *( "." domainlabel ) [ "." ] ] ]
    ...> [ "." [ domainlabel qualified ] ]
</aside>

...
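
For illustration, here is a minimal Python sketch of how the recursive form of 'qualified' maps onto a parser that needs only one step of look-ahead after each '.'. It is not taken from any actual implementation. The function names are hypothetical, and the domainlabel rule used here (alphanumerics with interior hyphens, no length limit) is a simplifying assumption rather than the exact production in the draft.

```python
import string

ALPHANUM = set(string.ascii_letters + string.digits)


def match_domainlabel(s, i):
    """Return the index just past a domainlabel starting at s[i], or None.

    Assumed, simplified rule: alphanum [ *( alphanum / "-" ) alphanum ].
    No length limit is enforced in this sketch.
    """
    if i >= len(s) or s[i] not in ALPHANUM:
        return None
    j = i + 1
    last_alnum = j  # index just past the most recent alphanumeric
    while j < len(s) and (s[j] in ALPHANUM or s[j] == "-"):
        j += 1
        if s[j - 1] in ALPHANUM:
            last_alnum = j
    # Stop after the last alphanumeric so a label never ends in "-".
    return last_alnum


def match_qualified(s, i):
    """qualified = [ "." [ domainlabel qualified ] ]

    Always succeeds (it may match nothing); returns the end index.
    """
    if i < len(s) and s[i] == ".":
        after_label = match_domainlabel(s, i + 1)
        if after_label is None:
            # No label follows, so this "." is the trailing dot.
            return i + 1
        return match_qualified(s, after_label)
    return i  # empty match


def match_hostname(s, i=0):
    """hostname = domainlabel qualified; returns end index or None."""
    end = match_domainlabel(s, i)
    return None if end is None else match_qualified(s, end)
```

With this shape, the authority part of the test case above, "example.123.", is consumed in full, and the decision about whether a '.' is a separator or the trailing dot is made in exactly one place.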
I note that getting the parsing (look-ahead) logic for the ipv6literal just right has been quite tricky. In particular, when parsing phrases of the form (h4 ":"), it is important to ensure that the following character is not another ':'.

#g

------------
Graham Klyne
For email: http://www.ninebynine.org/#Contact
Received on Tuesday, 27 January 2004 16:30:08 UTC