RFC2396bis - implementation results using revised syntax from Graham Klyne on 2004-01-27 (uri@w3.org from January 2004)

From: Graham Klyne <gk@ninebynine.org>
Date: Tue, 27 Jan 2004 20:25:51 +0000
To: uri@w3.org
Message-Id: <5.1.0.14.2.20040127161216.00b85a28@127.0.0.1>

I've been re-implementing my URI parser based very closely on the latest 
RFC2396bis document.  Here are my comments.

The syntax has been taken from the very latest work-in-progress version of 
the document (CVS revision 1.64, from 
http://cvs.apache.org/viewcvs.cgi/ietf-uri/rev-2002/).

Overall, I think it's looking pretty good.

...

Test case:  "http://example.org/aaa/bbb#ccc"
This parses as a valid relative-URI, because the syntax for hier-part -> 
rel-path -> segments allows the first segment to contain a ':' character.

Later:  I see this is covered by text in section 4.1.  I suggest also 
adding a note to the paragraph in section 3 following the rel-path 
production:  "The first segment in a rel-path may not match the 'scheme' 
and ':' syntax".
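
As an illustration only, the check suggested by that note could be sketched 
in Python as below.  The function name and the regular expression are 
assumptions made for this sketch, not text from the draft; the 'scheme' 
pattern follows the RFC 2396 definition (a letter followed by letters, 
digits, "+", "-" or ".").

```python
import re

# 'scheme' ":" as in RFC 2396: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")

def first_segment_looks_like_scheme(ref: str) -> bool:
    """Return True if the first path segment of ref begins with scheme ":"."""
    # The first segment ends at the first "/", "?" or "#" (or at end of string).
    first_segment = re.split(r"[/?#]", ref, maxsplit=1)[0]
    return SCHEME_PREFIX.match(first_segment) is not None

# The test case above must not be accepted as a relative-URI:
print(first_segment_looks_like_scheme("http://example.org/aaa/bbb#ccc"))  # True
print(first_segment_looks_like_scheme("aaa/bbb#ccc"))                     # False
```

A relative-URI parser would reject (or re-parse as absolute) any reference 
for which this check returns True.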

...

Test case:  "http://example.123./aaa/bbb#ccc"
I'd like to confirm that this is now regarded as a valid URI.  In previous 
versions of this specification, it was not (according to my interpretation 
and implementation).

I note that getting the parsing (look-ahead) logic for this just right has 
turned out to be a bit tricky.  When parsing a 'qualified' production, if a 
leading '.' is not followed by a domainlabel, it must be re-interpreted as 
a trailing '.'.

I think this production might be easier as a basis for implementation:
   qualified = [ "." [ domainlabel qualified ] ]

(I can't see how to express this using repetition rather than recursion.)

<aside>
Derivation of above:
   qualified = *( "." domainlabel ) [ "." ]
   ...>        [ 1*( "." domainlabel ) / "." / 1*( "." domainlabel ) "." ]
   ...>        [ "." domainlabel *( "." domainlabel )
               / "."
               / "." domainlabel *( "." domainlabel ) "." ]
   ...>        [ "." [ domainlabel *( "." domainlabel )
                     / domainlabel *( "." domainlabel ) "." ] ]
   ...>        [ "." [ domainlabel *( "." domainlabel ) [ "." ] ] ]
   ...>        [ "." [ domainlabel qualified ] ]
</aside>
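
To show why the recursive form is convenient, here is a minimal 
recursive-descent sketch in Python.  It is not my actual parser: the 
function name is hypothetical, and the shape of domainlabel (alphanumerics 
and "-", starting and ending with an alphanumeric, as in RFC 2396) is an 
assumption about the current draft.

```python
import re

# Assumed domainlabel: alphanum, optionally followed by alphanum/"-" and
# ending with an alphanum (the RFC 2396 form).
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def parse_qualified(s: str, i: int = 0) -> int:
    """Return the index just past the match of
    qualified = [ "." [ domainlabel qualified ] ]  starting at s[i]."""
    if i >= len(s) or s[i] != ".":
        return i              # empty alternative: nothing consumed
    m = DOMAINLABEL.match(s, i + 1)
    if m is None:
        return i + 1          # "." with no domainlabel after it: trailing dot
    return parse_qualified(s, m.end())

s = "example.123./aaa"
print(parse_qualified(s, len("example")))  # 12, i.e. just past "example.123."
```

Each call needs only one decision after consuming a ".", so no 
backtracking over a previously accepted "." is required.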

...

I note that getting the parsing (look-ahead) logic for the ipv6literal just 
right has been quite tricky.  In particular, when parsing phrases of the 
form ( h4 ":" ), it is important to ensure that the following character is 
not another ':'.
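
The look-ahead for ( h4 ":" ) can be sketched in Python as below.  This is 
an illustration under stated assumptions, not the draft's grammar: the 
function name is hypothetical, and h4 is assumed to be one to four 
hexadecimal digits.

```python
import re

# Assumption: h4 = 1*4HEXDIG
H4 = re.compile(r"[0-9A-Fa-f]{1,4}")

def match_h4_colon(s: str, i: int = 0):
    """Match ( h4 ":" ) at s[i], refusing the match when the ":" is the
    first half of "::".  Return the index after the ":" or None."""
    m = H4.match(s, i)
    if m is None:
        return None
    j = m.end()
    if j >= len(s) or s[j] != ":":
        return None           # h4 not followed by ":"
    if j + 1 < len(s) and s[j + 1] == ":":
        return None           # "h4::" - leave the h4 for the "::" alternative
    return j + 1

print(match_h4_colon("1080:0:0"))  # 5
print(match_h4_colon("1080::8"))   # None
```

Without the second check, a repetition of ( h4 ":" ) would consume the 
"1080:" in "1080::8" and leave a lone ":" that no later rule can match.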

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact

Received on Tuesday, 27 January 2004 16:30:08 UTC