- From: Mike Brown <mike@skew.org>
- Date: Sun, 21 Dec 2003 14:18:41 -0700 (MST)
- To: uri@w3.org
Hi again,

While doing some regression testing for URI reference syntax validation and parsing, I ran across something of note in the RFC 2396bis grammar. I'm working from http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html and am fully prepared to be publicly humiliated when someone points out that either this is not the current version or that I've overlooked something that makes the whole thing a non-issue.

As far as I can tell, according to the grammar, the authority component can no longer be an empty string. So if you have a reference that begins with '//', optionally preceded by a scheme and ':', then everything from the '//' up to the next '?' or '#' or end of the string must be a rel-path, not a net-path.

This situation causes problems when trying to interpret URI references such as '//', 'file:///', and 'myscheme://a:b/', not to mention Roy's own test cases for relative URI resolution where he uses base URIs like 'fred:///s//a/b/c'. The components of these references that used to be net-paths under RFC 2396 are now rel-paths under RFC 2396bis, unless I'm mistaken, which I probably am.

If this new interpretation is correct, then it makes the regex in Appendix B misbehave, since it will assume that any path that starts with '//' must be a net-path. If this regex is used as the basis for the parser that implements the resolution algorithm in section 5 of RFC 2396bis, then you're going to have problems (as I currently have).

If the grammar is correct, then I would suggest changing the first asterisk in the regex to a plus, but only after fully considering the dramatic effect this has on the test cases in http://www.ics.uci.edu/~fielding/url/test4.html and http://www.ics.uci.edu/~fielding/url/test5.html, if you use the regex to parse each URI into its components.

-Mike
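P.S. To make the effect of the suggested change concrete, here is a minimal sketch in Python. It assumes the Appendix B regex is the one published in RFC 2396; the names STAR, PLUS, and split_ref are made up for this illustration and come from no specification. Python's re module reports an unmatched optional group as None, which lets an empty authority ('') be told apart from an absent one (None).

```python
import re

# Appendix B regex as published in RFC 2396: the authority may be empty.
STAR = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

# Hypothetical variant: the first '*' changed to '+', so a '//' that is not
# followed by at least one authority character is left in the path instead.
PLUS = re.compile(r'^(([^:/?#]+):)?(//([^/?#]+))?([^?#]*)(\?([^#]*))?(#(.*))?')


def split_ref(pattern, ref):
    """Split a URI reference into components using the given regex.

    Every part of either pattern is optional, so match() always succeeds.
    """
    m = pattern.match(ref)
    return {
        'scheme': m.group(2),
        'authority': m.group(4),  # '' if '//' matched with nothing after it
        'path': m.group(5),
        'query': m.group(7),
        'fragment': m.group(9),
    }


# Compare how the two patterns split the references discussed above.
for ref in ('//', 'file:///', 'fred:///s//a/b/c', 'http://a/b/c/d;p?q'):
    for name, pattern in (('star', STAR), ('plus', PLUS)):
        parts = split_ref(pattern, ref)
        print(name, repr(ref), repr(parts['authority']), repr(parts['path']))
```

With the original pattern, 'file:///' splits into an empty authority and the path '/'; with the variant, there is no authority at all and the path is '///'. References with a non-empty authority, such as 'http://a/b/c/d;p?q', split identically under both. Neither pattern validates the authority itself, so a case like 'myscheme://a:b/' still needs a separate check against the grammar.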
Received on Sunday, 21 December 2003 16:18:41 UTC