- From: Mark Thomson <marktt@excite.com>
- Date: Thu, 12 Jun 2003 08:43:26 -0400
- To: uri@w3.org
One other simpler modification to the grammar to address the problem of the path being undefined is: net-path = "//" authority net-path-suffix net-path-suffix = ["/" path-segments] path = abs-path / rel-path / net-path-suffix Of course, //ABCD+y in http://ABCD+y would still be parsed as a path. --- On Wed 06/11, Rob Cameron < cameron@cs.sfu.ca > wrote: From: Rob Cameron [mailto: cameron@cs.sfu.ca] To: fielding@apache.org, marktt@excite.com Cc: uri@w3.org Date: Wed, 11 Jun 2003 12:31:42 -0700 Subject: Re: [uri] <none> On June 9, 2003 04:18 pm, Roy T. Fielding wrote:<br>> > "A path is always defined for a URI, though the defined path may be<br>> > empty (zero length) or opaque (not containing any "/" delimiters)"<br>> ><br>> > The production for net-path says that abs-path is optional, so for a<br>> > URI like http://ABCD?query, we have both abs-path and rel-path<br>> > undefined and not empty and therefore path would be undefined. Do we<br>> > still have to assume that path is empty even when both abs-path and<br>> > rel-path are undefined ? or is the above statement from the draft<br>> > incorrect ?<br>><br>> Bummer. The statement is correct, but I'll need to fix the ABNF so<br>> that it always ends up with a matching production.<br>><br>> Thanks for the report,<br>><br>> ....Roy<br><br>I've been playing with an experimental grammar modification that<br>addresses this problem and also addresses the following <br>additional wrinkle: http://ABCD+y is a legal URI according to<br>the ABNF (as translated to regexps by abnf2re).<br><br>>>> parseURI('http://ABCD+y')<br>('http', None, '//ABCD+y', None, None)<br><br>That is, because ABCD+y is not a legal authority, the<br>regular expression matching rules for http://ABCD+y backtrack<br>to accept //ABCD+y as a path. <br><br>To address both the problem reported by Mark and the<br>problem above, I have found that there may be merit<br>to simplifying the URI production to directly reflect the<br>opening statement of section 3:<br><br>"The generic URI syntax consists of a hierarchical sequence of<br>components referred to as the scheme, authority, path, query, and<br>fragment."<br><br>URI = scheme ":" ["//" authority] path [ "?" query ] [ "#" fragment ]<br><br>This rule reflects the five-component structure and the statement<br>that a path always exists, even if it is empty. It can be made<br>to work with either of the two following definitions of path:<br><br>path = abs-path / rel-path<br>path = segment *( "/" segment )<br><br>Running a parser based on either of these changes with<br>all the test cases listed in section 5.4 (both normal and <br>abnormal examples) gives precisely the same results as <br>with the grammar of bis-02 or bis-03. (By the way, it<br>might be good to have some IPv6 literals in the test<br>examples.)<br><br>On the problem case of http://ABCD+y, the following results.<br>>>> parseURI('http://ABCD+y')<br>('http', 'ABCD', '+y', None, None)<br><br>Arguably, this is a better parse if http://ABCD+y is to be <br>accepted as a URI. It is also a better parse if http://ABCD+y<br>is to be ruled out by the extra-grammatical restriction: "when<br>an authority exists, the path must either be empty or an<br>abs-path." (Alternatively, "when an authority exists, the <br>first segment of the path must be empty.") <br><br>Overall, I think the theme of grammar simplification reflected in the<br>change from bis02 to bis03 is a good idea. One other<br>area that could us e some attention is the grammar of IPv6<br>literals. <br><br><br><br> _______________________________________________ Join Excite! - http://www.excite.com The most personalized portal on the Web!
Received on Thursday, 12 June 2003 11:22:05 UTC