
Re: [uri] <none>

From: Mark Thomson <marktt@excite.com>
Date: Thu, 12 Jun 2003 08:43:26 -0400
Message-Id: <4.2.0.58.J.20030612084312.078cc188@localhost>
To: uri@w3.org

Another, simpler modification to the grammar that addresses the problem of
the path being undefined is:

net-path        = "//" authority net-path-suffix
net-path-suffix = ["/" path-segments]

path            = abs-path / rel-path / net-path-suffix

Of course, //ABCD+y in http://ABCD+y would still be parsed as a path.
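
A minimal Python sketch of the modification above, for illustration only. The
character classes are simplified assumptions (in particular, AUTH accepts only
hostname-like characters and excludes "+"), not the draft's full ABNF, and
parse_uri_suffix_variant is a hypothetical name, not the parseURI function
quoted below.

```python
import re

# Simplified, assumed character classes; not the draft's complete ABNF.
PCHAR = r"[A-Za-z0-9\-._~!$&'()*+,;=:@%]"
SEGS = rf"{PCHAR}*(?:/{PCHAR}*)*"   # path-segments = segment *( "/" segment )
AUTH = r"[A-Za-z0-9.\-]*"           # hostname-like characters only (assumption)

URI_RE = re.compile(
    rf"^(?P<scheme>[A-Za-z][A-Za-z0-9+.\-]*):"
    # net-path = "//" authority net-path-suffix
    rf"(?://(?P<authority>{AUTH})(?P<suffix>(?:/{SEGS})?)"
    rf"|(?P<abs>/{SEGS})"                      # abs-path
    rf"|(?P<rel>{PCHAR}+(?:/{SEGS})?))?"       # rel-path (simplified)
    rf"(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$"
)


def parse_uri_suffix_variant(text):
    """Return (scheme, authority, path, query, fragment), or None if no match."""
    m = URI_RE.match(text)
    if m is None:
        return None
    # path = abs-path / rel-path / net-path-suffix: take whichever one matched.
    path = next(
        (m.group(g) for g in ("suffix", "abs", "rel") if m.group(g) is not None),
        None,
    )
    return (m.group("scheme"), m.group("authority"), path,
            m.group("query"), m.group("fragment"))


# The path is now defined (as the empty string) when only an authority is present.
print(parse_uri_suffix_variant("http://ABCD?query"))
# ('http', 'ABCD', '', 'query', None)

# The backtracking case is unchanged: the whole of //ABCD+y is taken as the path.
print(parse_uri_suffix_variant("http://ABCD+y"))
# ('http', None, '//ABCD+y', None, None)
```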


--- On Wed 06/11, Rob Cameron <cameron@cs.sfu.ca> wrote:
From: Rob Cameron [mailto:cameron@cs.sfu.ca]
To: fielding@apache.org, marktt@excite.com
Cc: uri@w3.org
Date: Wed, 11 Jun 2003 12:31:42 -0700
Subject: Re: [uri] <none>

On June 9, 2003 04:18 pm, Roy T. Fielding wrote:
> > "A path is always defined for a URI, though the defined path may be
> > empty (zero length) or opaque (not containing any "/" delimiters)"
> >
> > The production for net-path says that abs-path is optional, so for a
> > URI like http://ABCD?query, we have both abs-path and rel-path
> > undefined and not empty and therefore path would be undefined. Do we
> > still have to assume that path is empty even when both abs-path and
> > rel-path are undefined ? or is the above statement from the draft
> > incorrect ?
>
> Bummer.  The statement is correct, but I'll need to fix the ABNF so
> that it always ends up with a matching production.
>
> Thanks for the report,
>
> ....Roy

I've been playing with an experimental grammar modification that
addresses this problem and also addresses the following
additional wrinkle: http://ABCD+y is a legal URI according to
the ABNF (as translated to regexps by abnf2re).

>>> parseURI('http://ABCD+y')
('http', None, '//ABCD+y', None, None)

That is, because ABCD+y is not a legal authority, the
regular expression matching rules for http://ABCD+y backtrack
to accept //ABCD+y as a path.

To address both the problem reported by Mark and the
problem above, I have found that there may be merit
to simplifying the URI production to directly reflect the
opening statement of section 3:

"The generic URI syntax consists of a hierarchical sequence of
components referred to as the scheme, authority, path, query, and
fragment."

URI  = scheme ":" ["//" authority] path [ "?" query ] [ "#" fragment ]

This rule reflects the five-component structure and the statement
that a path always exists, even if it is empty. It can be made
to work with either of the two following definitions of path:

path = abs-path / rel-path
path = segment *( "/" segment )

Running a parser based on either of these changes with
all the test cases listed in section 5.4 (both normal and
abnormal examples) gives precisely the same results as
with the grammar of bis-02 or bis-03. (By the way, it
might be good to have some IPv6 literals in the test
examples.)

On the problem case of http://ABCD+y, the following results.

>>> parseURI('http://ABCD+y')
('http', 'ABCD', '+y', None, None)

Arguably, this is a better parse if http://ABCD+y is to be
accepted as a URI. It is also a better parse if http://ABCD+y
is to be ruled out by the extra-grammatical restriction: "when
an authority exists, the path must either be empty or an
abs-path." (Alternatively, "when an authority exists, the
first segment of the path must be empty.")

Overall, I think the theme of grammar simplification reflected in the
change from bis02 to bis03 is a good idea. One other
area that could use some attention is the grammar of IPv6
literals.
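
A rough Python sketch of the five-component rule quoted above, using the
second definition of path (segment *( "/" segment )) together with the
extra-grammatical restriction. It is not the parseURI or abnf2re code referred
to in the message. The names parse_uri_five_part and authority_rule_ok are
hypothetical, and the character classes are simplified assumptions (AUTH
accepts only hostname-like characters and excludes "+").

```python
import re

# Simplified, assumed character classes; not the draft's complete ABNF.
PCHAR = r"[A-Za-z0-9\-._~!$&'()*+,;=:@%]"
PATH = rf"{PCHAR}*(?:/{PCHAR}*)*"   # path = segment *( "/" segment )
AUTH = r"[A-Za-z0-9.\-]*"           # hostname-like characters only (assumption)

# URI = scheme ":" ["//" authority] path [ "?" query ] [ "#" fragment ]
URI_RE = re.compile(
    rf"^(?P<scheme>[A-Za-z][A-Za-z0-9+.\-]*):"
    rf"(?://(?P<authority>{AUTH}))?"
    rf"(?P<path>{PATH})"
    rf"(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$"
)


def parse_uri_five_part(text):
    """Return (scheme, authority, path, query, fragment), or None if no match."""
    m = URI_RE.match(text)
    if m is None:
        return None
    return m.group("scheme", "authority", "path", "query", "fragment")


def authority_rule_ok(parts):
    """When an authority exists, the path must be empty or begin with "/"."""
    _scheme, authority, path, _query, _fragment = parts
    return authority is None or path == "" or path.startswith("/")


print(parse_uri_five_part("http://ABCD+y"))
# ('http', 'ABCD', '+y', None, None)
print(authority_rule_ok(parse_uri_five_part("http://ABCD+y")))
# False
print(parse_uri_five_part("http://ABCD?query"))
# ('http', 'ABCD', '', 'query', None)
```

Because the path group can match the empty string, a path is always defined
under this rule, and the restriction can be checked as a separate step after
parsing rather than being encoded in the grammar.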

Received on Thursday, 12 June 2003 11:22:05 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Thursday, 13 January 2011 12:15:31 GMT