- From: Adam M. Costello BOGUS address, see signature <BOGUS@BOGUS.nicemice.net>
- Date: Tue, 2 Mar 2004 12:00:07 +0000
- To: uri@w3.org
The latest draft says:

    An ABNF-driven parser will find that the border between authority
    and path is ambiguous; they are disambiguated by the
    "first-match-wins" (a.k.a. "greedy") algorithm.  In other words, if
    authority is present then the first segment of the path must be
    empty.

The second sentence does not follow from the first.  Consider this URI:

    foo://joe@example.com:0x3FF/blah

According to the grammar, this can be parsed in either of two ways:

    (1) authority = (absent)
        path      = //joe@example.com:0x3FF/blah

    (2) authority = joe@example.com
        path      = :0x3FF/blah

It cannot be parsed this way:

    (3) authority = joe@example.com:0x3FF
        path      = /blah

because non-digits are not allowed in the port.

The first-match-wins rule implies that the correct parsing is (2).  Note that the first path segment is not empty, but is ":0x3FF".

The regular expression in appendix B claims to break a well-formed URI down into its components, but it gets this one wrong, yielding the components in (3).

Perhaps the grammar should be tightened up so that this URI is invalid.  Note that the RFC-2396 grammar does not accept it.

If the grammar is kept as-is, the regular expression should be fixed to parse this URI correctly, and the statement about the first path segment being necessarily empty should be removed.  That might have implications for relative URI resolution...

In any case, it might be nice for the draft to provide a regular expression that not only parses well-formed URIs, but also detects ill-formed URIs (by failing to match them).

AMC
http://www.nicemice.net/amc/
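The behaviour described above can be reproduced with a small Python sketch. It assumes the draft's appendix B expression is the same one published in RFC 2396 and RFC 3986; the names split_uri and port_is_well_formed are hypothetical helpers introduced only for illustration, and the port check is a rough approximation of "port = *DIGIT", not a full validator.

```python
import re

# Appendix B expression as published in RFC 2396 / RFC 3986
# (assumed here to match the one in the draft under discussion).
APPENDIX_B = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split a URI into components using the appendix B group numbering."""
    m = APPENDIX_B.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def port_is_well_formed(authority):
    """Hypothetical check: True if the authority has no port, or a port of ASCII digits only."""
    hostport = authority.rsplit("@", 1)[-1]   # drop userinfo, if any
    if hostport.endswith("]"):                # bracketed IP literal without a port
        return True
    _host, sep, port = hostport.rpartition(":")
    if not sep:                               # no ":" at all, so no port
        return True
    return all(c in "0123456789" for c in port)  # an empty port is allowed by *DIGIT


parts = split_uri("foo://joe@example.com:0x3FF/blah")
print(parts["authority"], parts["path"])
print(port_is_well_formed(parts["authority"]))
```

The expression splits purely on delimiters, so it reports the components of parsing (3) without noticing that "0x3FF" is not a legal port. A separate check such as the one sketched here is one possible way to flag the mismatch; a regular expression that rejects ill-formed URIs outright would need to encode the port rule itself.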
Received on Tuesday, 2 March 2004 07:00:09 UTC