- From: Adam M. Costello BOGUS address, see signature <BOGUS@BOGUS.nicemice.net>
- Date: Tue, 2 Mar 2004 12:00:07 +0000
- To: uri@w3.org
The latest draft says:
    An ABNF-driven parser will find that the border between
    authority and path is ambiguous; they are disambiguated by the
    "first-match-wins" (a.k.a. "greedy") algorithm. In other words,
    if authority is present then the first segment of the path must be
    empty.
The second sentence does not follow from the first. Consider this URI:
    foo://joe@example.com:0x3FF/blah
According to the grammar, this can be parsed in either of two ways:
    (1) authority = (absent)
        path      = //joe@example.com:0x3FF/blah
    (2) authority = joe@example.com
        path      = :0x3FF/blah
It cannot be parsed this way:
    (3) authority = joe@example.com:0x3FF
        path      = /blah
because non-digits are not allowed in the port.
The first-match-wins rule implies that the correct parsing is (2). Note
that the first path segment is not empty, but is ":0x3FF".
The regular expression in appendix B claims to break a well-formed URI
down into its components, but it gets this one wrong, yielding the
components in (3).
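This is easy to reproduce. The following Python sketch applies the regular expression from RFC 2396 Appendix B, which I am assuming the draft carries over unchanged; the helper name split_uri is made up for illustration only.

```python
import re

# Regular expression from RFC 2396 Appendix B (assumed to be identical
# to the one in the current draft).
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

def split_uri(uri):
    """Hypothetical helper: return the five components the regex extracts."""
    # Every group is optional, so the pattern matches any string.
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

parts = split_uri("foo://joe@example.com:0x3FF/blah")
print(parts["authority"])  # joe@example.com:0x3FF
print(parts["path"])       # /blah
```

The regular expression takes everything between "//" and the next "/", "?" or "#" as the authority, without checking that the port consists of digits, which is how it arrives at (3).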
Perhaps the grammar should be tightened up so that this URI is invalid.
Note that the RFC-2396 grammar does not accept it.
If the grammar is kept as-is, the regular expression should be fixed to
parse this URI correctly, and the statement about the first path segment
being necessarily empty should be removed. That might have implications
for relative URI resolution...
In any case, it might be nice for the draft to provide a regular
expression that not only parses well-formed URIs, but also detects
ill-formed URIs (by failing to match them).
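As a rough illustration of what such a check could look like, here is a Python sketch that assumes the grammar is tightened so that the example URI above becomes invalid. It reuses the RFC 2396 Appendix B expression for the split and adds only one test, that any port is all digits. The names SPLIT_RE, AUTHORITY_RE and looks_well_formed are hypothetical, and the authority pattern is a simplification of my own (it does not handle bracketed IPv6 literals or check which characters are allowed), not something taken from the draft.

```python
import re

# Appendix B-style split, anchored at the end as well (assumption: same
# expression as in RFC 2396).
SPLIT_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$'
)

# Simplified authority shape: optional userinfo "@", a host without ":" or
# "@", and an optional ":" followed by digits only. Illustrative, not the
# draft's ABNF.
AUTHORITY_RE = re.compile(r'^(?:[^@]*@)?[^:@]*(?::[0-9]*)?$')

def looks_well_formed(uri):
    """Hypothetical check: reject URIs whose authority has a non-digit port."""
    m = SPLIT_RE.match(uri)
    if m is None:
        return False
    authority = m.group(4)
    if authority is None:
        # No "//" present, so there is no authority to check.
        return True
    return AUTHORITY_RE.match(authority) is not None

print(looks_well_formed("foo://joe@example.com:1023/blah"))
print(looks_well_formed("foo://joe@example.com:0x3FF/blah"))
```

Under these assumptions the first call reports True and the second False. A full validator would have to encode the rest of the grammar as well.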
AMC
http://www.nicemice.net/amc/
Received on Tuesday, 2 March 2004 07:00:09 UTC