- From: Roy T. Fielding <fielding@apache.org>
- Date: Fri, 7 Mar 2003 15:55:02 -0800
- To: Graham Klyne <gk@ninebynine.org>
- Cc: <uri@w3.org>
> The problem is in the production for 'qualified'. To determine > whether an incoming ".abc" is a 'domainlabel' or a 'toplabel' requires > a significant lookahead, to the following '.' (if present) and the > character following that. To determine if an incoming ".123" is valid > can require an arbitrarily long lookahead (e.g. > http://0.123.4.5.6.7.8.9.10.11.12.13.x/). > > I think parsing precisely according to the syntax would be greatly > simplified if the syntax were relaxed so that: > > qualified = *( "." domainlabel ) [ "." ] > > i.e. drop the syntactic prohibition of URIs like this: > > http://www.example.123./foo > > I appreciate this is not strictly correct, but I see no practical harm > from defining the syntax in this way and asserting the form of the > final domain label as an extra-syntactic constraint. A (limited) few > tests with my browser suggest that it does not syntactically prohibit > numeric top-level domain labels, but simply reports that the domain > cannot be found. Doing that would cause the syntax to be ambiguous in regards to IPv4 addresses, which is why that syntax was added to the specification in the first place. The reason that literal IP addresses are explicitly denoted is because applications are encouraged to convert them directly to numeric IP rather than send everything to a DNS resolver. > If you really want to keep the syntactic constraint in place, I > suggest an alternative approach: > > hostname = qualified > qualified = numericlabel "." qualified / > toplabel [ "." [qualified] ] > > numericlabel = DIGIT [ 0*61( alphanum / "-" ) alphanum Well, that is harder for people to understand, but I agree that it is better for LALR parsers. > ... > > I think there's a typo in the syntax production for 'toplabel': > > s/alpha/ALPHA/ ? Yes, thanks for noting it. ....Roy
Received on Friday, 7 March 2003 19:10:11 UTC