- From: Graham Klyne <gk@ninebynine.org>
- Date: Wed, 05 Mar 2003 17:41:38 +0000
- To: "Roy T. Fielding" <fielding@apache.org>
- Cc: <uri@w3.org>
Roy,

I have to say that the 'hostname' syntax as specified in RFC2396bis is a pain to parse accurately. I think it's sufficiently difficult to get exactly right that it won't be correctly implemented as specified in many applications, which leaves me wondering if it really should be so fussily correct with respect to domain name usage.

(The reason I'm noticing this is that I've been using the URI parsing task to experiment with some programming tools and techniques that offer a more direct correspondence between specification and source code. If I were doing this as part of a real application, I would long ago have ignored the detailed syntax and done something very similar but much easier to implement.)

The problem is in the production for 'qualified'. To determine whether an incoming ".abc" is a 'domainlabel' or a 'toplabel' requires a significant lookahead: to the following '.' (if present) and the character following that. To determine whether an incoming ".123" is valid can require an arbitrarily long lookahead, because a numeric label is only acceptable if some later label turns out to be a 'toplabel' (e.g. http://0.123.4.5.6.7.8.9.10.11.12.13.x/).

I think parsing precisely according to the syntax would be greatly simplified if the syntax were relaxed so that:

    qualified = *( "." domainlabel ) [ "." ]

i.e. drop the syntactic prohibition of URIs like this:

    http://www.example.123./foo

I appreciate this is not strictly correct, but I see no practical harm in defining the syntax this way and asserting the form of the final domain label as an extra-syntactic constraint. A few (limited) tests with my browser suggest that it does not syntactically prohibit numeric top-level domain labels, but simply reports that the domain cannot be found.

...

If you really want to keep the syntactic constraint in place, I suggest an alternative approach:

    hostname     = qualified
    qualified    = numericlabel "." qualified / toplabel [ "." [qualified] ]
    numericlabel = DIGIT [ 0*61( alphanum / "-" ) alphanum ...
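To make the relaxed proposal concrete, here is a minimal Python sketch of what I have in mind. It is illustrative only: the function names are made up for this message, and it assumes that a hostname is one 'domainlabel' followed by 'qualified', and that a 'domainlabel' is 1 to 63 characters, alphanumeric at both ends with hyphens allowed inside. The syntactic step needs no lookahead at all; the rule about the final label becomes a separate check.

```python
import re

# Assumed label shape: alphanumeric at both ends, hyphens allowed inside,
# at most 63 characters in total.
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def parse_hostname_relaxed(host):
    """Return the list of labels if host matches
    domainlabel *( "." domainlabel ) [ "." ], otherwise None."""
    if host.endswith("."):
        host = host[:-1]  # a single optional trailing dot
    labels = host.split(".")
    if all(DOMAINLABEL.fullmatch(label) for label in labels):
        return labels
    return None


def final_label_is_alphabetic(labels):
    """Extra-syntactic constraint: the last label must start with a letter."""
    return labels[-1][0].isalpha()


labels = parse_hostname_relaxed("www.example.123.")
# Accepted by the relaxed syntax, then rejected by the separate check.
print(labels, final_label_is_alphabetic(labels))
```

Each label is validated on its own as it arrives, so a streaming parser could do the same thing without buffering the whole hostname.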
I think there's a typo in the syntax production for 'toplabel': s/alpha/ALPHA/ ?

#g

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E
Received on Wednesday, 5 March 2003 18:05:15 UTC