- From: Graham Klyne <gk@ninebynine.org>
- Date: Wed, 05 Mar 2003 17:41:38 +0000
- To: "Roy T. Fielding" <fielding@apache.org>
- Cc: <uri@w3.org>
Roy,
I have to say that the 'hostname' syntax as specified an RFC2396bis is a
pain to parse accurately. I think it's sufficiently difficult to get
exactly right that it won't be correctly implemented as specified in many
applications, which leaves me wondering if it really should be so fussily
correct with respect to domain name usage.
(The reason I'm noticing this is that I've been using the URI parsing task
to experiment with some programming tools and techniques that offer a more
direct correspondence between specification and the source code. If I were
doing this as part of a real application, I would long ago have ignored the
detailed syntax and done something very similar but much easier to implement.)
The problem is in the production for 'qualified'. To determine whether an
incoming ".abc" is a 'domainlabel' or a 'toplabel' requires a significant
lookahead, to the following '.' (if present) and the character following
that. To determine if an incoming ".123" is valid can require an
arbitrarily long lookahead (e.g. http://0.123.4.5.6.7.8.9.10.11.12.13.x/).
I think parsing precisely according to the syntax would be greatly
simplified if the syntax were relaxed so that:
qualified = *( "." domainlabel ) [ "." ]
i.e. drop the syntactic prohibition of URIs like this:
http://www.example.123./foo
I appreciate this is not strictly correct, but I see no practical harm from
defining the syntax in this way and asserting the form of the final domain
label as an extra-syntactic constraint. A (limited) few tests with my
browser suggest that it does not syntactically prohibit numeric top-level
domain labels, but simply reports that the domain cannot be found.
...
If you really want to keep the syntactic constraint in place, I suggest an
alternative approach:
hostname = qualified
qualified = numericlabel "." qualified /
toplabel [ "." [qualified] ]
numericlabel = DIGIT [ 0*61( alphanum / "-" ) alphanum
...
I think there's a typo in the syntax production for 'toplabel':
s/alpha/ALPHA/ ?
#g
-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E
Received on Wednesday, 5 March 2003 18:05:15 UTC