
Re: uri handling of hosts is too restrictive

From: Graham Klyne <GK@ninebynine.org>
Date: Fri, 06 Feb 2004 10:46:37 +0000
Message-Id: <5.1.0.14.2.20040206102621.01f13ce8@127.0.0.1>
To: Stephen Pollei <stephen_pollei@comcast.net>, uri@w3.org
Cc: www-rdf-validator@w3.org

DNS specifies a design for a very general distributed lookup system, not 
just host names, so it is not the right place to look for the range of 
characters allowed in host names.

I understand the appropriate specifications are RFC 952, as modified by RFC 
1123:

[[
    1. A "name" (Net, Host, Gateway, or Domain name) is a text string up
    to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
    sign (-), and period (.).  Note that periods are only allowed when
    they serve to delimit components of "domain style names". (See
    RFC-921, "Domain Name System Implementation Schedule", for
    background).  No blank or space characters are permitted as part of a
    name. No distinction is made between upper and lower case.  The first
    character must be an alpha character.  The last character must not be
    a minus sign or period.  [...]
]]
-- RFC 952

[[
    2.1  Host Names and Numbers

       The syntax of a legal Internet host name was specified in RFC-952
       [DNS:4].  One aspect of host name syntax is hereby changed: the
       restriction on the first character is relaxed to allow either a
       letter or a digit.  Host software MUST support this more liberal
       syntax.

       Host software MUST handle host names of up to 63 characters and
       SHOULD handle host names of up to 255 characters.
]]
-- RFC 1123

Accordingly, I think your proposal is inappropriate.  I also believe the 
current RFC2396bis document is broadly consistent with the above-cited 
specifications.
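
To make the quoted rules concrete, here is a minimal Python sketch of the character rules as I read them from RFC 952 with the RFC 1123 relaxation. The function name is my own invention, and length limits are deliberately left out; it is an illustration, not code from either RFC or from the validator.

```python
import re

# One dot-separated component: letters, digits and minus sign only.
# First character may be a letter or digit (RFC 1123 relaxation);
# last character must not be a minus sign (RFC 952).
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\Z")

def is_rfc1123_hostname(name: str) -> bool:
    """Hypothetical helper: True if every component fits the rules above."""
    if not name:
        return False
    # An empty component (leading, trailing or doubled period) fails the match.
    return all(_LABEL.match(label) for label in name.split("."))

print(is_rfc1123_hostname("stephen_pollei.home.comcast.net"))  # False (underscore)
print(is_rfc1123_hostname("www.ninebynine.org"))               # True
print(is_rfc1123_hostname("3com.example"))                     # True (leading digit)
```

Under this reading the underscore is what rejects the host name in the reported URI.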

(There is one possible exception: I read RFC 952/1123 as saying that the 
*total* length of a domain name that software must handle is at least 63 
characters, so the per-domainlabel limit of 63 characters in the URI spec 
seems a little odd, though not strictly incorrect.)
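
The difference between the two length readings can be shown with a short Python sketch. The function name and report keys are made up for illustration; the numbers 63 and 255 are the MUST and SHOULD figures from the RFC 1123 text quoted above, and the per-label figure is the URI spec limit mentioned in the previous paragraph.

```python
def length_report(name: str) -> dict:
    """Hypothetical helper comparing per-label and whole-name length limits."""
    labels = name.split(".")
    return {
        "total": len(name),
        # Per-domainlabel limit of 63 characters (URI spec reading).
        "labels_within_63": all(len(label) <= 63 for label in labels),
        # RFC 1123: software MUST handle names of up to 63 characters ...
        "within_must": len(name) <= 63,
        # ... and SHOULD handle names of up to 255 characters.
        "within_should": len(name) <= 255,
    }

# Two 40-character labels plus "example": each label is short enough,
# but the whole name (89 characters) exceeds the 63-character MUST figure.
name = ".".join(["a" * 40, "b" * 40, "example"])
print(length_report(name))
```

A name like this satisfies a per-label limit while falling outside what RFC 1123 obliges software to handle, which is the oddity noted above.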

(The I18N issues will appropriately be addressed by IRIs, another 
work-in-progress spec.)

#g
--

At 12:57 05/02/04 -0800, Stephen Pollei wrote:
>On Thu, 2004-02-05 at 06:07, Martin Duerst wrote:
> > At 18:34 04/02/04 -0800, Stephen Pollei wrote:
> > >http://stephen_pollei.home.comcast.net/ gives Error: {W107} Bad URI
> > > Host is not a well formed address!
> > >
> > >It's the underscore, however _'s are good host names according to rfc
> > >2181 section 11 and rfc1123 sections 2.1 and 6.1.3.5
> > >The problem is that rfc2396 section 3.2.2 is unduly restrictive.
> > If you think RFC 2396 is overly restrictive, please raise this point
> > on the mailing list uri@w3.org, where the next version of this spec
> > is discussed.
>Hello, I've run into a situation where a uri that is handled properly by
>most software I've run across has generated a warning in a RDF
>validation tool.
>I believe that the problem arose in the spec when the http1.0 spec
>directly referenced an older more restrictive rfc concerning host names.
>Later the http1.1 spec(RFC 2616 IIRC) passed the specification of what
>constitutes a valid host name to RFC 2396. RFC 2396 still retains a more
>restricted set of allowed characters, but didn't specify length
>restrictions as the DNS RFCs do.
>
>The DNS RFCs do specify that an application is allowed to specify a
>subset of its allowed names in its own specs. So RFC 2396's
>restrictions are valid restrictions in that sense.
>
>It does however restrict various things that would otherwise be OK. This
>proposal doesn't fix international domain names in unicode. I however
>think that RFC3492(punycode) and others are good enough for that purpose.
>
>I propose that the characters !$*+,=^_{|}~ be added as valid characters.
>"&%'`()[]:;/\<>@?# should probably not be added as being valid.
>" conflicts with quotation too much
>& conflicts with sgml/xml entities too much
>% is the escape char
>'`();/\ might have way too much meaning elsewhere.
>[]:/?# used for ipv6, port number separation, url component separation
><>@ is used too much in email addresses
>control characters and whitespace characters should not be allowed..
>characters 127(ascii) and above should not be allowed.
>Of course one could allow all the above and just have it be required
>that they be escaped. That would be most liberal approach and might be
>best. Hmmm... http://%2f%2e.org/ ;->
>
>I also think that the first character should be kept as being more
>restrictive. Some DNS schemes are using '_' as first character for
>special purposes for example. Has nice effect of also disallowing
>http://www.**wow**.com/ . Too bad http://www.wow!!!.com/ would work!
>Maybe disallow at beginning and at the end. Then
>http://www.Jack+Jill.example.org/ could still work.
>Anyway this is just top of my head comments. Feel free to rip it to
>shreds.
>
>There should also maybe be a security note that dns and the character
>encodings are more liberal: with these allowed encodings, something
>like http://my${FOO}thing.example.org/ would be valid but might cause
>trouble for shell scripts for example. That security problem already
>existed though.

------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Friday, 6 February 2004 09:32:54 UTC
