- From: Graham Klyne <GK@ninebynine.org>
- Date: Fri, 06 Feb 2004 10:46:37 +0000
- To: Stephen Pollei <stephen_pollei@comcast.net>, uri@w3.org
- Cc: www-rdf-validator@w3.org
DNS specifies a design for a very general distributed lookup system, not just host names, so it's not appropriate to look to that for the appropriate range of allowed characters. I understand the appropriate specifications are RFC 952, as modified by RFC 1123: [[ 1. A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of "domain style names". (See RFC-921, "Domain Name System Implementation Schedule", for background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period. [...] ]] -- RFC 952 [[ 2.1 Host Names and Numbers The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax. Host software MUST handle host names of up to 63 characters and SHOULD handle host names of up to 255 characters. ]] -- RFC 1123 Accordingly, I think your proposal is inappropriate. I also believe the current RFC2396bis document is broadly consistent with the above-cited specifications. (There is a possible exception that I read RFC952/1123 to say that the *total* allowed length of a domain name must be at least 63 characters, so the per-domainlabel limit of 63 characters in the URI spec seems a little odd (though not strictly incorrect). (The I18N issues will appropriately be addressed by IRIs, another work-in-progress spec.) #g -- At 12:57 05/02/04 -0800, Stephen Pollei wrote: >On Thu, 2004-02-05 at 06:07, Martin Duerst wrote: > > At 18:34 04/02/04 -0800, Stephen Pollei wrote: > > >http://stephen_pollei.home.comcast.net/ gives Error: {W107} Bad URI > > > Host is not a well formed address! > > > > > >It's the underscore, however _'s are good host names according to rfc > > >2181 section 11 and rfc1123 sections 2.1 and 6.1.3.5 > > >The problem is that rfc2396 section 3.2.2 is unduly restrictive. > > If you think RFC 2396 is overly restrictive, please raise this point > > on the mailing list uri@w3.org, where the next version of this spec > > is discussed. >Hello, I've run into a situation where a uri that is handled properly by >most software I've run across has generated a warning in a RDF >validation tool. >I believe that the problem arose in the spec when the http1.0 spec >directly referenced an older more restrictive rfc concerning host names. >Later the http1.1 spec(RFC 2616 IIRC) passed the specification of what >constitutes a valid host name to RFC 2396. RFC 2396 still retains a more >restricted set of allowed characters, but didn't specify length >restricts like what the dns RFC's do. > >The DNS RFC's do specify that an application is allowed to specify a >subset of it's allowed names in it's own specs. So RFC 2396's >restrictions are valid restrictions in that sense. > >It does however restrict various things that would otherwise be OK. This >proposal doesn't fix international domain names in unicode. I however >think that RFC3492(punycode) and others is good enough for that purpose. > >I propose that the characters !$*+,=3D^_{|}~ be added as valid characters. >"&%'`()[]:;/\<>@?# should probably not be added as being valid. >" conflicts with quotation too much >& conflicts will sgml/xml entity too much >% is the escape char >'`();/\ might have way too much meaning elsewhere. >[]:/?# used for ipv6, port number separation, url component separation ><>@ is used too much in email addresses >control characters and whitespace characters should not be allowed.. >characters 127(ascii) and above should not be allowed. >Of course one could allow all the above and just have it be required >that they be escaped. That would be most liberal approach and might be >best. Hmmm... http://%2f%2e.org/ ;-> > >I also thing that the first character should be kept as being more >restrictive. Some DNS schemes are using '_' as first character for >special purposes for example. Has nice effect of also disallowing >http://www.**wow**.com/ . Too bad http://www.wow!!!.com/ would work! >Maybe disallow at beginning and at the end. Then >http://www.Jack+Jill.example.org/ could still work. >Anyway this is just top of my head comments. Feel free to rip it to >shreds. > >There should also maybe be a security note that dns and the character >encodings are more liberal. That with these allowed encodings some thing >like http://my${FOO}thing.example.org/ would be valid but might cause >trouble for shell scripts for example. That security problem already >existed though. ------------ Graham Klyne For email: http://www.ninebynine.org/#Contact ------------ Graham Klyne For email: http://www.ninebynine.org/#Contact
Received on Friday, 6 February 2004 09:32:54 UTC