- From: Stephen Pollei <stephen_pollei@comcast.net>
- Date: 05 Feb 2004 12:57:09 -0800
- To: uri@w3.org
- Cc: www-rdf-validator@w3.org, Martin Duerst <duerst@w3.org>
- Message-Id: <1076014630.970.106.camel@fury>
On Thu, 2004-02-05 at 06:07, Martin Duerst wrote:
> At 18:34 04/02/04 -0800, Stephen Pollei wrote:
> >http://stephen_pollei.home.comcast.net/ gives Error: {W107} Bad URI
> > Host is not a well formed address!
> >
> >It's the underscore, however _'s are good host names according to rfc
> >2181 section 11 and rfc1123 sections 2.1 and 6.1.3.5
> >The problem is that rfc2396 section 3.2.2 is unduly restrictive.
>
> If you think RFC 2396 is overly restrictive, please raise this point
> on the mailing list uri@w3.org, where the next version of this spec
> is discussed.

Hello,

I've run into a situation where a URI that is handled properly by most
software I've come across generates a warning in an RDF validation tool.

I believe the problem arose in the spec when the HTTP/1.0 spec directly
referenced an older, more restrictive RFC concerning host names. Later
the HTTP/1.1 spec (RFC 2616, IIRC) passed the specification of what
constitutes a valid host name to RFC 2396. RFC 2396 still retains a more
restricted set of allowed characters, but doesn't specify length
restrictions the way the DNS RFCs do.

The DNS RFCs do specify that an application may restrict itself to a
subset of the allowed names in its own specs, so RFC 2396's restrictions
are valid restrictions in that sense. It does, however, forbid various
things that would otherwise be OK.

This proposal doesn't fix international domain names in Unicode. I think,
however, that RFC 3492 (Punycode) and others are good enough for that
purpose.

I propose that the characters !$*+,=^_{|}~ be added as valid characters.

"&%'`()[]:;/\<>@?# should probably not be added as valid:

  "        conflicts with quotation too much
  &        conflicts with SGML/XML entities too much
  %        is the escape char
  '`();/\  might have way too much meaning elsewhere
  []:/?#   used for IPv6, port number separation, URL component separation
  <>@      are used too much in email addresses

Control characters and whitespace characters should not be allowed.
Characters 127 (ASCII) and above should not be allowed.

Of course one could allow all of the above and just require that they be
escaped. That would be the most liberal approach and might be best.
Hmmm... http://%2f%2e.org/ ;->

I also think that the first character should be kept more restrictive.
Some DNS schemes use '_' as the first character for special purposes, for
example. That has the nice effect of also disallowing
http://www.**wow**.com/ . Too bad http://www.wow!!!.com/ would work!
Maybe disallow the new characters at the beginning and at the end. Then
http://www.Jack+Jill.example.org/ could still work.

Anyway, these are just off-the-top-of-my-head comments. Feel free to rip
them to shreds.

There should also maybe be a security note that DNS and the character
encodings are more liberal: with these characters allowed, something like
http://my${FOO}thing.example.org/ would be valid but might cause trouble
for shell scripts, for example. That security problem already existed,
though.

-- 
http://dmoz.org/profiles/pollei.html
http://sourceforge.net/users/stephen_pollei/
http://slashdot.org/~joe_plastic/
http://stephen_pollei.home.comcast.net/
GPG Key fingerprint = EF6F 1486 EC27 B5E7 E6E1 3C01 910F 6BB5 4A7D 9677
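
The character proposal in the message above can be sketched as a small
validator. This is only a rough illustration, not anything taken from
RFC 2396 or from an existing tool: the names is_valid_label and
is_valid_host are hypothetical, and requiring an alphanumeric first and
last character is an assumption that follows the "maybe disallow at
beginning and at the end" suggestion. The 63-octet label limit comes from
the DNS RFCs rather than from the proposal itself.

```python
import string

# Characters the message proposes adding to the host-name grammar.
PROPOSED_EXTRA = set("!$*+,=^_{|}~")

# Letters and digits: the only characters allowed at the start or end of
# a label in this sketch (an assumption based on the message's suggestion).
ALNUM = set(string.ascii_letters + string.digits)

# Characters allowed anywhere inside a label: the RFC 2396 set
# (alphanumerics and hyphen) plus the proposed additions.
INTERIOR = ALNUM | {"-"} | PROPOSED_EXTRA

MAX_LABEL_LEN = 63  # per-label limit from the DNS RFCs


def is_valid_label(label):
    """Check one dot-separated label (hypothetical helper)."""
    if not 1 <= len(label) <= MAX_LABEL_LEN:
        return False
    if label[0] not in ALNUM or label[-1] not in ALNUM:
        return False
    # Control characters, whitespace, and anything at or above 127 are
    # rejected simply because they are not in INTERIOR.
    return all(ch in INTERIOR for ch in label)


def is_valid_host(host):
    """Check a whole host name, tolerating one trailing dot."""
    if host.endswith("."):
        host = host[:-1]
    return all(is_valid_label(label) for label in host.split("."))


if __name__ == "__main__":
    for name in ("stephen_pollei.home.comcast.net",
                 "www.Jack+Jill.example.org",
                 "www.**wow**.com",
                 "www.wow!!!.com",
                 "my${FOO}thing.example.org"):
        print(name, is_valid_host(name))
```

Under these assumptions the underscore host and the Jack+Jill host pass,
both "wow" hosts fail because a label starts or ends with a newly allowed
character, and the ${FOO} host passes, which is exactly the shell-script
hazard the security note is about.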
Received on Thursday, 5 February 2004 16:01:48 UTC