- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Mon, 23 Jan 2006 18:30:53 +0100
- To: public-iri@w3.org
Bjoern Hoehrmann wrote:

> http://bj%f6rn.example.org/

Ugh, indeed, <reg-name> doesn't limit itself to LDH (letters, digits, hyphen); it can be any octet (some of them must of course be percent-encoded). And that's correct. Probably 2616 says why you can't do that for HTTP, but I found no ABNF for the 2616 <host>. Maybe it's in the prose.

Another interesting case:

  http://what<ever.spammer.example

One browser (guess which) thinks that this is a good URL and happily lets you click on it. Decent anti-spam tools that are limited to playing by the rules see only http://what and ignore the URL. And 3986 doesn't _explicitly_ say that "<" and a few other VCHARs are never allowed in a URL.

> http://example.org/~björn/

Obviously no URI.

[http://bj%f6rn.example.org/]

> The former is not allowed per RFC 3986 and RFC 3987

It doesn't clearly say so; apparently it all depends on the registry for the <reg-name>. And DNS labels can contain any octet, as some spammers have found out. Of course the pointers to RFC 1034 section 3.5 and RFC 1123 section 2.1 would result in some style of LDH rule, and that kills the "%" in bj%f6rn.

But the LDH rules are also a bit vague today; some all-digit labels exist. RFC 3696 makes it clear: at least the <toplabel> can't be all digits, so the worst case would be something like 1-2-3 (no ALPHA). OTOH, 3696 is only informational and offers no ABNF.

[http://example.org/~björn/]

> the latter is not allowed per RFC 2396, RFC 2616, RFC
> 3986, but allowed per ABNF and prose of RFC 3987

IIRC the 3987 ABNF applies at the step where you already have Unicode; your Latin-1 o-umlaut won't match until it has become U+00F6. But it's certainly okay here (in a Latin-1 text).

> except that RFC 3987 requires in the prose to meet the
> constraints in RFC 2616, e.g.
>   When stored or transmitted in digital representation,
>   bidirectional IRIs MUST be in full logical order and MUST
>   conform to the IRI syntax rules (which includes the rules
>   relevant to their scheme).
No "abs-path =" in 2616, and I'm unwilling to try the "interpret 1738 for 3986" stunt now; appendix D.2 in 3986 is a royal PITA. Guessing: the 3986 pchar rule is what I want; it has no o-umlaut, no surprise.

But we knew that: the o-umlaut is obviously no URI, it's an IRI. If you translate it to a URI it becomes ~bj%C3%B6rn, and that's a legal path segment in an http URL.

> I agree that at the moment http://example.org/%C0 is not
> illegal per any RFC though.

This might not be nonsense, at least for ftp: FTP servers used to have legacy charsets. When I start ftpd I'd get some baroque pc-multilingual-850+euro. I could also start it in a windows-1252 session and let it create filenames where my file system later crashes, but I digress.

Bye, Frank
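A minimal Python sketch of the distinctions discussed in this post, offered as an illustration under stated assumptions rather than a reference implementation. The names is_reg_name, is_ldh_hostname and iri_path_to_uri are hypothetical. is_reg_name transcribes the RFC 3986 <reg-name> ABNF (unreserved / pct-encoded / sub-delims); is_ldh_hostname applies an LDH label rule in the style of RFC 1123 section 2.1, plus the RFC 3696 rule that the top label is not all digits; iri_path_to_uri approximates the RFC 3987 IRI-to-URI step for a path by UTF-8 encoding and percent-encoding characters outside the pchar set, assuming the input is already Unicode (U+00F6, not a raw Latin-1 byte).

```python
import re
from urllib.parse import quote

# RFC 3986: reg-name = *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")

# One DNS label in LDH style: 1-63 letters, digits or hyphens, not starting
# or ending with a hyphen (leading digits allowed, as in RFC 1123).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_reg_name(host: str) -> bool:
    """True if host matches the generic RFC 3986 <reg-name> grammar."""
    return REG_NAME.fullmatch(host) is not None


def is_ldh_hostname(host: str) -> bool:
    """Hypothetical stricter check: every label is LDH and the top label
    is not all digits (the RFC 3696 rule). A trailing root dot is ignored."""
    labels = host.rstrip(".").split(".")
    if not all(LDH_LABEL.fullmatch(label) for label in labels):
        return False
    return not labels[-1].isdigit()


def iri_path_to_uri(path: str) -> str:
    """Simplified IRI-to-URI step for a path: UTF-8 encode, then
    percent-encode everything outside the pchar / "/" set.
    Assumes path is already Unicode text; "%" is kept as safe so an
    existing escape such as %C0 passes through unchanged."""
    return quote(path, safe="/:@!$&'()*+,;=%~", encoding="utf-8")


if __name__ == "__main__":
    print(is_reg_name("bj%f6rn.example.org"))        # True
    print(is_ldh_hostname("bj%f6rn.example.org"))    # False
    print(is_reg_name("what<ever.spammer.example"))  # False
    print(is_ldh_hostname("1-2-3"))                  # True
    print(iri_path_to_uri("/~björn/"))               # /~bj%C3%B6rn/
```

With this sketch, bj%f6rn.example.org passes the generic reg-name grammar but fails the LDH check, which is the gap described above. Because "%" is treated as safe, iri_path_to_uri neither validates nor rejects an octet like %C0; it only passes it through.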
Received on Monday, 23 January 2006 17:47:47 UTC